Key Algorithms
In Ceph, object locations are not computed on the metadata or storage servers; instead, the client itself computes an object's location based on policy. The algorithm that does this is the CRUSH algorithm.
CRUSH (Controlled Replication Under Scalable Hashing)
- Ŭ·¯½ºÅͳ»¿¡ objectÀÇ Áö´ÉÀûÀÎ ºÐ»ê ÀÛ¾÷À» ¼öÇàÇÑ´Ù.
- RADOS¿¡¼ »ç¿ëÇÏ´Â ¾Ë°í¸®ÁòÀ¸·Î objectÀÇ À§Ä¡¸¦ °è»êÀ» ÅëÇØ °áÁ¤ÇÑ´Ù.
- °è»êµÈ µ¥ÀÌÅÍ ÀúÀå¼Ò¿¡ ÀÇÇØ ¾î¶»°Ô µ¥ÀÌÅ͸¦ ÀúÀåÇÏ°í °¡Á®¿ÃÁö °áÁ¤ÇÏ´Â ¾Ë°í¸®Áò
- ceph client´Â Áß¾Ó¿¡ ¼¹ö³ª ºê·ÎÄ¿¸¦ ÅëÇϱ⺸´Ù Á÷Á¢ OSD¿Í Åë½ÅÇÑ´Ù.
- CRUSH´Â ºü¸£°Ô °è»êÀÌ µÇ°í lookup °úÁ¤ÀÌ ¾ø´Ù. ¶ÇÇÑ inputÀÌ output¿¡ ¿µÇâÀ» ¹ÌÄ¡°í °á°ú°¡ º¯ÇÏÁö ¾Ê´Â´Ù.
- ruleset
- A ruleset is assigned to a pool; when a Ceph client stores or retrieves data, it consults the pool's CRUSH ruleset.
- The ruleset identifies the primary OSD that holds the PG for the object.
- It therefore lets the Ceph client connect directly to that OSD to write and read data.
- workflow
1. PG assignment
- pg = Hash(object name) % num_pg
2. Compute CRUSH(pg, cluster map, rule)
- Running CRUSH yields OSD.(NUM),
i.e. the number of the OSD to store on; the object is then stored on that OSD.
- https://javiermunhoz.com/blog/2016/04/30/scalable-placement-of-replicated-data-in-ceph.html
- Cluster map
- A hierarchical map of OSDs
- Failure Domain
- Pseudo-Random
(http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)
- https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
- https://www.slideshare.net/Yuryu/what-you-need-to-know-about-ceph
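The two-step workflow above (object name → PG → OSDs) can be sketched in Python. This is only a minimal illustration: the real CRUSH function walks a weighted, hierarchical cluster map under placement rules and failure domains, while the stand-in below just derives a deterministic offset from the PG id. All function and variable names here are illustrative.

```python
import hashlib

def place_object(object_name: str, pg_num: int, osds: list[int], replicas: int) -> list[int]:
    """Illustrative two-step placement: object -> PG -> ordered list of OSDs."""
    # Step 1: hash the object name onto a placement group.
    h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], "little")
    pg = h % pg_num

    # Step 2: "CRUSH(pg, cluster map, rule)" -> an ordered list of OSDs.
    # Stand-in only: start at an offset derived from the PG id and pick
    # `replicas` distinct OSDs. Deterministic: same input, same output,
    # so no lookup table is needed anywhere.
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

osds = [0, 1, 2, 3, 4, 5]
print(place_object("my-object", pg_num=128, osds=osds, replicas=3))
```

Because placement is a pure function of the name and the cluster map, any client holding the same map computes the same OSD list without asking a server.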
Key Concepts
cluster > pool > PG > object
1. cluster
- The basis of a Ceph deployment.
- By default, a cluster is created with the name "ceph".
- Multiple clusters can also be created.
(If you create multiple clusters, assign each one appropriate ports so that conflicts do not occur.)
- http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-new/
2. pool
- A pool is a logical partition in Ceph.
- Pools can be created per data type (for example, block devices, object gateways).
- The number of PGs and the number of replicas can be set per pool;
that is, each pool can specify how many copies of an object to keep.
- Pool types: two schemes are used, replication and erasure coding.
3. placement group (PG)
- A storage cluster at exabyte scale may hold many millions of objects or more.
Managing each object individually in such an environment is impractical, so PGs are introduced as shards of a pool.
- If the cluster has relatively few PGs for its size, each PG ends up holding too much data and performance suffers.
- The number of PGs for a pool is calculated with the following formula (the result is usually rounded up to the nearest power of two):
(OSDs x 100) / replicas = (ceph osd stat x 100) / (ceph osd pool get [pool_name] size)
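As a sketch, this formula can be evaluated in code. The rounding up to a power of two follows the common Ceph sizing guidance; the function name and the default of 100 PGs per OSD are taken from the formula above, not from any Ceph API.

```python
def recommended_pg_count(num_osds: int, replicas: int, target_pgs_per_osd: int = 100) -> int:
    """PGs-per-pool rule of thumb: (OSDs x 100) / replicas,
    rounded up to the next power of two."""
    raw = (num_osds * target_pgs_per_osd) / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

# e.g. 9 OSDs with 3 replicas: raw value 300, rounded up to 512
print(recommended_pg_count(9, 3))
```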
4. object
- The actual data stored on the OSDs.
- ID + binary data + metadata (key/value)
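The three parts of an object listed above can be sketched as a small data structure; the class and field names below are illustrative, not Ceph's internal types.

```python
from dataclasses import dataclass, field

@dataclass
class RadosObject:
    """Sketch of the three parts of a RADOS object: ID, binary data,
    and key/value metadata."""
    oid: str                                                # unique ID within its pool
    data: bytes = b""                                       # opaque binary payload
    xattrs: dict[str, bytes] = field(default_factory=dict)  # key/value metadata

obj = RadosObject(oid="img-0001", data=b"\x89PNG...", xattrs={"owner": b"alice"})
print(obj.oid, len(obj.data), list(obj.xattrs))
```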
½ÇÁ¦ µ¥ÀÌÅ͸¦ ÀúÀåÇÏ°í °¡Á®¿À´Â ÀÏ·ÃÀÇ °úÁ¤
Client´Â ÃÖÃÊ Monitor¿Í Cluster MapÀ» ¿äûÇÏ¿© Àü´Þ¹Þ°í Cluster Map¿¡ ¿äûÇÏ·Á Çß´ø DATAÀÇ À§Ä¡¸¦ È®ÀÎÇÏ°í ÇØ´ç OSD¿Í Á÷Á¢ ¿¬°áÀ» ¼öÇàÇÏ¿© µ¥ÀÌÅÍ¿¡ ´ëÇÑ OperationÀ» ¼öÇàÇÑ´Ù.
1. The Ceph client connects to a Ceph monitor and fetches the latest cluster map.
(The cluster map holds the up/down state of the nodes.)
2. The Ceph client caches the cluster map and fetches a new map when an update is available.
3. The Ceph client turns the data into an object that carries an object name and pool ID.
4. The Ceph client uses the CRUSH algorithm to determine the PG and the primary OSD.
5. The Ceph client connects to the primary OSD and stores or retrieves the data directly.
(The client writes data straight to the OSD.)
6. The primary OSD performs a CRUSH lookup to determine the secondary PGs and OSDs.
7. In a replicated pool, the primary OSD copies the object by sending the data to the secondary OSDs.
8. In an erasure-coded pool, the primary OSD splits the object into chunks, encodes the chunks, and writes them to the secondary OSDs.
- https://youtu.be/799XZyHOuHA
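Steps 1 to 5 of the client-side write path can be sketched as follows, with stub monitor and OSD classes standing in for the real daemons (steps 6 to 8 would run server-side on the primary OSD). All names are illustrative, and a simple hash replaces the real CRUSH computation.

```python
import hashlib

class StubOSD:
    """Stand-in for an OSD daemon; replication / erasure coding
    (steps 6-8) would happen here on the server side."""
    def __init__(self, osd_id: int):
        self.osd_id = osd_id
        self.store = {}
    def write(self, pool_id, name, data):
        self.store[(pool_id, name)] = data
        return self.osd_id

class StubMonitor:
    """Stand-in for a Ceph monitor serving the cluster map."""
    def __init__(self, cluster_map):
        self._map = cluster_map
    def get_cluster_map(self):
        return self._map

def client_write(monitor, pool: str, object_name: str, data: bytes) -> int:
    # 1-2: fetch the latest cluster map from a monitor (a real client caches it)
    cmap = monitor.get_cluster_map()
    # 3: the object is identified by (pool id, object name)
    pool_id = cmap["pools"][pool]["id"]
    # 4: object name -> PG, then "CRUSH" -> primary OSD (hashed stand-in here)
    pg = int(hashlib.md5(object_name.encode()).hexdigest(), 16) % cmap["pools"][pool]["pg_num"]
    primary = cmap["osds"][pg % len(cmap["osds"])]
    # 5: write directly to the primary OSD -- monitors are not in the data path
    return primary.write(pool_id, object_name, data)

osds = [StubOSD(i) for i in range(4)]
mon = StubMonitor({"pools": {"rbd": {"id": 0, "pg_num": 64}}, "osds": osds})
print(client_write(mon, "rbd", "hello-object", b"payload"))
```

Note that the monitor only hands out the map; the payload itself travels straight from the client to the OSD.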
I have expressed my own summary of this in the diagram below.
Object information
Information about an object can be checked with the ceph osd map [pool_name] [object_name] command; its output includes:
① OSD map version (epoch)
② pool name
③ pool ID
④ object name
⑤ placement group ID (the PG the object belongs to)
⑥ the object's PG maps to OSDs 5 and 6, which are currently up (the up set)
⑦ OSDs 5 and 6 are currently serving the object (the acting set)
- http://ceph.com/geen-categorie/how-data-is-stored-in-ceph-cluster/
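As a sketch, the seven fields above can be picked out of a ceph osd map output line; the sample line below is illustrative, not captured from a live cluster.

```python
import re

# Illustrative sample of `ceph osd map <pool> <object>` output.
SAMPLE = ("osdmap e52 pool 'data' (0) object 'hello' "
          "-> pg 0.c41dd134 (0.34) -> up ([5,6], p5) acting ([5,6], p5)")

PATTERN = re.compile(
    r"osdmap e(?P<epoch>\d+) "          # (1) OSD map version
    r"pool '(?P<pool>[^']+)' "          # (2) pool name
    r"\((?P<pool_id>\d+)\) "            # (3) pool ID
    r"object '(?P<object>[^']+)' "      # (4) object name
    r"-> pg (?P<pg>\S+) \(\S+\) "       # (5) placement group ID
    r"-> up \(\[(?P<up>[\d,]+)\].*?\) " # (6) up set of OSDs
    r"acting \(\[(?P<acting>[\d,]+)\]"  # (7) acting set of OSDs
)

m = PATTERN.match(SAMPLE)
info = m.groupdict()
print(info["epoch"], info["pool"], info["pg"], info["up"], info["acting"])
```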