RADOS (Reliable Autonomic Distributed Object Store) Explained
Author: Administrator | Date: 2019-03-06 (Wed) 13:26 | Views: 6,090
RADOS(Reliable Autonomic Distributed Object Store)

- What RADOS stands for
   - Reliable: avoids data loss through replication.
   - Autonomic: nodes communicate with one another to detect failures and carry out replication transparently.
   - Distributed
   - Object Store

- RADOS is the foundation of Ceph; everything ends up stored inside RADOS.
- It provides Ceph's core capabilities (distributed object storage, HA, reliability, no SPOF, self-healing, self-managing, and so on).
- It includes the CRUSH algorithm.
- All data passes through RADOS and is ultimately stored as objects.
- It does not distinguish between data types (that is, object, block, and file data are all stored as objects in the end).
- The object chunk size is 4 MB.



1. librados: usable from C/C++/Python/Ruby and other languages; communication runs over plain TCP/IP sockets. Because the client connects directly rather than going through HTTP, there is no HTTP overhead.
- Connecting through librados actually looks like this:

import rados

# Read monitor addresses, keyring, and other settings from ceph.conf
cluster = rados.Rados(conffile='ceph.conf')
cluster.connect()

# Fetch cluster-wide usage statistics (total/used/available space, object count)
cluster_stats = cluster.get_cluster_stats()


2. RadosGW: stores objects in the RADOS cluster through a REST interface (compatible with S3, Swift, and so on); a short usage sketch follows.
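
Below is a minimal sketch of storing and reading an object through the S3-compatible API using the boto3 client. The endpoint URL, credentials, and bucket name are placeholders and assume a running RadosGW instance with a user created via radosgw-admin; none of these values come from the original post.

import boto3

# All connection details are placeholders for an already-running RadosGW.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',   # RadosGW listens on 7480 by default
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='demo-bucket')
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'stored through RadosGW')
print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())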

3. RBD: block storage that provides the following (a minimal usage sketch follows the list).
- A store for disk images on top of RADOS
- Decouples VM disks from the host
- The object size defaults to 4 MB and can be configured from 4 KB up to 32 MB.
  (http://docs.ceph.com/docs/kraken/man/8/rbd/)
- The kernel module has been officially supported since Linux 2.6.39.
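
As a sketch, the Python rbd binding can create an image and write to it as shown below. The pool name 'rbd', the image name, and the size are illustrative and not taken from the post.

import rados
import rbd

cluster = rados.Rados(conffile='ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')              # pool name assumed to be 'rbd'

try:
    # Create a 1 GiB image; its data is striped over RADOS objects (4 MB by default).
    rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)
    with rbd.Image(ioctx, 'demo-image') as image:
        image.write(b'hello rbd', 0)           # write at offset 0
finally:
    ioctx.close()
    cluster.shutdown()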

4. CephFS: the client contacts the metadata server that lives on top of RADOS, checks the file metadata, and then performs the actual access (a small sketch follows).
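
For completeness, a rough sketch with the Python cephfs binding is shown below; it assumes an MDS is running and that ceph.conf points at a usable client keyring, neither of which is covered in the post.

import cephfs

fs = cephfs.LibCephFS(conffile='ceph.conf')
fs.mount()                 # mount the default filesystem
print(fs.stat('/'))        # metadata for the root directory, served via the MDS
fs.unmount()
fs.shutdown()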




Key Algorithms

First of all, in Ceph the object location is not computed on a metadata server or a storage server; the client itself computes it, driven by policy. The algorithm that performs this computation is CRUSH.


CRUSH (Controlled Replication Under Scalable Hashing)
- Performs the intelligent distribution of objects across the cluster.
- The algorithm RADOS uses to determine the location of an object by computation.
- Decides how data is stored and retrieved based on that computed placement.
- Ceph clients talk to OSDs directly instead of going through a central server or broker.
- CRUSH is fast to compute and involves no lookup step. It is also deterministic: the input fully determines the output, and the result never changes.

- ruleset
  - A ruleset is assigned to a pool; when a Ceph client stores or retrieves data, it consults the pool's CRUSH ruleset.
  - It identifies the primary OSD that holds the PG for the object.
  - The ruleset lets the Ceph client connect to that OSD directly to write and read data.

- workflow (a toy sketch of this computation follows below)
  1. PG assignment
    - pg = hash(object name) % num_pg
  2. Compute CRUSH(pg, cluster map, rule)
    - Running CRUSH yields OSD.(NUM),
      i.e. the number of the OSD the object will be stored on.
       - https://javiermunhoz.com/blog/2016/04/30/scalable-placement-of-replicated-data-in-ceph.html
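
Purely as an illustration of the two steps above (this is not the real CRUSH algorithm, which walks the hierarchical cluster map and placement rules), a toy version of the placement computation could look like this; the PG count, OSD list, and replica count are made up.

import hashlib

PG_NUM = 128                 # PGs in a hypothetical pool
OSDS = [0, 1, 2, 3, 4, 5]    # OSD ids in a hypothetical cluster
REPLICAS = 3

def object_to_pg(object_name):
    # Step 1: hash the object name and take it modulo the PG count.
    digest = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    return digest % PG_NUM

def toy_crush(pg):
    # Step 2 stand-in: deterministically map the PG to REPLICAS distinct OSDs.
    start = pg % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg('my-object')
print(pg, toy_crush(pg))     # the same input always yields the same OSDs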

  


- Cluster map
  - A hierarchical map of the OSDs
- Failure domain
- Pseudo-random placement
  (http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)
https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
https://www.slideshare.net/Yuryu/what-you-need-to-know-about-ceph


Key Concepts
cluster > pool > PG > object
1. cluster
   - The basis of a Ceph deployment.
   - By default the cluster is created with the name "ceph".
   - Multiple clusters can also be created.
     (If you do create multiple clusters, assign each one appropriate ports so that they do not conflict.)
   - http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-new/

2. pool
   - A pool is a logical partition in Ceph.
   - Pools can be created per data type (for example block devices, object gateways, and so on); a small creation sketch follows this list.
   - The number of PGs and the number of replicas can be set per pool.
     In other words, each pool specifies how many copies of its objects are kept.
   - Pool types: two schemes are used, replicated and erasure coded.
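
A minimal sketch of creating a pool and opening an I/O context with the Python rados binding is shown below; the pool name is made up, and in practice pg_num and the replica size are usually set explicitly with the ceph CLI afterwards.

import rados

cluster = rados.Rados(conffile='ceph.conf')
cluster.connect()

# Create the pool only if it does not exist yet (name is illustrative).
if not cluster.pool_exists('demo-pool'):
    cluster.create_pool('demo-pool')

ioctx = cluster.open_ioctx('demo-pool')    # all object I/O goes through an ioctx
ioctx.close()
cluster.shutdown()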

3. placement group (PG)
   - An exabyte-scale storage cluster may hold millions of objects or more.
     Managing them individually would be impractical, so PGs are used to split a pool into pieces.
   - If the number of PGs is too small relative to the cluster size, each PG holds too much data and performance suffers.
   - The number of PGs for a pool is calculated with the following formula (a worked example follows below):
     total PGs = (number of OSDs x 100) / number of replicas,
     where the OSD count comes from "ceph osd stat" and the replica count from "ceph osd pool get [pool_name] size".
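
For example (the numbers are made up): with 9 OSDs and a replica size of 3, the formula gives (9 x 100) / 3 = 300; the usual Ceph recommendation, not stated in the formula above, is to then round up to the next power of two, so the pool would get pg_num = 512.

import math

osds = 9          # from "ceph osd stat" (illustrative)
replicas = 3      # from "ceph osd pool get <pool> size" (illustrative)

raw = osds * 100 / replicas                 # 300.0
pg_num = 2 ** math.ceil(math.log2(raw))     # round up to a power of two -> 512
print(raw, pg_num)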

4. object
   - The data that is actually stored on the OSDs.
   - An object consists of an ID, binary data, and metadata (key/value pairs).


The sequence of steps for actually storing and retrieving data
The client first requests the cluster map from a monitor and receives it, determines from the cluster map where the data it wants lives, then connects to that OSD directly and performs the operation on the data.

1. The Ceph client contacts a Ceph monitor and retrieves the latest cluster map.
   (The cluster map carries the up/down state of the nodes.)

2. The Ceph client caches the cluster map and fetches a new map whenever an update is available.
3. The Ceph client turns the data into an object that carries an object ID and a pool ID.
4. The Ceph client uses the CRUSH algorithm to determine the PG and the primary OSD.

5. The Ceph client connects to the primary OSD and stores or retrieves the data directly (see the sketch after this list).
   (The client writes data straight to the OSD.)
6. The primary OSD performs a CRUSH lookup to determine the secondary PGs and OSDs.
7. In a replicated pool, the primary OSD sends the data to the secondary OSDs and the object is copied there.
8. In an erasure-coded pool, the primary OSD splits the object into chunks, encodes them, and writes the encoded chunks to the secondary OSDs.
https://youtu.be/799XZyHOuHA
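
To make steps 3 to 5 concrete, here is a minimal sketch with the Python rados binding: the client opens an I/O context for a pool, writes an object, attaches a key/value attribute, and reads it back, talking to the OSDs directly; the pool and object names are made up.

import rados

cluster = rados.Rados(conffile='ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('demo-pool')                  # pool name is illustrative
try:
    # librados computes the PG and primary OSD via CRUSH and writes to it directly.
    ioctx.write_full('demo-object', b'hello rados')
    ioctx.set_xattr('demo-object', 'owner', b'mojily')   # key/value metadata
    print(ioctx.read('demo-object'))                     # b'hello rados'
finally:
    ioctx.close()
    cluster.shutdown()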


I have tried to express my own summary of the above in the diagram below.



Object information
Information about an object can be checked with the command below; its output contains the following fields (see the note after the list).

① OSD map version
② pool name
③ pool ID
④ object name
⑤ placement group ID (the PG the object belongs to)
⑥ the object maps to OSDs 5 and 6, which are currently up (the up set)
⑦ the object maps to OSDs 5 and 6, which are currently acting (the acting set)
   - http://ceph.com/geen-categorie/how-data-is-stored-in-ceph-cluster/
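
The command referred to here is most likely "ceph osd map <pool_name> <object_name>", as used in the linked post; its one-line output contains exactly these fields: the osdmap epoch, the pool name and ID, the object name, the PG ID, and the up and acting OSD sets.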


