Sunday, December 20, 2015

Blob File System vs. GFS/HDFS


Similarities

  • Master nodes
    • Both GFS and TFS (Taobao File System) use a single master node with multiple slaves.
    • My comment: does Facebook avoid this approach? I think it uses a P2P design instead.

Differences

  • Functions
    • GFS/HDFS are more widely used
      • Table stores such as BigTable, Hypertable, and HBase can be built on top of them.
    • Blob File System
      • Usually used for photos and albums (this kind of data is called blob data)
  • Challenges
    • Blob FS
      • For each write, the client asks the master node to assign a blob number and a list of machines to write to.
      • Challenge
        • The volume of meta-data is huge
          • E.g., Taobao has more than 10 billion photos. Assuming each photo has 20 bytes of meta-data, the total is 20 bytes × 10 billion = 200 GB, much more than the memory of a single machine.
      • Solution
        • The per-blob meta-data is not stored in the Blob FS.
        • It is stored in external systems instead.
          • E.g., Taobao TFS assigns an id to each photo, and the ids are stored in external databases, such as Oracle or a sharded MySQL cluster.
        • Blob FS uses chunks to organize data.
          • Every blob file is a logical file.
          • Every chunk is a physical file.
          • Multiple logical files share one physical file, which reduces the number of physical files.
            • All meta-data for the physical files can then be kept in memory, so every read of a blob file needs only one I/O access.

    • HDFS/GFS
      • GFS v2 may be able to combine GFS and Blob FS into one system. This is difficult, since
        • It needs to support both large and small files.
        • The meta-data is too large for a single machine, so the master nodes also need to be distributed.
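
The Blob FS points above (the meta-data estimate and the packing of many logical files into few physical chunks) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how TFS is actually implemented: `metadata_bytes`, `ChunkStore`, the chunk file naming, and the 64 MB default capacity are all hypothetical, and the in-memory `index` dictionary stands in for the external database that would hold the per-blob ids.

```python
import os
import tempfile


def metadata_bytes(num_blobs: int, bytes_per_entry: int) -> int:
    """Total meta-data size if one machine kept an entry for every blob."""
    return num_blobs * bytes_per_entry


class ChunkStore:
    """Hypothetical store that packs many logical blobs into few chunk files."""

    def __init__(self, directory: str, chunk_capacity: int = 64 * 1024 * 1024):
        self.directory = directory
        self.chunk_capacity = chunk_capacity
        self.current_chunk = 0
        self.current_size = 0
        # blob_id -> (chunk_id, offset, length). Assumption: in a real system
        # this mapping would live in an external database, not in the store.
        self.index = {}

    def _chunk_path(self, chunk_id: int) -> str:
        return os.path.join(self.directory, f"chunk_{chunk_id:06d}.dat")

    def put(self, blob_id: str, data: bytes) -> None:
        # Start a new physical file only when the current chunk would overflow.
        if self.current_size and self.current_size + len(data) > self.chunk_capacity:
            self.current_chunk += 1
            self.current_size = 0
        offset = self.current_size
        with open(self._chunk_path(self.current_chunk), "ab") as f:
            f.write(data)
        self.current_size += len(data)
        self.index[blob_id] = (self.current_chunk, offset, len(data))

    def get(self, blob_id: str) -> bytes:
        # One seek and one read on a single physical file.
        chunk_id, offset, length = self.index[blob_id]
        with open(self._chunk_path(chunk_id), "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    # 10 billion photos at 20 bytes each, as in the Taobao example.
    print(metadata_bytes(10 * 10**9, 20) / 10**9, "GB")

    with tempfile.TemporaryDirectory() as d:
        store = ChunkStore(d, chunk_capacity=10)  # tiny capacity for the demo
        for blob_id, data in [("a", b"hello"), ("b", b"world"), ("c", b"!!")]:
            store.put(blob_id, data)
        print(store.get("b"), "stored in", len(os.listdir(d)), "chunk files")
```

With this layout the file system only has to track the chunk files, whose number grows with total data volume divided by chunk capacity rather than with the number of photos.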
