Blob File System vs. GFS/HDFS
Similarities
- Master nodes
- Both GFS and TFS (Taobao File System) use a single master node with multiple slave nodes.
- My comment: does Facebook avoid this approach? I think it uses a P2P approach.
Differences
- Functions
- GFS/HDFS are more popular
- On top of them, we can build table stores such as BigTable, Hypertable, and HBase.
- Blob File System
- Usually used for photos and albums (this kind of data is called blob data).
- Challenges
- Blob FS
- For each write, the client asks the master node to assign a blob number and a list of machines to write to.
- Challenge
- The volume of meta-data is huge.
- E.g., Taobao has more than 10G (10 billion) photos. Assuming each photo has 20 bytes of meta-data, the total size is 20 B × 10G = 200 GB, much more than the memory of a single machine.
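The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it reads "10G photos" as 10 billion and uses decimal gigabytes, and the names `PHOTO_COUNT`, `META_BYTES_PER_PHOTO`, and `metadata_total_bytes` are made up for illustration.

```python
# Figures taken from the example above; "10G" is read as 10 billion photos.
PHOTO_COUNT = 10 * 10**9
META_BYTES_PER_PHOTO = 20


def metadata_total_bytes(count: int, bytes_per_item: int) -> int:
    """Total meta-data size if every item carries a fixed-size record."""
    return count * bytes_per_item


total_bytes = metadata_total_bytes(PHOTO_COUNT, META_BYTES_PER_PHOTO)
total_gb = total_bytes / 10**9  # decimal gigabytes (an assumption)
print(f"{total_gb:.0f} GB of per-photo meta-data")
```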
- Solution
- The per-blob meta-data is not stored in the Blob FS.
- It is stored in external systems.
- E.g., Taobao TFS has an id for each photo, and the ids are stored in external databases, such as Oracle or a sharded MySQL cluster.
- Blob FS uses chunks to organize data.
- Every blob file is a logical file.
- Every chunk is a physical file.
- Multiple logical files share one physical file, which reduces the number of physical files.
- All meta-data for physical files can be kept in memory, so every read of a blob file needs only one I/O access.
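The chunk idea can be sketched as a toy, single-machine store in Python. This is not how TFS is implemented; it only illustrates the points above under some assumptions: each chunk is an append-only local file, the store keeps only per-chunk sizes in memory, and the `BlobHandle` returned by `write` stands in for the id that an external database would keep. `ChunkStore`, `BlobHandle`, and `chunk_capacity` are hypothetical names.

```python
import os
import tempfile
from typing import NamedTuple


class BlobHandle(NamedTuple):
    """Stand-in for the id stored in an external database (assumption)."""
    chunk_id: int
    offset: int
    length: int


class ChunkStore:
    """Toy store that packs many logical blobs into a few physical chunk files."""

    def __init__(self, directory: str, chunk_capacity: int = 64 * 1024 * 1024):
        self.directory = directory
        self.chunk_capacity = chunk_capacity  # hypothetical per-chunk size limit
        self.chunk_sizes = {0: 0}             # per-chunk meta-data, kept in memory
        self.current_chunk = 0

    def _path(self, chunk_id: int) -> str:
        return os.path.join(self.directory, f"chunk_{chunk_id}.dat")

    def write(self, data: bytes) -> BlobHandle:
        size = self.chunk_sizes[self.current_chunk]
        # Open a new physical file only when the current one would overflow.
        if size > 0 and size + len(data) > self.chunk_capacity:
            self.current_chunk += 1
            self.chunk_sizes[self.current_chunk] = 0
            size = 0
        with open(self._path(self.current_chunk), "ab") as f:
            f.write(data)
        self.chunk_sizes[self.current_chunk] = size + len(data)
        return BlobHandle(self.current_chunk, size, len(data))

    def read(self, handle: BlobHandle) -> bytes:
        # The handle already says where the blob lives: one seek, one read.
        with open(self._path(handle.chunk_id), "rb") as f:
            f.seek(handle.offset)
            return f.read(handle.length)


with tempfile.TemporaryDirectory() as tmp:
    store = ChunkStore(tmp, chunk_capacity=16)  # tiny capacity to force a second chunk
    h1 = store.write(b"photo-1")
    h2 = store.write(b"photo-2")
    h3 = store.write(b"photo-3")
    print(h1, h2, h3)
    print(store.read(h2), "stored in", len(os.listdir(tmp)), "physical files")
```

Because the handle carries the chunk id, offset, and length, the store never needs a per-blob table in memory, which mirrors the split between external blob ids and in-memory physical-file meta-data described above.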
- HDFS/GFS
- GFS v2 may be able to combine GFS and Blob FS into one system. This is difficult to do, because:
- It needs to support both large and small files.
- The meta-data is too large for a single master, so the master nodes also need to be distributed.