Overview
HDFS (Hadoop Distributed File System) is modeled on GFS (Google File System) and can be viewed as a simplified version of it.
Similarities
- Master and slaves
  - Both GFS and HDFS use a single master with multiple slaves.
  - The master node maintains the checkpoints, data migration, and the operation log.
- Data blocks and replication
  - Both keep multiple copies of each data block (usually 3) for better reliability and performance.
- Tree structure
  - Both maintain a tree-structured file system and allow operations like those on a Linux system:
    - copy, rename, move, delete, etc.
Differences
- File appends
  - GFS
    - Allows multiple appends, and allows multiple clients to append simultaneously.
    - If every append had to go through the master node, efficiency would be low.
    - GFS therefore uses a leasing mechanism to delegate the write permission for a chunk to a chunk server.
    - The chunk server can write to the chunk for the duration of the lease (e.g., 60 s).
    - Since multiple writers may append simultaneously and the API is asynchronous, the records may end up in a different order. This makes the system design very complicated.
  - HDFS
    - Allows only one writer to open a file and append data at a time.
    - The client first writes the data to a local temporary file. When the temporary data reaches the size of a chunk (64 MB), the client asks the HDFS master to assign a machine and a chunk number for writing the chunk data.
    - Advantage
      - The master does not become a bottleneck, since a write reaches it only once 64 MB of data has accumulated.
    - Disadvantage
      - If the machine goes down in the process, some logs have not yet been written to HDFS, so some data may be lost.
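The client-side buffering described for HDFS above can be illustrated with a minimal sketch. Everything here is hypothetical and only models the idea: `FakeMaster`, `BufferedChunkWriter`, and `allocate_chunk` are made-up names, not part of any Hadoop API, and an in-memory buffer stands in for the local temporary file.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size quoted in the notes


class FakeMaster:
    """Hypothetical stand-in for the HDFS master; it only hands out chunk locations."""

    def __init__(self):
        self.requests = 0  # how many times clients have contacted the master

    def allocate_chunk(self):
        self.requests += 1
        return ("datanode-1", self.requests)  # (machine, chunk number), both made up


class BufferedChunkWriter:
    """Accumulates data locally and contacts the master once per full chunk."""

    def __init__(self, master, chunk_size=CHUNK_SIZE):
        self.master = master
        self.chunk_size = chunk_size
        self.buffer = bytearray()  # stands in for the local temporary file
        self.flushed = []          # (machine, chunk number, size) per shipped chunk

    def write(self, data):
        self.buffer.extend(data)
        # Ship full chunks only; smaller amounts stay in the local buffer.
        while len(self.buffer) >= self.chunk_size:
            self._flush(self.chunk_size)

    def close(self):
        # Ship whatever is left as a final, possibly short, chunk.
        if self.buffer:
            self._flush(len(self.buffer))

    def _flush(self, size):
        machine, chunk_no = self.master.allocate_chunk()
        # A real client would stream the bytes to `machine` at this point.
        self.flushed.append((machine, chunk_no, size))
        del self.buffer[:size]


# Tiny chunk size so the behaviour is visible without writing 64 MB.
master = FakeMaster()
writer = BufferedChunkWriter(master, chunk_size=8)
for _ in range(5):
    writer.write(b"abcd")
pending = len(writer.buffer)  # bytes that would be lost if the client crashed now
writer.close()
```

The sketch shows both sides of the trade-off: the master is contacted once per chunk rather than once per write, and any bytes still in the buffer when the client crashes never reach the file system.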
- Master failure
  - GFS
    - Backup master nodes: when the main master node fails, a new master node is elected from the backup nodes.
    - Supports snapshots using a copy-on-write approach.
  - HDFS
    - In its original design, HDFS needs human intervention to recover from a master failure.
    - The original design does not support snapshots.
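The copy-on-write snapshot approach mentioned for GFS above can be sketched with reference-counted chunks. This is a simplified in-memory model, not GFS code: `CowStore` and its methods are hypothetical names chosen for illustration. A snapshot copies only the chunk list, and a chunk is duplicated the first time it is written while still shared.

```python
class CowStore:
    """Toy file store where snapshots share chunks until one side writes."""

    def __init__(self):
        self.chunks = {}    # chunk id -> bytearray with the chunk data
        self.refcount = {}  # chunk id -> number of files referencing the chunk
        self.files = {}     # file name -> list of chunk ids
        self._next_id = 0

    def _new_chunk(self, data):
        cid = self._next_id
        self._next_id += 1
        self.chunks[cid] = bytearray(data)
        self.refcount[cid] = 1
        return cid

    def create(self, name, blocks):
        self.files[name] = [self._new_chunk(b) for b in blocks]

    def snapshot(self, src, dst):
        # Only metadata is copied; the chunk data itself is shared.
        self.files[dst] = list(self.files[src])
        for cid in self.files[dst]:
            self.refcount[cid] += 1

    def write(self, name, index, data):
        cid = self.files[name][index]
        if self.refcount[cid] > 1:
            # The chunk is shared with a snapshot: copy it before writing.
            self.refcount[cid] -= 1
            cid = self._new_chunk(self.chunks[cid])
            self.files[name][index] = cid
        self.chunks[cid][:] = data

    def read(self, name):
        return b"".join(bytes(self.chunks[cid]) for cid in self.files[name])


store = CowStore()
store.create("f", [b"aa", b"bb"])
store.snapshot("f", "f.snap")   # cheap: no chunk data is copied yet
store.write("f", 0, b"zz")      # first write to a shared chunk triggers the copy
```

After the write, `f` and `f.snap` differ only in the first chunk, and the second chunk is still stored once.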
- Garbage collection (GC)
  - GFS
    - Lazy GC.
    - It marks the files to be deleted (e.g., by renaming the file to a name that contains time information), so normal users can no longer access them.
    - The master node periodically checks the files and deletes the outdated ones (usually files marked more than 3 days ago).
  - HDFS
    - HDFS uses a simple, direct delete mechanism.
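The lazy GC scheme described for GFS above can be sketched as a rename followed by a periodic sweep. This is an illustrative model only: `LazyNamespace`, the `.deleted.` prefix, and the exact hidden-name layout are assumptions made for the example, not the names GFS actually uses.

```python
GRACE_PERIOD = 3 * 24 * 3600  # three days in seconds, as quoted in the notes
HIDDEN_PREFIX = ".deleted."   # hypothetical marker for files awaiting collection


class LazyNamespace:
    """Toy namespace where delete only renames, and a later sweep reclaims."""

    def __init__(self):
        self.files = {}  # file name -> contents

    def delete(self, name, now):
        # Rename to a hidden name that records the deletion time.
        hidden = f"{HIDDEN_PREFIX}{int(now)}.{name}"
        self.files[hidden] = self.files.pop(name)
        return hidden

    def visible(self):
        # What a normal user would see: hidden files are filtered out.
        return sorted(n for n in self.files if not n.startswith(HIDDEN_PREFIX))

    def collect(self, now, grace=GRACE_PERIOD):
        # Periodic sweep: remove hidden files older than the grace period.
        removed = []
        for name in list(self.files):
            if name.startswith(HIDDEN_PREFIX):
                deleted_at = int(name.split(".", 3)[2])
                if now - deleted_at > grace:
                    del self.files[name]
                    removed.append(name)
        return removed


ns = LazyNamespace()
ns.files = {"a.txt": b"1", "b.txt": b"2"}
hidden = ns.delete("a.txt", now=1000)               # a.txt disappears from view
early = ns.collect(now=1000 + 3600)                 # too early: nothing reclaimed
late = ns.collect(now=1000 + GRACE_PERIOD + 1)      # past the grace period
```

Until the sweep runs, a mistakenly deleted file could be restored by renaming it back, which is the main benefit of this scheme over the direct delete.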