Sunday, December 20, 2015

Facebook Distributed System Optimization via Asynchronization -- Big Data Queries

Asynchronization Query

  • Asynchronization Query
    • Every query is asynchronous; every function returns a "Future Object"
    • Every DB query is divided into two parts
      • Send the request
      • Receive response
  • Future Object tree

    • Each future object has two states
      • Waiting for execution
      • Finished execution
    • Once the tree is constructed, execution starts at the leaves and proceeds up to the root.
      • When the root finishes executing, the page has finished loading.
    • Lazy execution
      • Execution is lazy: the tree is constructed first and only then executed. This is similar to Spark map-reduce jobs, where the functions form a DAG and a node's predecessors are executed only when the node itself is needed.
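
A minimal Python sketch of the future tree described above. The class name LazyFuture, the node names, and the sample data are all hypothetical and are not taken from Facebook's implementation; the sketch also runs synchronously, so it only illustrates the two states and the lazy, bottom-up execution order, not the asynchronous I/O.

```python
class LazyFuture:
    """A node in a future tree: a computation plus the futures it depends on."""

    def __init__(self, name, compute, children=()):
        self.name = name
        self.compute = compute        # called with the children's results
        self.children = list(children)
        self.done = False             # False = waiting, True = finished
        self.result = None

    def get(self, log):
        """Execute this node on demand, resolving its children first."""
        if not self.done:
            args = [child.get(log) for child in self.children]
            self.result = self.compute(*args)
            self.done = True
            log.append(self.name)     # record the order in which nodes finish
        return self.result


# Building the tree runs nothing; every node is still in the "waiting" state.
friends = LazyFuture("friends", lambda: [1, 2, 3])
photos = LazyFuture("photos", lambda ids: {i: f"photo-{i}" for i in ids}, [friends])
feed = LazyFuture("feed", lambda ids: [f"post-by-{i}" for i in ids], [friends])
page = LazyFuture("page", lambda p, f: {"photos": p, "feed": f}, [photos, feed])

# Asking for the root triggers execution from the leaves upward.
order = []
result = page.get(order)
print(order)
```

The "friends" node is shared by two parents, so the structure is really a DAG; because each node caches its result, it is executed only once even though both "photos" and "feed" depend on it.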

Memcache

  • Which query should be executed first should not be decided while writing the code.
    • Instead, a separate phase should determine the schedule.
  • Why the order of execution matters
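    • "For example, suppose we have two queries. One finds your friends who have bought something on Taobao; the other finds your friends who have bought a Porsche on Taobao. Intuitively, we would first fetch your friends and then apply the other condition, like this:
      // Fetch the (possibly large) friend list first, then filter it.
      IdList friends = waitFor(getFriends(myId));
      yield return getTaoBaoBuyers(friends);
      
      For the Porsche query, however, this is the wrong order: very few people buy a Porsche on Taobao, perhaps only one or two, while the number of friends may be in the hundreds. The Porsche query is therefore better optimized in this order:
      // Fetch the tiny set of Porsche buyers first, then check friendship.
      IdList buyers = waitFor(getPorscheBuyer());
      yield return getFriends(buyers);" [1]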
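
The effect of query order can be sketched in Python as follows. Everything here is an assumption made for illustration: the in-memory data, the function names, and the rule of starting from the smaller estimated set are hypothetical stand-ins for real backend queries and a real scheduler.

```python
# Hypothetical data standing in for two backend queries.
FRIENDS = {"me": set(range(1, 201))}   # 200 friends
PORSCHE_BUYERS = {7, 5000}             # very few buyers overall


def friends_first(me):
    """Fetch all friends, then check each one: one lookup per friend."""
    lookups, matches = 0, set()
    for friend in FRIENDS[me]:
        lookups += 1
        if friend in PORSCHE_BUYERS:
            matches.add(friend)
    return matches, lookups


def buyers_first(me):
    """Fetch all Porsche buyers, then check friendship: one lookup per buyer."""
    lookups, matches = 0, set()
    for buyer in PORSCHE_BUYERS:
        lookups += 1
        if buyer in FRIENDS[me]:
            matches.add(buyer)
    return matches, lookups


def choose_plan(estimated_friends, estimated_buyers):
    """Separate scheduling step: start from the smaller estimated set."""
    return buyers_first if estimated_buyers < estimated_friends else friends_first


plan = choose_plan(estimated_friends=200, estimated_buyers=2)
print(friends_first("me"))  # same answer, 200 lookups
print(plan("me"))           # same answer, 2 lookups
```

Both orders return the same friend, but the second needs far fewer lookups. Keeping choose_plan apart from the query functions mirrors the point made earlier: the order is picked in a separate scheduling step rather than fixed in the code that expresses the query.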



Reference
[1] http://www.infoq.com/cn/news/2015/04/async-distributed-haiping
