(Deprecated) Lisa's Tech Blog: December 2015

Thursday, December 31, 2015

Consistency and Replication (II)

Basic Architecture for Replica Data Management

System Model

Five phases in performing a request

Front end issues the request

Either sent to a single replica or multicast to all replicate managers

Coordination

Replica managers coordinate in preparation for the execution of the request

i.e., agree is requests is to be performed and the ordering of the request relative to others

Execution
Agreement

Reach consensus on effect of the request, e.g., agree to commit or abort in a transactional system

Response

Mechanism for Sequential Consistency

Primary-based replication protocols
Replicated-write protocols

Active replication using multicast communication
Quorum-based protocols

Primary-backup Model

Front ends only communicate with primary

Procedures

Request

FE issues a request containing a unique identifier to the primary replica manager

Coordination

The primary takes each request in the order in which it receives it

Execution

The primary executes the request and store the response

Agreement

If the request is an update, the primary sends the updated state, the response and the unique id to all backups. The backups send an acknowledgement.

Response

The primary responds to the front end, which hands the response back to the client.

Implementation

Implements linearizability if primary is correct, since primary sequences all the operations

Failures

If primary fails, the system retains linearizabilty if a single back becomes the new primary and if the new system configuration takes over exactly where the last left off

If primary fails, it should be replaced with a unique backup
Replica managers that survive have to agree upon which operation has been performed when the replacement primayr is over
Requirements met if replica managers organized as a group and if primary uses view-synchronous communication to propagate updates to backups.

Replicate-Write Protocol

Active replication

Front end multicasts request to each replica using a totally ordered reliable multicast
System achieves sequential consistency but not linearizability

Total order in which replica managers process requests may not be the same as real-time order in which clients made request.

Implementing Ordered Multicast

Incoming messages are held back in a queue until delivery guarantees can be met

The hold-back queue for arriving multicast messages

Coordinate all machines needed to determine delivery order
FIFO-ordering

Easy
Use a separate sequence number for each process

Total ordering

Use a sequencer
Distributed algorithm with three phases

Causal order

Use vector timestamps

The ISIS algorithm for total ordering

Causal Ordering using Vector Timestamps

Quorum-based Protocols

Procedures

Assign a number of votes to each replica
Let N be the total number of votes
Define R = read quorum, W = write quorum
R+W > N
W > N/2

guarantee that no two writes at the same time
since if yes, than the vote for w_1 and w_2 are larger than N

Only one writer at a time can achieve write quorum
Every reader sees at least one copy of the most recent read (takes one with most recent version number)

Examples

Scaling

None of the protocols for sequential consistency scale
To read or write, you have to either

Contact a primary copy
Use reliable totally ordered multicast
Contact over half the replicas

All this complexity is to ensure sequential consistency

Even the protocols for causal consistency and FIFO consistency are difficult to scale if they use reliable multicast

Can we weaken sequential consistency without losing some important features?

Highly Available Service

Emphasis on giving clients access to the service with reasonable response time, even if some results do not conform to sequential consistency
Examples

Gossip

Relaxed consistency

Causal update ordering

Bayou

Eventual consistency
Domain-specific conflict detection and resolution

Coda (file system)

Disconnected operation
Use vector timestamp to detect conflicts

Consistency and Replication (I)

1. Replication

Motivation

Performance enhancement
Enhanced availability/reliability
Fault tolerance
Scalability

Tradeoff between benefits of replication and work required to keep replica consistent

Requirements

Consistency

Depends upon applications
In many applications, we want different clients making (read/write) requests to different replicas of the same logical items should not obtain different results

Replica transparency

Desirable for most application

2. Consistency Models

Consistency model is a contract between processes and a data store

If process follow certain rules, then the store will work correctly

Needed for understanding how concurrent read and writes behave w.r.t shared data
Relevant for shared memory multiprocessors

Cache coherence algorithms

Shared database, files

Independent operations
transactions

Strict Consistency

Any read on a data item x returns a value corresponding to the result of the most recent write on x
Challenge

It requires absolute global time

Sequential Consistency

The result of any execution is the same as if the read and write operation by all processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.
i.e., required to be seen by the same process in the same order
Example

Linearizability

Definition of sequential consistency says nothing about time

There is no reference to the "most recent" write operation

Liearizability

Weaker than strict consistency, stronger than sequential consistency

Operations are assumed to receive a time stamp with a global available lock that is loosely synchronized

The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.

Causal Consistency

Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
Example

(a) is not casual-consistent, since in process P2, a performs before b, so that in process P3 and P4, a should also performs before b
While in (b) there is not casual relationship between a and b

FIFO consistency

Writes done by a single process are seen by all other processes in the order in which there were issues.
But writes from different processes may be seen in a different order by different process.

Weak Consistency

Access to synchronization variable associated with a data store are sequential consistent.
No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere
No read or write operation on data items are allowed to be performed until all previous operations to synchronization variable have been performed.

Release Consistency

Before a read or write operation on a shard data is performed, all previous requires done by the process must have completed successfully.
Before a release is allowed to be performed, all previous reads and writes by the process must have completed.
Access to synchronization variables are FIFO consistent (sequential consistent is not required).

Entry Consistency

An acquire process of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.

3. A summary of Consistency Models

4. Weak Consistency Models

The weak consistency models that use synchronization variable (release, entry consistency) are mostly relevant to shared multiprocessor systems

Also modern CPU with multiple pipelines, out-of-order instruction execution, asynchronous writes, etc.

In distributed systems, weak consistency typically refers to weaker consistency models than sequential consistency

Casual consistency

e.g., used in the Gossip system

Optimistic approaches such as those used in Bayou, Coda that use application specific operations to achieve eventual consistency

5. Eventual Consistency

Session Guarantees

When clients move around and connects to different replicas, strange things can happen

Updates you just made are missing
Database goes back in time

Design choice

Insist strict consistency
Enforce some session guarantees, client-centric consistency

Monotonic Reads

If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value on a more recent value.
Disallow reads to a database less current than previous read
Example error

Get a list of email message, when attempts to read one, pop put "message does not exist"

Monotonic Writes

A write operation by a process on a data item x is completed before any successive write operation x by the same process
Writes must follow any previous writes that occurred within their session

Read your Writes

A read operation by a process on a data item x is completed before any successive write operation on x by the same process.
Every read in a session should see all previous writes in that session.
Example error

Deleted email message re-appear

Writes Follow Reads

A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read.

If a write W followed by a read R at a server X, ten at all other servers

If W is in T's database, then any writes relevant to R are also avaialble

After users outside session
Traditional write/read dependencies preserved at all servers
Two guarantees: ordering and propagation

Order

If a read precedes a write in a session, then that read depends on a previous non-session write, then previous write will never be seen after second write at any server. It may not be seen at all.

Propagation

Previous write will actually have propagated to any data base where the second write is applied.

6. Supporting Session Guarantees

Responsibility of session manager, not servers
Two sets

Read set

set of writes that are relevant to session reads

Write set

Set of writes that performed in session

Update dependencies captured in read-sets and write-sets
Causal order of writes

Use Lamport clocks

[Leetcode] Reverse Linked List

Problem

Reverse a singly linked list.

Analysis

这道题就是要把相邻两个node的指针指向反一反，反的过程当中要keep track of next node，不然这时候 node.next 变成前一个节点了

所以我们采用两个指针，prev 和 curr 来记录相邻的两个节点

反之前，先记录 curr.next, 即 tmp = curr.next
然后可以反了，curr.next = prev
然后把prev 和 curr往右移，继续去处理还没反的nodes

prev = curr
curr = tmp

Time: O(n)
Space: O(1)

Code

Problem

Implement int sqrt(int x).

Compute and return the square root of x.

Analysis

二分搜索 [0,x]

若 mid*mid 等于x，或者 (mid*mid<x && (mid+1)*(mid+1)>x), 则返回mid
否则若mid*mid < x，则搜索 [mid+1, right]
否则若mid*mid > x, 则搜索 [left, mid-1]

注意 mid*mid 可能会溢出，要用long类型
Time: O(log n)
Space: O(1)

Code

Reference
[1] https://leetcode.com/problems/sqrtx/

Definition

The utility functions of both players sum up to 0

Equilibrium

Unique Equilibrium

If the utility function of each player has concavity, then there exist a unique equilibrium.

Wednesday, December 30, 2015

Problem

Follow up for "Unique Paths":

Now consider if some obstacles are added to the grids. How many unique paths would there be?

An obstacle and empty space is marked as 1 and 0 respectively in the grid.

For example,

There is one obstacle in the middle of a 3x3 grid as illustrated below.

[
  [0,0,0],
  [0,1,0],
  [0,0,0]
]

The total number of unique paths is 2.

Note: m and n will be at most 100.

Analysis

Similar to Unique Path except that we need to decide whether a position has obstacle or not

If yes, the number of ways to it is 0

Code

Problem

A robot is located at the top-left corner of a m x n grid (marked 'Start' in the diagram below).

The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked 'Finish' in the diagram below).

How many possible unique paths are there?

Analysis

对于每一格，有两种到达它的方式，即从左边，或者从上边

所以 count[i][j] = count[i-1][j] + count[i][j-1]

Time: O(n^2)
Space: O(n^2)

Space可以进行优化为O(n), 因为计算count每一行的值的时候只需要上一行的信息

Code

How to come up with new ideas in security domain

The followings are my own experience in finding research ideas in security domain.

Ideas from Security Threat

What the defenders can do to improve security?

How to detect attackers?
How to defend against attackers?

For the cloud computing problem, the defenders can be further divided into two categories: (1) Cloud provider (2) Cloud benign users

What the attackers can do to increase damage?

Ideas from Existing Work

Improve

Scalability of their mechanism

Combine

Can we combine other mechanisms with their mechanism

New Problem

Can we use the proposed mechanism to solve a different problem

New Mechanism

Can we solve the problem with a better mechanism

Tuesday, December 29, 2015

Naming Entities

A name in a distributed system is a string of bits or characters that is used to refer to an entity
Type of names

Address

The name of an access point of an entity

Identifiers

A name that uniquely identifies an entity

Location-independent name

A name that is independent from its addresses

Name Resolution

A naming system maintains a name-to-address binding for resources
Main problem in naming

How to do name resolution in a distributed systems in a scalable way

Approaches for name resolution closely tied to naming scheme
Flat vs Structure naming

In a flat name space, identifiers are random bit strings
No information on location embedded in name

Name Resolution in a Flat Name Space

Simple solutions that work in a LAN environment

Broadcasting & Multicasting

Message containing identifier of the entity is boardcast
Machine with access point for the entity replies with the address of the access point

ARP protocol for finding the data-link address of a machine given the IP address
When network grows, multicast is more efficient

Forwarding pointers

When an entity moves from A to B, it leaves behind a reference to its new location at B

Home-based Approaches

Home agent keeps track of current location of mobile entity

Hierarchical Approaches

Similar to DNS

What is a DHT

Distributed Hash Table

key = hash(data)
lookup(key) -> IP address
send-RPC(IP address, PUT, key, value)
Send-RPC(IP address, GET, key) -> value

Chord

Structure Naming Systems

Names are organized into name spaces

A name space can be represented as a labeled, directed graph wit two types of nodes

Leaf nodes and directory nodes
Absolute vs relative path names
Local names vs global names

Name resolution: the process of looking up a name

Closure mechanism

Knowing where and how to start name resolution

Unix File Name Spaces

Linking and Mounting

Symbolic link

Mounting remote name spaces

Implementing Name Spaces

Naming service

A service that allows users and processes to add, remove, and lookup names

Name spaces for large-scale widely distributed systems are typically organized hierarchically
Three layers used to implement such distributed name spaces

Global layer

root node and its children

Administrational layer

Directory nodes within a single organization

Managerial layer

Example: DNS name space

Iterative vs Recursive Resolution

Recursive

puts a higher performance demand on each name server
cons

Too high for global layer name servers

Pros

Caching is more effective
Communication costs may be reduces

Example

Iterative

Example

Attribute-based Naming

An entity is described in terms of (attribute, value) pairs
Each entity can have several attributes
Attribute-based naming service are called directory services
LDAP (Lightweight directory access protocol) defacto industry standard

Based on OSI X.500 directory service

Another example: UDDI for web services

Cloud Virtual Machine Allocation

Concerns

When a user request for a virtual machine in the cloud, the cloud provider needs to allocate a VM for the user.

There are three concerns a cloud provider needs to take into account when doing the VM allocation.

Load Balance
Power Consumption
Security

For security, we aim to avoid the cloud covert channel attacks in which an attacker seeks to be co-locate with the target VM in order to extract private information from the victims.

Popular VM Allocation Policies

The following table [1] shows the popular VM allocation policies.

Reference

[1] Using Virtual Machine Allocation Policies to Defend against Co-resident Attacks in Cloud Computing, by Yi Han et al, in Transactions on Dependable and Secure Computing

Vickrey-Clarke-Groves (VCG) Mechanism

Introduction

VCG can maximize the social welfare given the individuals are all "selfish"

Model

Players

There are $n$ game players, $P_1, P_2, \cdots, P_n$

Actions

The actions of the players are denoted as $X = (x_1, x_2, \cdots, x_n)$

Payoff

$v_i(X)$

Real demand

$v_i$

Cost

$p_i$ for its action

Utility

$u_i = v_i - p_i$

Definition

Following Nisan's work, the terms "mechanisms" and "incentive compatible" are defined as

Mechanism

Given a set of n players, and a set of outcomes, A, let $v_i$ be the set of possible valuation functions of the form $v_i(a)$ which player $i$ could have for an outcome $a \in A$. A mechanism is a function $f: V_1 \cdot V_2 \cdot \cdots \cdot V_n \rightarrow A$. Given the evaluations claimed by the players, f selects an outcome, and n payment functions, $p_1, p_2, \cdots, p_n$, where $p_i = V_1 \cdot V_2 \cdot \cdots V_n \rightarrow R$.

The above defines Action and Reward.

Incentive Compatible

For every player $i$, every $v_1 \in V_1$, $v_2 \in V_2$, $\cdots$, $v_n \in V_n$, and every $v'_i \in V_i$, where $a = f(v_i, v_{-i})$ and $a' = f(v'_i, v_{-i})$, then

$v_i(a) - p_i(v_i, v_{-i}) \geq v_i(a') - p_i(v'_i, v_{-i})$, then the mechanism is incentive compatible.

Specifically, among those incentive compatible mechanisms, the Vickrey-Clarke-Groves (VCG) mechanism is the mostly used one.

The VCG generally seeks to maximize the social welfare of all players in one game, where the social welfare is calculated as $\sum^n_{i=1} v_i$. So the goal function of VCG is $argmax_{a \in A} \sum^n_{i=1} v_i$.

The VCG mechanism and the rule to design VCG mechanisms are defined as follows.

VCG Mechanism

A mechanism, consisting of payment functions $p_1, p_2, \cdots, p_n$ and a function $f$, for a game with outcome set $A$, is a Vickrey-Clarke-Groves mechanism if $f(v_1, v_2, \cdots, v_n) = argmax_{a \in A} \sum^n_{i=1} v_i(a)$ ( f maximizes the social walfare) for some functions $h_1, h_2, \cdots, h_n$, where $h_i : V_{-i} \rightarrow R$ (h_i does not depend on $v_i$)

$\forall v_i \in V$, $p_i(v_i) = h(v_{-i}) - \sum_{j \neq i} v_j$.

My understanding

For user $i$, its reward is depended on others, and not related to its action
But why in the payment function, it deducts other users' true value?

Clarke Pivot Rule

The choice $h_i(v_{-i}) = max_{b \in A} \sum_{j \neq i} v_i(b)$ is called the Clark pivot payment. Under this rule the payment of player $i$ is

$p_i(v_1, v_2, \cdots, v_n) = max_{b} \sum_{j \neq i} v_i(b) - \sum_{j \neq i} v_i(a)$ where $a= f(v_1, v_2, \cdots, v_n)$.

My understanding

I didn't understand it yet.

Monday, December 28, 2015

[Leetcode] Best Time to Buy and Sell Stock II

Problem

Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete as many transactions as you like (ie, buy one and sell one share of the stock multiple times). However, you may not engage in multiple transactions at the same time (ie, you must sell the stock before you buy again).

Analysis

Sum up all possible increase
Time; O(n)
Space; O(1)

Code

Reference

[1] https://leetcode.com/problems/best-time-to-buy-and-sell-stock-ii/

[Leetcode] Best Time to Buy and Sell Stock III

Problem

Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete at most two transactions.

Analysis

用一个数组pre记录在第i天或之前完成一个transaction的左右收益
用一个数组back记录从第i天或之后开始完成一个transaction的最大收益
则完成两个transactions的最大收益为 max(pre[i], back[i])

Code

Reference
[1] https://leetcode.com/problems/best-time-to-buy-and-sell-stock-iii/

[Leetcode] Best Time to Buy and Sell Stock

Problem

Say you have an array for which the ith element is the price of a given stock on day i.

If you were only permitted to complete at most one transaction (ie, buy one and sell one share of the stock), design an algorithm to find the maximum profit.

Analysis

遍历数组，记录当前最小值，记录当前最大profit

当前profit = 当前值 - 当前最小值

时间复杂度: O(n)
空间复杂度: O(1)

Code

Reference
[1] https://leetcode.com/problems/best-time-to-buy-and-sell-stock/

Client-Server Design Issues

Threads in Distributed Systems

Multithreaded clients

Thread can block waiting for server response but application/process is not blocked

Multithreaded servers

Simplified server code as opposed to finite-state-machine approach that users non-blocking system calls
Can handle multiple client requests in parallel (while making blocking ystem calls)
Improved performance over iterative servers on multiprocessor systems

NFS Architecture

Google File System (GFS)

Semantics of File Sharing

Sunday, December 27, 2015

[Leetcode] Longest Increasing Subsequence

Problem

Given an unsorted array of integers, find the length of longest increasing subsequence.

For example,
Given [10, 9, 2, 5, 3, 7, 101, 18],
The longest increasing subsequence is [2, 3, 7, 101], therefore the length is 4. Note that there may be more than one LIS combination, it is only necessary for you to return the length.

Your algorithm should run in O(n2) complexity.

Follow up: Could you improve it to O(n log n) time complexity?

Analysis

遍历所有的数字，同时维护一个valid数组，来保存当前扫描到的有用的数

tail 表示当前valid数组的最后一个数，
对于当前读到的数x

如果x>tail，则把x append到数组中
如果x<valid数组的第一个数，则将其替换为x
否则，则二分查找到valid数组中第一个大于或等于x的数，替换为x

返回valid数组中有效的数个数

时间复杂度: O(nlogn)

Code

Reference
[1] http://www.geeksforgeeks.org/longest-monotonically-increasing-subsequence-size-n-log-n/
[2] https://leetcode.com/problems/longest-increasing-subsequence/

Web Service -- REST

What is REST

a design pattern for implementing networked systems, stands for "Representational State Transfer"
A client references a web resources using a URL
The web serves as a guiding framework for the web
HTTP is not just a protocol

It provides an API (POST, GET, PUT, DELETE) for create, read, update, and delete operations on a resource

Approach isolates application complexity at the end points (client and server) and keeps it out of the transport

Three Fundamental Aspects of REST

Resources

Every distinguishable entity is a resource. A resource may be a web site, an HTML page, and XML document etc.

URLs

Every resource is uniquely identified by a URL.

Simple operations

REST vs. SOAP

REST

The web is the universe of globally accessible information
Resource oriented
User-driven interactions via forms
Few operations (generic interface) on many resources
URI: Consistent naming mechanism for resources
Focus on scalability and performance of large scale distributed hypermedia systems

SOAP

The web is the universal transport of message
Activity/Service oriented
Orchestrated reliable event flows
Many operations (service interface) on few resources
Lack of standard naming mechanism
Focus on design of integrated (distributed) applications

Web Service

Web Services Fundamentals

Two Competing Approaches

REST-style
SOAP-style

Four Fundamental Technologies

Describing information sent over the network

WSDL

Defining web service capability

SOAP

Accessing web services

UDDI

Finding web services

Web Service Infrastructure and Components

XML

Has emerged as the standard solution for describing information exchanged between heterogeneous system
Can be read by programs and interpreted in an application-specific way
Example

<Account>xx</Account>

WSDL: Describing the web service

Provides functional description of network services

IDL description
Protocol and deployment details
Platform independent description
Extensible language

As extended IDL: WSDL allows tools to generate compatible client and server stubs

Allows industries to define standardized service interfaces
Allows advertisement of service descriptions, enables dynamic discovery and binding of compatible services

Used in conjunction with UDDI registry

The main elements in a WSDL description

UDDI: Finding Web Service

Universal Description, Discovery, Integration
UDDI defines the operation of a service registry

Data structures for registering

Business
Technical specification: tModel is a keyed reference to a technical sepcifcaiton
Service and service endpoints

Referencing the supported tModels

The main UDDI data structures

SOAP

Why SOAP

A "wire protocol" necessary for accessing distributed object services
Vendor and/or platform-specific wire protocols hinder interoperability

SOAP

An Internet standard specification, the goal of which is to define a platform and vendor-neural WIRE PROTOCOL based on Internet standard protocols [HTTP & XML] to access Web Services.

Features

Uses XML to package requests for services exposed by Web Services, and responds generates by Web services
Typically uses HTTP as a transport protocol

SOAP message

Convey documents
Support client-server communication

RESTful Approach

Focus on using HTTP operations (GET, PUT, POST, DELETE) to manipulate data resources represented in XML

No WSDL + SOAP

Saturday, December 26, 2015

Security blogs

IT

IT技术博客大学习 http://blogread.cn/it/
网络安全研究国际学术论坛 http://www.inforsec.org/wp/
腾讯安全应急响应中心 http://security.tencent.com/index.php/blog
LinkedIn安全中心 https://security.linkedin.com/blog-archive

Security blog
- https://luyixing.wordpress.com/

[Leetcode] Two Sum

Problem

Given an array of integers, find two numbers such that they add up to a specific target number.

The function twoSum should return indices of the two numbers such that they add up to the target, where index1 must be less than index2. Please note that your returned answers (both index1 and index2) are not zero-based.

You may assume that each input would have exactly one solution.

Input: numbers={2, 7, 11, 15}, target=9

Output: index1=1, index2=2

Analysis

用一个hash表来围护每个数和其对应的index

如果有重复的数字，则overwrite，即保留最后出现的index

然后遍历所有的数，如果（target - 当前数）在哈希表中，且俩数的index不一样，则找到解了
复杂度

Time: O(n)
Space: O(n)

Code

Reference
[1] https://leetcode.com/problems/two-sum/

Friday, December 25, 2015

Remote Method Invocation - Design & Implementation

Middleware layers

Distributed Objects

Compile-time vs. Run-time Objects

Objects can be implemented in many different ways

Compile-time objects

e.g., instance of classes written in object-oriented language like Java, C++

Data-base objects
Procedural languages like C,with an appropriate "wrapper code" that gives it the appearance of an object

System like Java RMI support compile-time objects
Not possible or difficult in language-independent RMI middleware such as CORBA

These systems use object adapters
Implementations of object interfaces are registered at an object adapter, which acts as an intermediary between the client and the object implementation

Persistent vs. Transient Objects

Persistent objects

continue to exist even if they are not contained in the address space of server process
the "state" of a persistent object has to be stored on a persistent store, i.e., some second storage
invocation requests result in an instance of the object being created in the address space of a running process

many policies possible for object instantiation and (de)instantiation

Transient objects

Only exist as long as their container server process are running

i.e., only exist in memory

Static vs Dynamic Remote Method Invocations

Static invocation

Typical ways for writing code that uses RMI is similar to the process for writing RPCC
declare the interface in IDL, compile the IDL file to generate client and server stubs, link them to client and server side code to generate the client and the server executables
requires the object interface to be known when the client is being developed

Dynamic invocation

The method invocation is composed at run-time

invoke (object, method, input_parameters, output_parameters)

Useful for applications where object interface are discovered at runtime

e.g., object browser, batch processing systems for object invocations

Design Issues for RMI

RMI invocation semantics

Invocation semantics depend upon implementation of Request-Reply protocol used by RMI
Could be Maybe, At-least-once, At-most-once

Transparency

Should remote invocations be transparent to the programmer?

Partial failure, higher latency
Different semantics for remote objects, e.g., difficult to implement "cloning" in the same way for local and remote objects or to support synchronization operations e.g., wait/notify

Current consensus

Access transparency

Remote invocations should be made transparent in the sense that syntax of a remote invocation is the same as the syntax of local invocation
Distinguish

But programmers should be able to distinguish between remote and local objects by looking at their interfaces,
e.g., in Java RMI, remote objects implement the Remote interface

Implementing Issues for RMI

Parameter Passing

Representation of a remote object referece

Request/Reply protocol

Handling failures at client and/or server
Issues in marshaling of parameters and results

Input, output, inout parameters
Data representation
handling reference parameters

Distributed object references
handling failures in request-reply protocol

Partial failure

Client, server, network

Supporting persistent objects, object adapters, dynamic invocations, etc

Marshalling

Pack method arguments and results into a flat array of bytes
Use a canonical representation of data types

e.g., integers, characters, doubles

Example

CORBA CDR
Java serialization

Handling failures

Client unable to locate server

Reasons

Server has crashes

Server has moved

(RPC systems) client compiled using old version of service interfance

System must report error (remote exception) to client

Loss of transparency

Request message lost

Retransmit a fixed number of times before throwing an exception

Reply message lost

Client resubmits request

Server choices

Re-execute procedure

Server should be idempotent so that it can be repeated safely

Filter duplicates

Server should hold on to results until ackowledged

Server crashes after receiving a request

At least once

Keep trying till server comes up again

At most once

Return immediately

Exactly once impossible to achieve

Client crashes after sending a request

If a client crashes before RPC returns, we have an "orphan" computation at server

Waste resources, could also start other comutations

Orphan detection

Reincarnation

Client broadcasts new epoch when it comes up again

Expiration

RPC has fixed amount of time to do work

Note

Implementing the request-reply protocol on top of TCP

Does not help in providing applications with different invocation semantics

TCP does not help with server crashes

If a connection is broken, the end points do not have any guarantees about the delivery of messages that may have been in transit

RMI Software Components

Communication module

Implements the request-reply protocol

Remote reference module

Responsible for translating between local and remote object references and for creating remote object references

Maintains remote object table that maintains a mapping between local&remote object references
E.g., Object Adapter in CORBA

RMI - Object Activation

Activation of remote objects

Some applications require that information survive for long periods of time
However, objects not in user all the time, so keeping them in running processes is a potential waste of resources
Object can be activated on demand

E.g., standard TCP services such as FTP on UNIX machines are activated by inetd

Active and passive objects

Active objects

Instantiated in a running processes

Passive objects

Not currently active but can be made active
Implementation of its methods, and marshalled state stored on disk

Activator responsible for

Registering passive objects that are available for activation
Starting named server processes and activating remote objects in them
Keeping track of locations of servers for mote objects that it has already activated

Examples

CORBA implementation repository
JAVA RMI has once activator on each server computer

RMI - Other Topics

Persistent object stores

An object that is guaranteed to live between activations of process is called a persistent object
Stored the state of an object in a marshalled (serialized) form on disk

Location service

Objects can be migrated from one system to another during their lifetime
Maintains mapping between object references and the location of an object

Distributed Garbage Collection

Needed for reclaiming space on servers

Passing "behavior"

Java allows objects (data+code) to be passed by value

If the class for an object passed by value is not present in a JVM, its code is downloaded automatically

Use of reflection in Java RMI

Allows construction of generic dispatcher and skeleton

Distributed Garbage Collection

Java approach based on reference counting

Each server process maintains a list of remote processes that hold remote object references for its remote objects
When a client first acquires a remote reference to an object, it make addRef() invocation to server before creating a proxy
When a clients local garbage collector notices that a proxy is no longer reachable, it makes a removeRef() invocation to the server before deleting the proxy
When the local garbage collector on the server notices that the list of client processes that have a more reference to an object is empty, it will delete the object (unless there are any local objects that have a reference to the object)

Other approaches

Evictor pattern
Leases

Java RMI

lecture 4

Features

Integrate with Java language and libraries

Security, write once run anywhere, multithreaded
Object oriented

Can pass "behavior"

Mobile code
Not possible in CORBA, traditional RPC systems

Distributed garbage collection
Remoteness of objects intentionally not transparent

Good for handling failures

Remote Interfaces, Objects, and Methods

Object becomes remote by implementing a remote interface

A remote interface extends the interface java.rmi.Remote
Each method of the interface declares java.rmi.RemoteException in its throws clause in addition to any application-specific clauses

Creating distributed applications using RMI

Define the remote interfaces
Implement the remote objects and server
Implement the client
Compile the remote interface, server and client
Generate the stub and skeleton using rmic
Start the RMI registry
Start the server
Run the client

[Leetcode] Reverse Words in a String

Problem

Given an input string, reverse the string word by word.

For example,
Given s = "the sky is blue",
return "blue is sky the".

Analysis

首先split string by space
然后 append word in reverse order
注意如果当前非空，在append下一个word之前，需要append空格

Code

Reference

[1] https://leetcode.com/problems/reverse-words-in-a-string/

[Code Interview] Count number of ones

Problem

Counting the number of 1 bits in the binary representation of a given integer.

Analysis

Method 1:

shift the number right by one bit each time, and check whether the right-most bit is 1

If yes, count++

Method 2:

if n is not 0, count++
assign n = n & (n-1), repeat until n is 0

Time complexity

method 1: theta(n)
method 2: O(n)

Code

Reference

[1] http://www.geeksforgeeks.org/count-set-bits-in-an-integer/

Ads Optimization Suggestions for Advertisers

Analyze where money goes

Set dynamic conversion tracking

Ads optimization

do ROI based optimization

optimize your ads based on their ROI metrics

make separate campaigns for high ROI ad groups
make separate campaigns for different languages, geographic targeting
create negative keywords in a shared folder, and add it to relevant campaigns

extend your negative keyword lists at least once a week

use flexible bidding for devices, days/hours, geographic regions, demographic target etc.
set automatic budget increase for the best days of the week with higher search volume, and budget decrease for the worst days

Remarketing

combined remarketing list
similar user targeting

Reference
[1] https://www.quora.com/What-are-some-Google-AdWords-best-practices

Search Ads with Google

How it works

Every time for a search result, it will do an auction to price different ad locations. The advertiser would need to pay only if the user clicks the ad.

When the user clicks, the advertiser only needs to pay second-highest price.
This can guarantee that they bid their true value.

Quality Score

Click through rate
Relevance
Landing page

Ad Rank

bid price * quality = ad rank

Cost Per Click (CPC)

price to pay by each client in the rank list
p1 = p2*q2/q1
the last one in the rank list pay the minimum price

Reference

[1] https://www.quora.com/What-are-some-Google-AdWords-best-practices

雅诗兰黛红石榴面霜

Estee Lauder - Nutritious - Vita-Mineral Moisture Cream

这小瓶是之前买眼霜送的小样。

成分

采用萃取的红石榴精华。红石榴本身具有抗氧化功效，并且还蕴含着多种维他命、矿物质。

效果

我用的时候恰好是冬天，感觉很滋润。夏天用应该会太油。早晚都可以用，不搓泥。
质地比较薄，很好推开。
稍微有点提亮肤色的效果，尤其是刚刚擦上的时候。
没有什么味道。

评分

★★★★☆

Ads Optimization

Adaptive Shopping Campaigns

optimize the structure of your shopping campaigns in order to improve their performance.
help bidding higher on high-performance products
save money by bidding lower on poor performers

Remarketing

Engage those people who've left your site or clicked your ads without buying anything by

showing them relevant ads on search or display

Here are two ways to remarket

Reach high-value customers with remarketing lists for search ads.

modify your keyword bids, target search ads, or prevent ads from displaying based on inclusion in your AdWords remarketing list.
you can track the performance of each remarking list in any associated campaign.

Re-engage shopper with display remarking from search ads

remarket to users instantly with display ads on the Google Display Network or across exchanges via DoubleClick Bid Manager.

Reference
[1] http://doubleclickadvertisers.blogspot.com/2015/10/doubleclick-search-guide-to-holidays.html

Using real-time data to optimize your search ads

Gaps in search ads

Long Tail

Retailers today often have hundreds of thousands of products, but they only advertise on average of 49% of their inventory with search ads. Reason

Difficult to maintain so many keywords and bids on their own
They often don't see the value of their long-tail inventory items

Real Data

Link purchases to ads

identify the impact of advertising

on business goals like maximizing profitability or selling off inventory

improve ad targeting by

matching ads to products that consumers are most interested in

understand the value of brand and general terms in

selling your highest margin or most important products

Justin Johnson, Paid Search Manager at Cabela's explains

Having insights into where we spend our money, in additional to what people are looking for, has been invaluable in helping us make better decisions.
We have a better look into where we may not have adequate coverage, and are able to quickly make changes to address that.
Being able to pull in margin data to see if certain keywords are better or worse at driving profitable traffic than we anticipated helps us be more thoughtful with our spend.

Machine-learning-powered Bid Strategy

Adjust bid strategies to respond quickly to the bids that deliver results.

Cross-device Conversion Tracking

helps bid optimization system make the most of your mobile traffic.

Reference

[1] http://doubleclickadvertisers.blogspot.com/2014/10/doubleclick-search-using-real-time-data.html

Thursday, December 31, 2015

Basic Architecture for Replica Data Management

System Model

Mechanism for Sequential Consistency

Primary-backup Model

Procedures

Implementation

Failures

Replicate-Write Protocol

Active replication

Implementing Ordered Multicast

The ISIS algorithm for total ordering

Causal Ordering using Vector Timestamps

Quorum-based Protocols

Scaling

Highly Available Service

1. Replication

Motivation

Requirements

2. Consistency Models

Strict Consistency

Sequential Consistency

Causal Consistency

FIFO consistency

Weak Consistency

Release Consistency

Entry Consistency

3. A summary of Consistency Models

4. Weak Consistency Models

5. Eventual Consistency

Session Guarantees

Monotonic Reads

Monotonic Writes

Read your Writes

Writes Follow Reads

6. Supporting Session Guarantees

Problem

Analysis

Code

Problem

Analysis

Code

Definition

Equilibrium

Unique Equilibrium

Wednesday, December 30, 2015

Problem

Analysis

Code

Problem

Analysis

Code

Ideas from Security Threat

Ideas from Existing Work

Tuesday, December 29, 2015

Naming Entities

Name Resolution

Name Resolution in a Flat Name Space

What is a DHT

Structure Naming Systems

Linking and Mounting

Symbolic link

Mounting remote name spaces

Implementing Name Spaces

Iterative vs Recursive Resolution

Recursive

Iterative Example

Attribute-based Naming

Concerns

Popular VM Allocation Policies

Introduction

Model

Definition

Mechanism

Incentive Compatible

VCG Mechanism

Clarke Pivot Rule

Monday, December 28, 2015

Problem

Analysis

Iterative

Example