Each replica manager provides concurrency control and recovery for its own data items in the same way as it would for non-replicated data
Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single set of data items (one-copy serializability)
Additional complications
Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started
Replication Scheme
Primary copy
All client requests are directed to a single primary server
Read one -- write all
cannot handle network partitions
Each write operation sets a write lock at each replica manager
Each read operation sets a read lock at one replica manager
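A minimal sketch of the read-one/write-all locking rule in Python; the ReplicaManager class and its lock methods are illustrative assumptions, not any particular system's API:

class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.read_locks = set()   # (txn_id, item) pairs holding read locks
        self.write_locks = set()  # (txn_id, item) pairs holding write locks

    def set_read_lock(self, txn_id, item):
        self.read_locks.add((txn_id, item))

    def set_write_lock(self, txn_id, item):
        self.write_locks.add((txn_id, item))

def read_item(txn_id, item, replicas):
    # Read one: a read lock at a single replica manager suffices.
    rm = replicas[0]
    rm.set_read_lock(txn_id, item)
    return rm

def write_item(txn_id, item, replicas):
    # Write all: a write lock must be set at every replica manager.
    for rm in replicas:
        rm.set_write_lock(txn_id, item)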
Schemes that can handle network partitions
Available copies with validation
Quorum consensus
Virtual Partition
Available Copies Replication
Can handle the case when some replica managers are unavailable because they have failed or because of a communication failure
Reads can be performed by any available replica manager but writes must be performed by all available replica managers
Normal case is like read one/write all
As long as the set of available replica managers does not change during a transaction
RM failure case
One-copy serializability requires that failures and recoveries be serialized w.r.t. transactions
This is not achieved when different transactions make conflicting failure observations
Example shows that local concurrency control alone is not enough
Additional concurrency control procedure (called local validation) has to be performed to ensure correctness
Available copies with local validation assumes no network partition
i.e., functioning replica managers can communicate with one another.
Local validation
Before a transaction commits, it checks for failures (and recoveries) of the replica managers of the data items it has accessed
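A sketch of this local-validation check run just before commit (Python; the read_set_rms, missed_write_rms, and is_available names are assumptions made for illustration, not part of the scheme's defined interface):

def local_validation(read_set_rms, missed_write_rms, is_available):
    # read_set_rms:     replica managers this transaction read from
    # missed_write_rms: replica managers that were unavailable when the
    #                   transaction performed its writes
    # is_available:     probe telling whether an RM is currently reachable

    # Every RM we read from must still be available, so that its failure
    # cannot be serialized before our commit.
    for rm in read_set_rms:
        if not is_available(rm):
            return False  # abort: a conflicting failure was observed

    # Every RM we skipped while writing must still be unavailable, so that
    # its recovery cannot be serialized before our commit.
    for rm in missed_write_rms:
        if is_available(rm):
            return False  # abort: a conflicting recovery was observed

    return True  # safe to commit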
Handling Network Partitions
Network partitions separate replica managers into two or more subgroups, in such a way that the members of a group can communicate with one another but members of different subgroups cannot communicate
Optimistic approaches
Available copies with validation
Pessimistic approaches
Quorum consensus
Available Copies with Validation
Available copies algorithm applied within each partition
Maintains availability for read operations
When partition is repaired, possibly conflicting transactions in separate partitions are validated
The effects of a committed transaction that is aborted on validation will have to be undone
Only feasible for applications where such compensating actions can be taken
Validation
Version vectors (detect write-write conflicts)
Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions)
Log used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
No cycles in the graph corresponding to any single partition
If the combined graph has cycles, validation fails.
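An illustrative cycle check over the combined precedence graph built from the per-partition logs (Python; the edge representation is an assumption, with an edge (T1, T2) meaning a read/write conflict forces T1 before T2):

from collections import defaultdict

def has_cycle(edges):
    # edges: iterable of (t_from, t_to) transaction pairs
    graph = defaultdict(list)
    nodes = set()
    for a, b in edges:
        graph[a].append(b)
        nodes.update((a, b))

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            if colour[m] == GREY:
                return True          # back edge: cycle found
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)

# Validation fails (some committed transactions must be compensated)
# exactly when has_cycle(...) returns True for the combined graph.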
Quorum Consensus
A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
Majority voting is one instance of a quorum consensus scheme
R + W > total number of votes in the group
W > half of the total votes
These ensure that each read quorum intersects every write quorum, and that any two write quorums intersect
Each replica has a version number that is used to detect if the replica is up to date.
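A small sketch of the quorum constraints and a version-number-based read and write (Python; the replica dictionaries with 'version' and 'value' keys, and the example vote counts, are assumptions for illustration):

def quorums_valid(r, w, n):
    # Read quorum R and write quorum W must satisfy both constraints.
    return (r + w > n) and (w > n / 2)

def quorum_read(replicas, r):
    # Read any R replicas and return the copy with the highest version
    # number; intersection with every write quorum guarantees at least
    # one of them is up to date.
    sample = replicas[:r]
    return max(sample, key=lambda rep: rep["version"])

def quorum_write(replicas, w, value):
    # Write to W replicas, bumping the version past any version seen
    # in the write quorum.
    sample = replicas[:w]
    new_version = max(rep["version"] for rep in sample) + 1
    for rep in sample:
        rep["version"] = new_version
        rep["value"] = value

# Example: with N = 5 votes, R = 3 and W = 3 satisfy both constraints.
assert quorums_valid(3, 3, 5)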
Virtual Partition Scheme
Combines available copies and quorum consensus
Virtual partition = a set of replica managers that together have both a read quorum and a write quorum
If a virtual partition can be formed, available copies is used within it (see the sketch below)
Improves the performance of reads
If a failure occurs, and virtual partition changes during a transaction, it is aborted
Have to ensure virtual partitions do not overlap.
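A sketch of the test for whether a reachable group of replica managers can form a virtual partition (Python; the vote counts and the R/W values in the example are illustrative assumptions):

def can_form_virtual_partition(reachable_votes, r, w):
    # reachable_votes: votes held by the replica managers we can contact.
    votes = sum(reachable_votes)
    # The group qualifies only if it holds both a read quorum and a write
    # quorum; within it, available copies (read one / write all reachable
    # members) is then used.
    return votes >= r and votes >= w

# Example: 5 replicas with one vote each, R = 2, W = 4.
# A group of 4 reachable replicas can form a virtual partition; 3 cannot.
assert can_form_virtual_partition([1, 1, 1, 1], r=2, w=4)
assert not can_form_virtual_partition([1, 1, 1], r=2, w=4)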
CAP Conjecture
Is it possible to achieve consistency, availability, and partition tolerance?
Classic distributed systems: focused on ACID semantics
Atomic
Consistent
Isolated
Durable
Modern internet systems: focused on BASE
Basically Available
Soft-state (or scalable)
Eventually consistent
ACID vs. BASE
Why the Divide
What goals might you want for a shared-data system?
consistency
availability
partition tolerance
Strong consistency
all clients see the same view, even in the presence of updates
High availability
All clients can find some replica of the data, even in the presence of failures
Partition-tolerance
The system properties hold even when the system is partitioned.
CAP Conjecture
You can only have two out of these three properties.
The choice of which feature to discard determines the nature of your system.
Consistency and Availability
Comment
Providing transaction semantics requires all nodes to be in contact with each other
Example
Single-site clustered databases
Typical features
Two-phase commit
Cache invalidation protocol
Classic distributed system style
Consistency and Partition Tolerance
Comment
If one is willing to tolerate system-wide blocking, then one can provide consistency even when there are temporary partitions