Transactions and Replication
One Copy Serializability
- Replicated transactional service
- Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
- Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single set of data items
- Additional complications
- Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started
Replication Scheme
- Primary copy
- All client requests are directed to a single primary server
- Read one -- write all
- cannot handle network partitions
- Each write operation sets a write lock at each replica manager
- Each read sets a read lock at one replica manager
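The read-one/write-all locking rule above can be sketched in a few lines. This is a minimal illustration, not a full scheme: the `ReplicaManager` class, the use of a plain re-entrant lock in place of a true read/write lock, and the choice of the first replica for reads are all simplifying assumptions.

```python
import threading

class ReplicaManager:
    """One replica manager: holds its own lock and its copy of the item.
    (Hypothetical class; a plain RLock stands in for a read/write lock.)"""
    def __init__(self, value=0):
        self.lock = threading.RLock()
        self.value = value

def read_one(replicas):
    """Read one: set a read lock at a single replica manager and read it."""
    rm = replicas[0]                  # any available replica would do
    with rm.lock:
        return rm.value

def write_all(replicas, value):
    """Write all: set a write lock at every replica manager, then update."""
    for rm in replicas:
        rm.lock.acquire()             # write lock at each replica manager
    try:
        for rm in replicas:
            rm.value = value
    finally:
        for rm in replicas:
            rm.lock.release()

replicas = [ReplicaManager() for _ in range(3)]
write_all(replicas, 42)
print(read_one(replicas))            # 42
```

Because every write locks all replicas, any read quorum of one replica necessarily conflicts with a concurrent write, which is exactly why this scheme cannot make progress across a network partition.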
- Schemes that can handle network partitions
- Available copies with validation
- Quorum consensus
- Virtual Partition
Available Copies Replication
- Can handle the case when some replica managers are unavailable, either because they have failed or because of a communication failure
- Reads can be performed by any available replica manager but writes must be performed by all available replica managers
- Normal case is like read one/write all
- As long as the set of available replica managers does not change during a transaction
- RM failure case
- One copy serializability requires that failures and recoveries be serialized w.r.t. transactions
- This is not achieved when different transactions make conflicting failure observations
- Examples show that local concurrency control is not enough
- Additional concurrency control procedure (called local validation) has to be performed to ensure correctness
- Available copies with local validation assumes no network partitions
- i.e., functioning replica managers can communicate with one another.
- Local validation
- Before a transaction commits it checks for failures (and recoveries) of replica managers of data items it has accessed
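The commit-time check above can be sketched as follows. This is a deliberately simplified view, assuming local validation reduces to comparing the set of available replica managers at transaction start with the set at commit time; the function name and data representation are illustrative, not from the source.

```python
def local_validation(available_at_start, available_at_commit):
    """Sketch of local validation: commit only if no failure or recovery
    of a relevant replica manager was observed during the transaction.
    (Hypothetical helper: real schemes track only the RMs the
    transaction actually accessed.)"""
    return available_at_start == available_at_commit

# At transaction start, replica managers X, Y, Z are available.
seen = frozenset({"X", "Y", "Z"})
# ... transaction executes its reads and writes ...
now = frozenset({"X", "Z"})          # Y failed mid-transaction
print(local_validation(seen, now))   # False: the transaction must abort
```

Aborting whenever the observed set changes serializes the failure (or recovery) either before or after the transaction, which restores one-copy serializability.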
Handling Network Partitions
- Network partitions separate replica managers into two or more subgroups, in such a way that the members of a group can communicate with one another but members of different subgroups cannot communicate
- Optimistic approaches
- Available copies with validation
- Pessimistic approaches
Available Copies with Validation
- Available copies algorithm applied within each partition
- Maintains availability for read operations
- When partition is repaired, possibly conflicting transactions in separate partitions are validated
- The effects of a committed transaction that is aborted on validation will have to be undone
- Only feasible for applications where such compensating actions can be taken
- Validation
- Versions vector (write-write conflicts)
- Precedence graphs (each partition maintains a log of data items affected by the Read and Write operations of transactions)
- Log used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
- The graph corresponding to each single partition contains no cycles
- If the combined graph across partitions contains a cycle, validation fails.
Quorum Consensus
- A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
- Majority voting is one instance of a quorum consensus scheme
- R + W > total number of votes in the group
- W > half of the total votes
- Ensures that each read quorum intersects each write quorum, and that any two write quorums intersect
- Each replica has a version number that is used to detect if the replica is up to date.
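The two quorum constraints and the version-number rule can be sketched together. This is a simplified model, assuming one vote per replica and replicas represented as plain dictionaries; function names are illustrative.

```python
def valid_quorum(r, w, n):
    """Check the quorum consensus constraints for n total votes:
    every read quorum meets every write quorum (r + w > n), and
    any two write quorums intersect (w > n/2)."""
    return r + w > n and 2 * w > n

def quorum_read(replicas, r):
    """Contact r replicas and take the copy with the highest version;
    the intersection property guarantees it reflects the latest write."""
    contacted = replicas[:r]                  # any r replicas would do
    return max(contacted, key=lambda rep: rep["version"])

replicas = [
    {"version": 2, "value": "new"},           # updated by the last write
    {"version": 1, "value": "old"},
    {"version": 1, "value": "old"},
]
print(valid_quorum(2, 2, 3))                  # True for R = W = 2, N = 3
print(quorum_read(replicas, 2)["value"])      # "new"
```

With R = W = 2 out of 3, every read quorum overlaps every write quorum in at least one replica, and the version number identifies which copy in the quorum is up to date.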
Virtual Partition Scheme
- Combines available copies and quorum consensus
- Virtual partition = set of replica managers that have a read and write quorum
- If a virtual partition can be formed, available copies is used
- Improves the performance of reads
- If a failure occurs, and virtual partition changes during a transaction, it is aborted
- Have to ensure virtual partitions do not overlap.
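The condition for forming a virtual partition can be sketched as a simple membership check. This is an assumption-laden sketch: it treats one replica manager as one vote and ignores the overlap-prevention protocol; the function name is hypothetical.

```python
def can_form_virtual_partition(reachable, read_quorum, write_quorum):
    """A virtual partition can be formed only if the reachable replica
    managers contain both a read quorum and a write quorum."""
    return len(reachable) >= read_quorum and len(reachable) >= write_quorum

# 5 replicas total, read quorum 2, write quorum 3:
print(can_form_virtual_partition({"A", "B", "C"}, 2, 3))   # True
print(can_form_virtual_partition({"A", "B"}, 2, 3))        # False
```

Inside a formed virtual partition the cheaper available copies algorithm runs (reads go to one replica), while the quorum requirement for forming the partition in the first place preserves correctness across real network partitions.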
CAP Conjecture
- Is it possible to achieve consistency, availability, and partition tolerance?
- Classic distributed systems: focused on ACID semantics
- Atomic
- Consistent
- Isolated
- Durable
- Modern internet systems: focused on BASE
- Basically Available
- Soft-state (or scalable)
- Eventually consistent
- ACID vs. BASE
Why the Divide
- What goals might you want for a shared-data system?
- consistency
- availability
- partition tolerance
- Strong consistency
- all clients see the same view, even in the presence of updates
- High availability
- All clients can find some replica of the data, even in the presence of failures
- Partition-tolerance
- The system properties hold even when the system is partitioned.
CAP Conjecture
- You can only have two out of these three properties.
- The choice of which feature to discard determines the nature of your system.
Consistency and Availability
- Comment
- Providing transaction semantics requires all nodes to be in contact with each other
- Example
- Single-site clustered databases
- Typical features
- Two-phase commit
- Cache invalidation protocol
- Classic distributed system style
Consistency and Partition Tolerance
- Comment
- If one is willing to tolerate system-wide blocking, then can provide consistency even when there are temporary partitions
- Examples
- Distributed databases
- Distributed locking
- Quorum (majority) protocols
- Typical features
- Pessimistic locking
- Minority partitions unavailable
- Also common distributed system
- voting vs. primary replicas
Partition-Tolerance and Availability
- Comment
- Examples
- Typical features
- TTLs and lease cache management
- Optimistic updating with conflict resolution
Techniques
- Expiration-based caching: AP
- Quorum/majority algorithm: PC
- Two phase commit: AC
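The first technique in the list, expiration-based caching, can be sketched as a TTL cache that keeps serving a possibly stale entry until it expires, trading consistency for availability (the AP corner). The class and its interface are illustrative assumptions, not from the source.

```python
import time

class TTLCache:
    """Expiration-based cache (hypothetical sketch): an entry is served
    until its TTL lapses, so reads stay available during a partition at
    the cost of possibly returning stale data."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}                       # key -> (value, expiry time)

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        value, expiry = self.store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value                      # possibly stale, but available
        self.store.pop(key, None)             # expired: caller must refetch
        return None

cache = TTLCache(ttl_seconds=0.05)
cache.put("x", 1)
print(cache.get("x"))    # 1 (served from cache)
time.sleep(0.06)
print(cache.get("x"))    # None (entry expired)
```

The TTL bounds the staleness window, which is the "eventually consistent" half of the bargain: once the entry expires, the client is forced back to the origin and picks up the latest value.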