In databases, the terms isolation level and consistency level (or consistency model) are sometimes used interchangeably. "Repeatable Read" and "Serializable" are well-known isolation levels, but "Strict Serializable" and "Linearizable" are consistency terms.
If you have used MySQL or PostgreSQL, you probably know what isolation levels like "Repeatable Read" or "Serializable" mean. But when you work on a distributed database, you hear about consistency levels much more.
The first time I heard about consistency levels was when I worked with Apache Cassandra, which claimed to support only "eventual consistency". A few years ago, when my company was evaluating distributed databases, a few architects insisted that we needed a database that supports "strict serializability". CockroachDB was a database that supported this consistency level.
If you are confused, read along. I wrote this post in an attempt to clear up my own confusion.
So far, the best explanations of this topic that I have found are by Daniel J. Abadi [2][3]. Kyle Kingsbury of Jepsen [1] has good descriptions of the topic as well.
But first, a clarification on what consistency means.
What is consistency?
Consistency is an overloaded term and its meaning has changed in recent times.
ACID consistency
The database must preserve its internal correctness rules after every transaction.
Consider a banking database with a constraint account_balance > 0.
If the starting account_balance is 50 and a transaction tries to deduct 100, the result would violate that constraint, so the transaction should fail.
This is the C in ACID. Databases support constraints to help enforce it, but it is mainly the responsibility of the application programmer. It is well understood and rarely discussed these days.
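To make the constraint concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `accounts` table and the `withdraw` and `balance` helpers are hypothetical names chosen for illustration; the point is only that a CHECK constraint makes the database reject the 50 - 100 deduction and leave the balance at 50.

```python
import sqlite3

# In-memory database; table, column and helper names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  id INTEGER PRIMARY KEY,"
    "  account_balance INTEGER NOT NULL CHECK (account_balance > 0)"
    ")"
)
conn.execute("INSERT INTO accounts (id, account_balance) VALUES (1, 50)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Try to deduct `amount`; return True if committed, False if rejected."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET account_balance = account_balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the new balance (here 50 - 100 = -50).
        return False


def balance(conn, account_id):
    """Read the current balance of one account."""
    return conn.execute(
        "SELECT account_balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


ok = withdraw(conn, 1, 100)
print(ok, balance(conn, 1))  # False 50
```

The database enforces the rule, but deciding which rules exist (and writing them as constraints) is still up to the application programmer.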
Distributed systems consistency
The system must ensure that all nodes (or clients) agree on the same view of data.
The goal is to make the distributed system feel like a single-threaded, single-node system. A read of a value anywhere in the system produces the same result [2]: the most recently written value, no matter where it was written.
Consider a system with multiple nodes where X is 1. The value X=2 is written to one node and replicated to the others. If clients read from the replicas, do they all see X=2 immediately? With the strict serializable consistency level, the answer is yes. With weaker models, they may read an older value.
Most of us first heard of this description from the CAP theorem.
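The replication scenario above can be sketched as a toy, single-process model. Everything here (`AsyncCluster`, `read_from`, `read_latest`) is hypothetical, and it assumes a single, stable leader that replicates asynchronously to followers; a real system also has to handle failures, leader changes and network partitions. A read from a lagging follower returns the stale X=1, while a read served by the leader returns X=2.

```python
from collections import deque


class Node:
    """One replica holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncCluster:
    """Toy model (hypothetical): writes hit the leader, followers apply them later."""

    def __init__(self, follower_names):
        self.leader = Node("leader")
        self.followers = [Node(n) for n in follower_names]
        self.pending = deque()  # replication log not yet shipped to followers

    def write(self, key, value):
        # The write is acknowledged once the leader has it.
        self.leader.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Deliver every pending write to every follower, in log order.
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower.data[key] = value

    def read_from(self, node, key):
        # Weak read: whatever this replica happens to have right now.
        return node.data.get(key)

    def read_latest(self, key):
        # Strong read, under the single-stable-leader assumption: the leader
        # has every acknowledged write, so it never returns a stale value.
        return self.leader.data.get(key)


cluster = AsyncCluster(["r1", "r2"])
cluster.write("X", 1)
cluster.replicate()
cluster.write("X", 2)  # acknowledged by the leader, not yet replicated
r1 = cluster.followers[0]
print(cluster.read_from(r1, "X"))  # 1 (stale)
print(cluster.read_latest("X"))    # 2
cluster.replicate()
print(cluster.read_from(r1, "X"))  # 2
```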
Why the difference?
Both describe behavior under concurrency.
Isolation levels describe problems that occur in single-node databases when transactions execute concurrently. At the highest isolation level, transactions appear to execute one at a time in some serial order; each transaction executes as if it were alone.
In distributed systems, network latency, replication and partitioning add timing issues on top of the usual concurrency issues. Consistency models address concurrency while also taking time and latency into account. At the highest consistency level, the order in which transactions appear to execute must also respect real time: if one transaction commits before another begins, it must come first.
Serializable is the strictest isolation level. Strict serializability is the strictest consistency model. In a single node system, there is very little difference between the two because the time issues are small.
Isolation Levels vs. Consistency Models
Isolation Levels
- Prevent reads and writes of uncommitted data.
- Prevent anomalies such as dirty reads, non-repeatable reads and phantom reads.
- Focus on managing concurrent access to data while balancing performance and correctness.
- Common isolation levels (from weakest to strongest):
- Read Uncommitted
- Read Committed
- Repeatable Read
- Serializable: the strictest level defined by the ANSI SQL standard.
Consistency Levels
- Typically relevant to distributed databases.
- Time is a factor.
- They describe the guarantees about visibility and ordering of updates in a distributed, replicated data system.
- They focus on the behavior perceived by clients across multiple nodes or replicas.
- Examples include:
- Strict serializability
- Linearizability
- Causal consistency
Example to Illustrate the Difference:
Scenario:
- Two accounts A and B initially have a balance of 100 each.
- Two concurrent transactions:
- Tx1: Transfer 50 from A to B.
- Tx2: Reads balances of A and B and sums them
Isolation level Serializable:
- Tx1 and Tx2 are serialized, and the sum read by Tx2 is 200.
- (Tx1, Tx2) and (Tx2, Tx1) are both valid orders, irrespective of which one actually committed first.
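A small sketch of why a serializable database may pick either order: it runs the two transactions one at a time, in each possible order, against a fresh copy of the balances. The function names are my own, and this models only the outcomes of serial execution, not how a database actually schedules transactions. The sum is 200 either way, but what Tx2 observes differs.

```python
from itertools import permutations


def tx1(db):
    """Tx1: transfer 50 from A to B."""
    db["A"] -= 50
    db["B"] += 50
    return None


def tx2(db):
    """Tx2: read A and B and sum them."""
    return (db["A"], db["B"], db["A"] + db["B"])


def run_serially(order):
    """Run transactions one at a time in the given order; return what Tx2 saw."""
    db = {"A": 100, "B": 100}
    results = {}
    for name, tx in order:
        results[name] = tx(db)
    return results["Tx2"]


for order in permutations([("Tx1", tx1), ("Tx2", tx2)]):
    names = tuple(name for name, _ in order)
    print(names, run_serially(order))
# ('Tx1', 'Tx2') (50, 150, 200)
# ('Tx2', 'Tx1') (100, 100, 200)
```

Under plain serializability, the database is free to behave like either line of that output, regardless of the real-time order of the commits.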
Consistency level Strict Serializable:
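- If Tx1 commits before Tx2 starts, Tx2 must see all effects of Tx1. The only valid order is (Tx1, Tx2).
- However, if the transactions overlap, for example if Tx1 commits after Tx2 starts, then both orders (Tx1, Tx2) and (Tx2, Tx1) can be valid. The reason is that neither transaction finished before the other began, so real time imposes no order between them: Tx2 may see the balances either before or after Tx1, but never a half-done transfer.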
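The real-time rule can be written as a filter over serial orders. This sketch assumes every transaction has `start` and `commit` timestamps from one perfectly synchronized clock, a simplification that real distributed systems do not get for free; the timestamps and helper names are hypothetical. Plain serializability accepts every permutation, while strict serializability rejects any order that puts a transaction ahead of one that had already committed when it started.

```python
from itertools import permutations


def precedes(a, b):
    """Real-time precedence: a must be ordered before b if a committed before b started."""
    return a["commit"] < b["start"]


def respects_real_time(order):
    """True if no transaction is placed ahead of one that committed before it started."""
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            # order[i] is placed before order[j]; that is only wrong if
            # order[j] had already committed when order[i] started.
            if precedes(order[j], order[i]):
                return False
    return True


def valid_orders(txns, strict):
    """List the serial orders (by name) allowed under plain or strict serializability."""
    orders = permutations(txns)
    if strict:
        orders = (o for o in orders if respects_real_time(o))
    return [tuple(t["name"] for t in o) for o in orders]


# Hypothetical timestamps on a single global clock.
sequential = [
    {"name": "Tx1", "start": 0, "commit": 5},
    {"name": "Tx2", "start": 6, "commit": 8},  # starts after Tx1 committed
]
overlapping = [
    {"name": "Tx1", "start": 0, "commit": 7},
    {"name": "Tx2", "start": 3, "commit": 8},  # starts before Tx1 committed
]

print(valid_orders(sequential, strict=False))  # [('Tx1', 'Tx2'), ('Tx2', 'Tx1')]
print(valid_orders(sequential, strict=True))   # [('Tx1', 'Tx2')]
print(valid_orders(overlapping, strict=True))  # [('Tx1', 'Tx2'), ('Tx2', 'Tx1')]
```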
A few descriptions
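Serializability
Transactions appear to occur in some total (serial) order, each one as if it ran alone. The order does not have to match the real-time order in which transactions occur: even if A commits before B begins, both (A, B) and (B, A) are valid orders.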
Strict Serializability
Transactions appear to occur in a total order that is consistent with the real-time (clock time) order in which they occur. It applies to the entire system, encompassing multiple objects. If A commits before B begins, A must come before B, so the only valid order is (A, B). However, if A commits after B begins, both (A, B) and (B, A) are valid.
Linearizable
Operations appear to occur in a total order that is consistent with the real-time (clock time) order in which they occur. But this applies to a single object, not to the entire dataset. What counts as a single object varies: it could be a key or a table [1].
Most of the time, concurrency issues matter when multiple threads touch the same data, and that is why this model is as important as strict serializability.
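To make the single-object definition concrete, here is a brute-force linearizability check for one register, written as a sketch. It tries every total order of the operations, keeps only orders that respect real time (an operation that ended before another started must come first), and accepts the history if any such order makes every read return the latest write. The history format and function names are assumptions, and the search is exponential, so it only suits tiny examples. It reuses the X=1 / X=2 scenario: a read that starts after the X=2 write has finished must not return 1, but a read that overlaps that write may.

```python
from itertools import permutations


def respects_real_time(order):
    """True if no op is placed ahead of an op that ended before it started."""
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            if order[j]["end"] < order[i]["start"]:
                return False
    return True


def is_linearizable(history, initial=None):
    """Brute-force check of a single-register history (tiny examples only).

    Each op is a dict with kind ("write" or "read"), value, start and end.
    """
    for order in permutations(history):
        if not respects_real_time(order):
            continue
        current = initial
        legal = True
        for op in order:
            if op["kind"] == "write":
                current = op["value"]
            elif op["value"] != current:  # a read must see the latest write
                legal = False
                break
        if legal:
            return True
    return False


w1 = {"kind": "write", "value": 1, "start": 0, "end": 1}
w2 = {"kind": "write", "value": 2, "start": 2, "end": 3}
stale = {"kind": "read", "value": 1, "start": 4, "end": 5}      # starts after w2 ended
overlap = {"kind": "read", "value": 1, "start": 2.5, "end": 4}  # overlaps w2
print(is_linearizable([w1, w2, stale]))    # False
print(is_linearizable([w1, w2, overlap]))  # True
```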