Introduction
Optimistic concurrency control refers to concurrency control in databases without the use of explicit locks. Traditionally it is done using locks. But locking reduces throughput and scalability. Modern databases therefore prefer not to use locking and use optimistic techniques. This is a brief description of the optimistic technique.
Why concurrency control ?
Concurrency control in databases refers to techniques used to execute transactions concurrently.
From a end users standpoint, every transaction needs to execute correctly and preserve the consistency of the database, irrespective of what other transactions are doing. For the user, it should appear that the transactions are executing in some serial order.
When multiple transactions read and/or write the same data several kinds of errors can happen such as
- dirty reads - reading uncommitted data
- non repeatable reads - read a second time and you get a different value
- phantom reads - read a second time and you get additional rows
- Lost update - one transaction overwrites others write
- dirty write - a transaction reads an uncommitted value and updates it
- write skew - the combination of two transactions breaks some invariant like minimum balance
At a user level, databases give the users a knob called the isolation level to protect against some of these errors. For an explanation of isolation levels, see the blog I wrote many a few years ago.
But under the hood databases implement concurrency control to ensure that users see a consistent view of the data at all times.
There are 3 main types of concurrency control:
- Pessimistic concurrency control
- Optimistic concurrency control (OCC)
- Multi version concurrency control (MVCC)
The pessimistic approach involves locking resources - rows, tables , pages etc while they are used by one transactions. Other transaction that need that resource wait until the lock is released. This approach prevents conflicts but comes at the cost of throughput.
OCC and MVCC take the optimistic approach that conflicts rarely happen. They lets transactions proceed without acquiring locks. If a conflict is detect downstream, one of the transaction is forced to abort.
In this blog we discuss one such concurrency control technique - optimistic concurrency control. I will discuss the other two in subsequent blogs.
What is optimistic concurrency control ?
As the name suggests, the approach is optimistic. Optimistic implies a positive view of things that most transactions do not conflict with other and there is no need to slow things down by acquiring locks and holding on to them.
Assumptions are that
- Transactions are short lived
- They rarely conflict
The concurrency control here is done in 3 stages.
In the first stage (read) the transaction executes its operations in a private or staging area. Rows that are read known as the read set and rows that are updated known as the write set are identified.
In the second stage (validation) the operations are checked for conflicts. Rows that are being read by one transaction should not be be updated by others. If a conflict is detected, then the transaction is aborted and the staging area cleared
In the third stage, if there are no conflicts the changes are committed.
Detecting the conflicts
Given 2 transactions A and B,
A and B are given timestamps(TS) before the validation starts.
Let assume TS (A) < TS (B).
If A committed before B started read, then there are no issues and B is also allowed to commit.
If A committed after B started read but before B started write, then both can commit if the write set of A does not intersect with the read set of B.
If A read completes before B read completes, and the write set of B does not intersect with both the read set of A and write set of A, then both can commit.
In all the 3 cases, A and B are operating on different records and do not conflict.
In this example, we are looking "backward" from B. If there is a conflict B will asked to abort. Some databases look "forward" but the concept is the same.
When a validation is being done, no other transaction is allowed to commit. Only one transaction is validated at a time.
Timestamp based
In pessimistic concurrency control, transactions are ordered in the order in which locks are acquired.
In optimistic concurrency control, transactions are ordered based on timestamps given to transactions.
Commercial databases using OCC
Some example are:
Amazon Aurora DSQL
Yugabyte
CockroachDB
FoundationDB
Fauna
Google Spanner
Advantages of OCC
Higher scalability
Higher throughput
No deadlocks
In general this is suitable for low conflict data access.
Disadvantages
Transactions might need to be aborted.
Applications need to handle aborted transactions and retry.
This might not be suitable for cases when conflict is high - that is multiple transactions needs to access the same data.
Conclusion
Optimistic concurrency control is becoming increasing popular is modern databases that need to scale and can be distributed across multiple node. It really works well in low data contention scenarios. It is ideal for distributed databases where the locking approach becomes even more expensive when you need to lock across nodes - as would be needed in modern distributed databases. For reasons mentioned above, distributed , cloud native, high concurrency systems are moving towards OCC.
References:
1. Database Management Systems - Ramakrishnan and Gehrke2. Database Internals - Petrov
3. Andy Pavlov CMU lecture on youtube