Tuesday, November 30, 2010

Database isolation

When multiple threads are accessing data from the same table or tables in a relational database, care needs to be taken that updates from one thread or transaction do not interfere with others. Isolation is a property of database transactions that determine when changes made by one transaction are visible to other concurrent transactions.

Relational databases support different isolation levels. Isolation is typically achieved either by locking data or by serializing access to data, both of which lead to loss in concurrency ( and thus performance). It is thus important to pick the correct isolation level so you have optimal performance without loss in correctness.

To understand isolation levels, it is useful to first talk about the type of queries than can happen.

1. Dirty reads
A dirty read is one that reads uncommitted data.This is dangerous for obvious reason. The data you just read may get rolled back and never exist in the database in the future.

2. Non repeatable read
A non repeatable read reads committed data. But if you do the read again, you will see the effect of any changes to the data that were committed by other transaction.

At time a1, say a transaction t1 executes query:

select quantity from Order
where orderid = 1

It returns 2.

At time a2, transaction t2 updates the quantity to 3.

At time a3, t1 executes the same query which returns 3.

3. Phantom reads
A phantom read reads committed data as well. But a subsequent read in the same
transaction may see new data added and committed by another transaction.

At time a1, a transaction t1 does

Select OrderId from Order
where itemid = '23'

The querry returns

1
2

At time a2 , another transaction t2 inserts a new order for the same item id.

At time a3, transaction t1 executes the same query and it might return

1
2
3

Now that we understand the types of reads, let us look at the isolation levels. There are 4 isolation levels defined by ANSI

1. READ UNCOMMITTED
This is the least stringent isolation level. At this level, dirty reads are allowed. You are reading data that may or may not be committed and it is unreliable.

2. READ COMMITTED
This isolation level ensures that only committed data is read. Dirty reads are thus not allowed. But Non repeatable reads or phantom reads are possible. This isolation level is sufficient if you just want to get a snapshot of the data at a particular time and do not care that might be updated.

3. REPEATABLE READ
This isolation level ensures that rows read within a transaction will not be updated by another transaction. New rows might added, but ones already read will not change. The reads are thus repeatable. Non repeatable read is not possible, but phantom reads are possible.

4. SERIALIZABLE
This is the most stringent isolation level. Access to data is serialized. This is very expensive but none of the problem reads are possible.

The default isolation level in SQL Server, Oracle and DB2 is Read Committed. Most the time this is sufficient. Read Uncommitted is too dangerous and serializable can lead to unacceptable performance.

Repeatable Read is necessary when you say read rows from a database and based on the value, do an update or delete within the same transaction. You need repeatable read because within the transaction, the conditions that lead to update following the read ,should not change.

The default isolation level is sufficient for most routine database applications. However problems due to isolation generally show up in large scale environments where thousands of transaction are hitting the same database tables at the same time. By choosing the right isolation levels, you can ensure the performance stays acceptable while avoiding difficult to troubleshoot problems.