Sunday, March 6, 2011

When to use the Java volatile keyword?

The volatile keyword is widely misunderstood in the Java world. Many programmers never use it, and you rarely see code that has it. In this article I try to remove the mystery surrounding volatile.

Section 8.3.1.4 of the Java Language Specification has this definition: "A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable."

Consider the class below:
public class SomeValue {
    // Shared by all threads; deliberately neither volatile nor synchronized.
    private static long value = 0;

    public static void setValue(long v) {
        value = v;
    }

    public static long getValue() {
        return value;
    }
}

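At time time1, thread t1 calls SomeValue.setValue(3).
At time time2, thread t2 calls SomeValue.getValue().
Under the Java memory model, there is no guarantee that thread t2 sees the value 3, even though time2 comes after time1. Without synchronization or volatile, the compiler and the hardware are free to keep t1's write in a register or cache, so t2 may still read the old value 0.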

One way to fix the problem is to make the methods getValue and setValue synchronized. Declaring the field value volatile is another way to do the same.
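
The two fixes can be sketched side by side as below. The class names VolatileValue and SynchronizedValue are made up for this illustration; only the field and method names follow the SomeValue class above.

```java
// Fix 1: volatile field. A write is visible to any thread that later reads the field.
class VolatileValue {
    private static volatile long value = 0;

    static void setValue(long v) {
        value = v;
    }

    static long getValue() {
        return value;
    }
}

// Fix 2: synchronized accessors. Both static methods lock on SynchronizedValue.class,
// so a getValue() that runs after setValue() sees the value that was set.
class SynchronizedValue {
    private static long value = 0;

    static synchronized void setValue(long v) {
        value = v;
    }

    static synchronized long getValue() {
        return value;
    }
}
```

Callers use both classes the same way, for example VolatileValue.setValue(3) followed by VolatileValue.getValue() from another thread. The difference is only in how visibility is obtained.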

So which approach is better? To answer the question, there are two concepts that need to be examined.

1. Locking:
With synchronized, locking is involved. To execute code within a synchronized block, a thread must first acquire a lock. The lock is held until the block is exited. If another thread tries to execute a block guarded by the same lock, it will block until it can acquire the lock.

With volatile, no locking is involved, so a thread never blocks on a volatile read or write. Where volatile is sufficient, avoiding locks is better because contended locks can make the application less scalable.

2. Atomicity:
An atomic operation is one that is indivisible.
public void setX(boolean val) {
    x = val ;
}

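setX is an atomic operation: the assignment happens in a single step, so no thread can ever observe a half-written value. Several threads can call setX concurrently and each call behaves correctly. Timing only decides which call's value is stored last; it never corrupts the field. (One caveat among the primitives: the language specification allows a write to a non-volatile long or double to be performed as two separate 32-bit writes. Declaring such a field volatile makes its reads and writes atomic.)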

But consider the method
public void increase(long x) {
    value = value + x ;
}

It really consists of 3 operations: get the value, add x to it, store the value.
If two threads call increase, the result depends on timing.

Let us say that at time1, value is 3.
At time2, thread t1 calls increase(4). It sees the current value as 3. But before it can update the value,
at time3, thread t2 calls increase(5). It sees the current value as 3 as well.
At time4, thread t1 updates the value to 3+4 = 7.
At time5, thread t2 updates the value to 3+5 = 8, overwriting t1's update.

This is known as a race condition. Depending on the timing of events, you can get different results.
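
The schedule above can be replayed deterministically on a single thread to make the lost update concrete. This is only a sketch: the class LostUpdateReplay and the local variables standing in for each thread's private copy are made up for illustration, and a real race would of course involve two actual threads.

```java
class LostUpdateReplay {
    static long value;

    // Replays the time1..time5 schedule step by step on one thread.
    static long replay() {
        value = 3;                 // time1: value is 3
        long readByT1 = value;     // time2: t1 reads 3
        long readByT2 = value;     // time3: t2 also reads 3
        value = readByT1 + 4;      // time4: t1 stores 7
        value = readByT2 + 5;      // time5: t2 stores 8, t1's update is lost
        return value;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints 8, not 12
    }
}
```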

Synchronization makes the code within the synchronized block atomic with respect to other code that synchronizes on the same lock. If you make the method synchronized,
public synchronized void increase(long x) {
    value = value + x ;
}

the increase method is atomic. Only one thread is able to increase the value at a time. When the method completes and releases the lock, any thread that subsequently acquires the same lock will see the updated value. With synchronized, in the above example, the final value would be 3+4+5 = 12, which is correct.
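
A minimal runnable sketch of the synchronized version, using the same numbers as the example. The class name SynchronizedCounter, the get method and the runTwoThreads helper are made up for this illustration.

```java
class SynchronizedCounter {
    private long value = 3;

    // Only one thread at a time can run the read-add-store sequence.
    public synchronized void increase(long x) {
        value = value + x;
    }

    // Synchronized as well, so the reader sees the latest stored value.
    public synchronized long get() {
        return value;
    }

    // Runs increase(4) and increase(5) on two threads and returns the final value.
    static long runTwoThreads() {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread t1 = new Thread(() -> counter.increase(4));
        Thread t2 = new Thread(() -> counter.increase(5));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // prints 12 regardless of thread timing
    }
}
```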

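Just declaring value as volatile does not provide this atomicity. If two threads call increase at the same time, we still cannot predict what the result is: volatile makes each individual read and write visible to other threads, but it does not make the get-add-store sequence indivisible.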

Back to the question we started with: when to use volatile?

The volatile keyword can help when you know that each access to the field, whether a read or a write, is atomic on its own and no write depends on the field's current value. The benefit of volatile is that changes to the field are published to other threads without the use of locking. When you are sure that the operations are atomic, declaring a field volatile is simpler than wrapping code in synchronized blocks.

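An additional point to remember is that volatile is not necessary for fields that are immutable or declared final. The Java memory model guarantees that final fields are visible to other threads without synchronization or volatile, provided the object is fully constructed before a reference to it is shared (that is, this does not escape from the constructor).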
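
As a small sketch of that point, here is an immutable class whose fields are all final. The class name Range and its fields are made up for this illustration.

```java
// Immutable: both fields are final and are assigned exactly once in the constructor.
final class Range {
    private final long low;
    private final long high;

    Range(long low, long high) {
        // "this" is not handed to any other thread here, so once the constructor
        // returns, every thread that obtains the reference sees both fields set.
        this.low = low;
        this.high = high;
    }

    long low() {
        return low;
    }

    long high() {
        return high;
    }
}
```

Neither field needs volatile, and the accessors need no synchronization, because the fields can never change after construction.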

The classic use case of volatile is a boolean flag that is used to stop a long running thread.
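public class LongRunningTask extends Thread {
    // volatile, so the write in stopTask() is visible to the running thread
    private volatile boolean stop = false;

    @Override
    public void run() {
        while (!stop) {
            doWork();
        }
    }

    public void stopTask() {
        stop = true;
    }

    private void doWork() {
        // placeholder for one unit of the real work
    }
}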

If another thread calls stopTask, the thread running the task sees the new value the next time it checks the flag and leaves the loop. stopTask is atomic because it is a single assignment to a boolean.
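
To see the flag in action, here is a self-contained sketch that starts such a thread and then stops it. The class name StoppableWorker, the iterations counter that stands in for doWork, and the runAndStop helper are made up for this illustration; the 50 ms sleep and 5 second join timeout are arbitrary.

```java
class StoppableWorker extends Thread {
    private volatile boolean stop = false; // written by the caller, read by the worker
    private long iterations = 0;           // touched only by the worker thread

    @Override
    public void run() {
        while (!stop) {
            iterations++; // stand-in for doWork()
        }
    }

    public void stopTask() {
        stop = true;
    }

    // Starts a worker, lets it spin briefly, asks it to stop,
    // and reports whether the worker thread has exited.
    static boolean runAndStop() {
        StoppableWorker worker = new StoppableWorker();
        worker.start();
        try {
            Thread.sleep(50);   // let the loop run for a moment
            worker.stopTask();
            worker.join(5000);  // wait up to 5 seconds for the loop to exit
        } catch (InterruptedException e) {
            worker.stopTask();  // still ask the worker to exit
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runAndStop());
    }
}
```

If stop were not volatile, the loop would be allowed to keep reading a stale false, and the worker might never exit.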

However, when you have operations that are not atomic, such as the increase method, you need synchronization. JDK 5 introduced atomic variable classes such as AtomicInteger and AtomicLong in the package java.util.concurrent.atomic. These classes extend the volatile concept to compound operations such as read-modify-write, without locking, and can be used in place of volatile. A more detailed discussion of the atomic classes is a topic for another post.
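
As a brief taste, the increase example could be written with AtomicLong roughly as follows. The class name AtomicCounter, the get method and the runTwoThreads helper are made up for this illustration; AtomicLong and its addAndGet and get methods are part of java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounter {
    private final AtomicLong value = new AtomicLong(3);

    // addAndGet performs the read-add-store sequence as one atomic step, without a lock.
    public void increase(long x) {
        value.addAndGet(x);
    }

    public long get() {
        return value.get();
    }

    // Runs increase(4) and increase(5) on two threads and returns the final value.
    static long runTwoThreads() {
        AtomicCounter counter = new AtomicCounter();
        Thread t1 = new Thread(() -> counter.increase(4));
        Thread t2 = new Thread(() -> counter.increase(5));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // prints 12 regardless of thread timing
    }
}
```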