Code & Cluster: JAVA concurrency scalability

1. Synchronized

Everyone knows that the synchronized keyword is used to acquire a lock and serialize access to a code block. A lesser-known function of this keyword is to synchronize a thread's local view of memory with main memory. When a thread enters a synchronized block, it refreshes its local cache with data from main memory, so it is guaranteed to see everything written by a thread that previously released the same lock. When a thread leaves a synchronized block, it writes its changes to main memory, where they are guaranteed to be seen by any thread that subsequently acquires the same lock.
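As a minimal sketch of both effects (mutual exclusion and visibility), consider a shared counter. The class name SynchronizedCounter and the runDemo helper are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are not from the original post.
class SynchronizedCounter {
    private int count = 0;

    // Entering the method acquires the lock on "this"; leaving it releases the lock,
    // which makes the updated count visible to the next thread that acquires it.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts the given number of threads, each incrementing the counter perThread times.
    static int runDemo(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }
}
```

Without the synchronized modifier on increment, the read-increment-write sequence of count++ could interleave between threads and updates could be lost.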

2. Executors

Prior to JDK 5 and java.util.concurrent, the way to create a thread was to extend java.lang.Thread and override the run method, or to implement Runnable and pass it to a Thread constructor. However, most applications need more than a single thread, so you had to write your own thread pool. Since JDK 5, the preferred way to create and use threads is to create a thread pool with the java.util.concurrent.Executors class.

ExecutorService tPool = Executors.newCachedThreadPool();
tPool.submit(new Runnable() {
    public void run() {
        // do work
    }
});

Executors can create different kinds of thread pools. ExecutorService is an interface that accepts Runnables or Callables that need to be executed.
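To illustrate that the same submission code works with any kind of pool, here is a small sketch. The ExecutorKinds class and its runAll helper are hypothetical, and the 10 second timeout is an arbitrary choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; pool sizes and task counts are arbitrary.
class ExecutorKinds {
    // Runs taskCount trivial Runnables on the given pool and returns how many ran.
    static int runAll(ExecutorService pool, int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        Runnable task = () -> completed.incrementAndGet();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(task);
        }
        pool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Three of the factory methods on Executors; each returns an ExecutorService.
        System.out.println(runAll(Executors.newFixedThreadPool(4), 100));
        System.out.println(runAll(Executors.newCachedThreadPool(), 100));
        System.out.println(runAll(Executors.newSingleThreadExecutor(), 100));
    }
}
```

A fixed pool bounds the number of threads, a cached pool grows and shrinks with demand, and a single-thread executor runs tasks one at a time in submission order.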

3. Callable and Future

Callable, like Runnable, is an interface that represents a task to be executed. The difference is that the call method of the Callable interface can return a value and can throw a checked exception.

Future is an interface that has methods to check status on an executing task and get the result when it is available.

Callable<List> work = new Callable<List>() {
    public List call() {
        List result = new ArrayList();
        // do some work and populate result
        return result;
    }
};

Future<List> future = executor.submit(work) ;

List result = future.get() ;

The get() method blocks until the execution completes and then returns the result.

Callable and Future make it convenient to code the interaction between tasks that generate results and tasks that are waiting for results. Future also has methods to check if a task is completed or canceled. You may cancel a task with the cancel method.
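The following sketch puts these pieces together, including the exceptions that get() can throw and the effect of cancel. The FutureDemo class and its two helper methods are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, not from the original post.
class FutureDemo {
    // Submits a Callable that builds a list of squares and waits for its result.
    static List<Integer> squares(int n) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<List<Integer>> work = () -> {
                List<Integer> result = new ArrayList<>();
                for (int i = 1; i <= n; i++) {
                    result.add(i * i);
                }
                return result;
            };
            Future<List<Integer>> future = executor.submit(work);
            return future.get(); // blocks until call() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            // call() threw; the original exception is available as the cause
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Cancels a task that would otherwise sleep for a minute.
    static boolean cancelSleepingTask() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(() -> {
                Thread.sleep(60_000);
                return null;
            });
            boolean cancelled = future.cancel(true); // true: interrupt the worker if running
            return cancelled && future.isCancelled() && future.isDone();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that a cancelled task also counts as done, so isDone alone does not tell you whether a result is available.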

4. Thread Termination

Terminating a thread or threadpool gracefully is the responsibility of the application. A best practice is to provide a stop method that tells the thread to let submitted work complete and then exit.

If you have created the thread directly, then your implementation of shutdown needs to set a flag. The run method checks this flag and exits when necessary. Since a race condition is possible, care should be taken to synchronize setting and reading the flag (or to declare it volatile). Once the flag is set, any new work should be rejected and the thread should exit after already submitted work is completed.
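One way this could look for a hand-written worker thread is sketched below. The StoppableWorker class, its queue-based design, and the 50 ms polling interval are all assumptions made for the example, not something the post prescribes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker; the design is one possible reading of the description above.
class StoppableWorker implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private boolean stopped = false; // guarded by "this"

    // Returns false if the worker is stopping and the task was rejected.
    synchronized boolean submit(Runnable task) {
        if (stopped) {
            return false;
        }
        queue.add(task);
        return true;
    }

    synchronized void stop() {
        stopped = true;
    }

    private synchronized boolean isStopped() {
        return stopped;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Poll with a timeout so the flag is rechecked regularly.
                Runnable task = queue.poll(50, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                } else if (isStopped() && queue.isEmpty()) {
                    return; // flag set and all submitted work completed
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits n counting tasks, stops the worker, and returns how many tasks ran.
    static int runDemo(int n) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        AtomicInteger ran = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            worker.submit(ran::incrementAndGet);
        }
        worker.stop();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ran.get();
    }
}
```

Because submit and stop synchronize on the same lock, a task is either accepted before the flag is set (and therefore run) or rejected; the queue is checked again after the flag is seen so that a late accepted task is not dropped.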

The ExecutorService discussed above has a shutdown method that shuts down the thread pool after already submitted tasks have completed. No new tasks are accepted once this method is called.

public void stop() throws InterruptedException {
    executor.shutdown();
    // the constant is TimeUnit.SECONDS; timeout is a value chosen by the application
    executor.awaitTermination(timeout, TimeUnit.SECONDS);
}
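The stop method above ignores the return value of awaitTermination. A slightly more defensive sketch, which falls back to shutdownNow when the tasks do not finish in time, could look like this. The PoolShutdown class name and the choice to wait twice with the same timeout are assumptions for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not from the original post.
class PoolShutdown {
    // Returns true if the pool terminated within the two waiting periods.
    static boolean stop(ExecutorService executor, long timeoutSeconds) {
        executor.shutdown(); // reject new tasks, let submitted ones finish
        try {
            if (executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
            executor.shutdownNow(); // interrupt running tasks, discard queued ones
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```

Whether discarding queued tasks is acceptable depends on the application; shutdownNow returns the list of tasks that never started, which a caller could log or persist.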

5. Thread Interruption

Interruption is cooperative. Calling the interrupt method on a thread merely sets that thread's interrupted status flag to true. It does not forcibly stop the thread. Implementations of well-behaved blocking methods or long-running methods should check this flag and exit early. Exiting early involves clearing the interrupted status and throwing an InterruptedException.

If your code calls a method that throws an InterruptedException, the code should either propagate the exception up the stack (so that someone more appropriate can handle it) or catch the exception and set the interrupted status again by calling Thread.currentThread().interrupt().

The isInterrupted method returns the current interrupted status without changing it. The static Thread.interrupted() method returns the status of the current thread and clears it. These method names are a little confusing.
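The difference between these methods, and the idiom of restoring the status after catching InterruptedException, can be sketched as follows. The InterruptDemo class and its helper names are hypothetical.

```java
// Hypothetical demo class, not from the original post.
class InterruptDemo {
    // Sleeps, and if interrupted restores the flag instead of swallowing it.
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // sleep() cleared the status before throwing; set it again for callers
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Runs the checks on a fresh thread and returns the four observed values.
    static boolean[] observe() {
        boolean[] seen = new boolean[4];
        Thread t = new Thread(() -> {
            Thread.currentThread().interrupt();               // set the flag on ourselves
            seen[0] = sleepQuietly(1000);                     // sleep throws, so not completed
            seen[1] = Thread.currentThread().isInterrupted(); // flag was restored
            seen[2] = Thread.interrupted();                   // reads the flag and clears it
            seen[3] = Thread.currentThread().isInterrupted(); // flag is now cleared
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen;
    }
}
```

If sleepQuietly did not call interrupt() in the catch block, the second observation would be false and code further up the stack would never learn that an interrupt had been requested.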

6. ConcurrentHashMap

In concurrent programs, it is better to use ConcurrentHashMap than a synchronized HashMap. See the blog ConcurrentHashMap.

7. Explicit locks

Explicit locks have several advantages over the synchronized keyword. For details read the blog When to use Explicit Locks ?

8. Compare and Swap

Locking in concurrent programs, whether using synchronized or explicit locks, is expensive. A thread that is blocked waiting for a lock might be suspended by the operating system. Once the lock becomes available, the thread has to be rescheduled for execution and wait for its time slice.

Modern CPUs support the execution of some compound operations, such as compare and swap, fetch and increment, and test and set, without locking. When multiple threads try to operate on the same data, only one thread succeeds; the others do not block and can simply retry. This substantially increases the scalability of concurrent programs.

Since JDK 5, Java has taken advantage of these atomic compound operations supported by CPUs in the form of atomic variables and data structures like ConcurrentHashMap. The atomic classes discussed in item 9 offer various compound operations, such as compareAndSet, that take advantage of this.
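The typical usage pattern is a retry loop around compareAndSet. As an illustrative sketch, the hypothetical CasMax class below keeps a running maximum without any lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not from the original post.
class CasMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Raises the stored maximum to candidate without locking.
    void offer(int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return; // nothing to update
            }
            // Succeeds only if no other thread changed max since we read it.
            if (max.compareAndSet(current, candidate)) {
                return;
            }
            // Lost the race: loop, reread the value, and try again.
        }
    }

    int get() {
        return max.get();
    }
}
```

A thread that loses the race is never suspended; it just repeats a few instructions, which is why this approach tends to scale better than a lock under moderate contention.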

9. Atomic variables

Operations like incrementing a variable, or check and update, are compound operations. You first read, then increment or check, and lastly write. For this to be thread safe with ordinary variables, locking is required. As mentioned above, locking is expensive and does not scale well. The java.util.concurrent.atomic package has a set of classes that let you perform thread-safe, lock-free operations on variables using the techniques described in item 8.

A get on an atomic variable returns the latest update from memory. A set on an atomic variable is immediately available for other threads to read. This is the same behavior as a volatile variable, as defined by the Java memory model described in item 10.
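For the common case of a shared counter, no hand-written retry loop is needed. A minimal sketch, with a hypothetical AtomicCounterDemo class:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not from the original post.
class AtomicCounterDemo {
    // Increments one shared counter from several threads and returns the total.
    static int count(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no lock
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }
}
```

Replacing the AtomicInteger with a plain int field and counter++ would compile, but updates could then be lost when two threads interleave.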

10. Java Memory model

The Java memory model describes the rules that define how variables written to memory are seen, when such variables are written and read by multiple threads. It is a topic that is not well understood and many programmers are not aware of it. Read about it in the blog Java Memory model.

A synchronized HashMap is a Map returned by calling the synchronizedMap method of the java.util.Collections class.

Map syncMap = Collections.synchronizedMap(new HashMap()) ;

The characteristics of synchronized collections are:

1. Each method is synchronized using an object level lock. So the get and put methods on syncMap acquire a lock on syncMap.

2. Compound operations such as check-then-update or iterating over the collection require the client to explicitly acquire a lock on the collection object.

synchronized (syncMap) {
    // cast needed because syncMap is declared as a raw Map
    Integer val = (Integer) syncMap.get(key);
    if (val == null) {
        syncMap.put(key, newvalue);
    }
}

Without synchronization, multiple threads calling the code can lead to inconsistent values.

3. Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.

4. The iterators of HashMap and other collections from java.util are fail-fast: they throw ConcurrentModificationException if the collection is modified while it is being iterated over, for example by another thread. The recommended approach is to acquire a lock on the map before iterating over it.
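Points 2 and 4 can be sketched as follows. The SyncMapIteration class and its methods are hypothetical, and the String and Integer type parameters are assumptions for the example. The second method triggers the exception from a single thread, which is enough to show the fail-fast behavior; it assumes the map holds at least two entries.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;

// Hypothetical demo class; expects a map from Collections.synchronizedMap.
class SyncMapIteration {
    // Sums all values; the client-side lock keeps writers out during iteration.
    static int sum(Map<String, Integer> syncMap) {
        int total = 0;
        synchronized (syncMap) {
            for (Integer value : syncMap.values()) {
                total += value;
            }
        }
        return total;
    }

    // Adds a new key in the middle of an iteration.
    // Returns true if the fail-fast iterator reported the modification.
    static boolean modifyWhileIterating(Map<String, Integer> syncMap) {
        try {
            for (String key : syncMap.keySet()) {
                syncMap.put(key + "-copy", 0); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Holding the lock in sum protects against other threads, but as the second method shows, it does not help if the iterating code itself modifies the map.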

ConcurrentHashMap was introduced in JDK 5.

The characteristics of ConcurrentHashMap are:

1. There is no locking at the object level. The locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at the hash bucket level.

2. The effect of lower-level locking is that you can have concurrent readers and writers, which is not possible for synchronized collections. This leads to much better scalability.

3. Since there is no locking at the object level, additional atomic methods are provided for some compound operations. ConcurrentHashMap has the methods putIfAbsent, remove, and replace, all of which check a key or value and then perform a put or remove as a single atomic step.

The code above can be replaced by

ConcurrentHashMap concMap ;
.
.
.
concMap.putIfAbsent(key,newvalue) ;

4. ConcurrentHashMap does not throw a ConcurrentModificationException if one thread modifies it while another is iterating over it. The iterator returned by ConcurrentHashMap is weakly consistent: it reflects the state of the map at some point at or after the creation of the iterator. It may or may not reflect changes made by other threads after the iterator was created.

In general, using ConcurrentHashMap instead of a synchronized Map gives you much better scalability, and you do not have to explicitly synchronize on the map object.
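A short sketch of points 3 and 4, using a hypothetical ConcMapDemo class with assumed String keys and Integer values:

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class; works with any ConcurrentMap such as ConcurrentHashMap.
class ConcMapDemo {
    // Returns the value that ends up mapped to key: the existing one, else newValue.
    static int putOnce(ConcurrentMap<String, Integer> map, String key, int newValue) {
        Integer previous = map.putIfAbsent(key, newValue); // atomic check-then-put
        return previous != null ? previous : newValue;
    }

    // Adds a key during iteration; returns the number of keys the loop visited.
    static int iterateWhileModifying(ConcurrentMap<String, Integer> map) {
        int visited = 0;
        for (String key : map.keySet()) {
            map.putIfAbsent("added-during-iteration", 0); // no exception is thrown
            visited++;
        }
        return visited;
    }
}
```

Because the iterator is weakly consistent, the loop may or may not visit the key that was added while it was running, so the exact value returned by iterateWhileModifying should not be relied on.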

Tuesday, May 28, 2013

Java Concurrency : 10 things every programmer should know

Friday, October 19, 2012

JAVA Synchronized HashMap vs ConcurrentHashMap