Sunday, June 27, 2010

What Java Map class should I use ?

The interface java.util.Map is an interface for mapping keys to values. Implementations of Map provide in memory maps of keys to values. The interface provides methods to insert (key,value) pairs into a map and methods to look up values based on the key.

The JDK provides a few implementations of the map interface such as HashMap, LinkedHashMap, TreeMap and the IdentityMap. Which implementation should you use ? The answers depends on what your requirement is. 

The HashMap is implemented using a hash table. Hash tables are implemented using a hash function to calculate the index into an array where the value is placed.  Hence HashMap provides the fastest search and insertion times O(1). HashMaps however take up more memory. Also they do not guarantee any ordering for the keys. If you iterate over the keys, you will find them in no particular order.

The LinkedHashMap is implemented using both hash table and linked list. In addition to using a hash table for fast retrieval, it maintains a doubly linked list through the items. The linked list helps preserve the order in which keys were inserted into the map. Search or get operation is still as fast as an HashMap, but inserts or deletes are a little slower because of the additional work involved in maintaining the linked list. LinkedHashMap can be used if you are going to  need to iterate over the keys in say the order in which the keys were inserted into the Map. LinkedHashMap also has a constructor parameter accessorder, which when set to true, changes the ordering from insertion order to access order - from least recently accessed to most recently accessed. This is particularly useful if you are using the Map to build a cache.

TreeMap is implemented using the red black tree which is a balanced tree. When items are inserted into a tree, they are placed at a position based on their natural ordering or based on a provided comparator. The ordering is thus a sorted order. Search and insertion for TreeMap are of the order O(log(n)). This is slower than HashMap and LinkedHashMap. You might need to use a TreeMap if you are going to need to iterate over the keys in the map in a sorted order.

IdentityHashMap is implemented using hash table and is very fast - O(1) for both get and insert. But unlike other Map classes which use the equals method when comparing keys, it just uses object reference equality. For 2 keys key1 and key2, it will compare using key1 == key2 rather than key1.equals(key2). The JDK is clear that this is not be to used as general purpose Map where you need lookups based on values. The specification of the containsKey(object key) in the Map interface states that it returns true if key.equals(k). This is not true for the IdentityHashMap. IdentityHashMap is rarely used. But you might use it if you are doing something that needs keeping track of object references.

In summary,
  • HashMap is the fastest map with O(1) search and insertion times.
  • LinkedHashMap is a little slower for inserts, but maintains the insertion order.
  • TreeMap is the slowest map, but lets you iterate over the keys in a sorted order.

Sunday, April 18, 2010

The Java Memory Model

The Java memory model describes the rules that define how variables written to memory are seen, when such variables are written and read by multiple threads.

When a thread reads a variable, it is not necessarily getting the latest value from memory. The processor might return a cached value. Additionally, even though the programmer authored code where a variable is first written and later read, the compiler might reorder the statements as long as it does not change the program semantics. It is quite common for processors and compilers to do this for performance optimization. As a result, a thread might not see the values it expects to see. This can result in hard to fix bugs in concurrent programs.

The Java programming language provides synchronized, volatile, final to help write safe multithreaded code. However earlier versions of Java had several issues because the memory model was underspecified. JSR 133 fixed some of the flaws in the earlier memory model.

Most programmers are familiar that entering a synchronized block means obtaining a lock on a monitor that ensures that no other thread can enter the synchronized block. Less familiar but equally important are the facts that
(1) accquring a lock and entering a synchronized block forces the thread to refresh data from memory.
(2) On exiting the synchronized block, data written is flushed to memory.

This ensures that values written by a thread in a synchronized block are visible to other threads in synchronized blocks.

Ever heard of  "happens before" in the context of Java ? JSR 133 introduced the term "happens before" and provided some guarantees about the ordering of actions within a program.These guarantees are

(1) Every action in a thread happens before every other action that comes after it in the thread.
(2) An unlock on a monitor happens before a subsequent lock on the same monitor
(3) A volatile write on a variable happens before a subsequent volatile read on the same variable
(4) A call to  Thread.start() happens before any other statement in that thread
(5) All actions in thread happen before any other thread returns from a join() on that thread

Where action is defined in section 17.4.2 of the Java language specification as statements that can be detected or influenced by other threads. Normal read/write, volatile read/write, lock/unlock are some actions.

Rules 1 ,4 and 5 guarantee that within a single thread all actions will execute in the order in which they appear in the authored program. Rules 2 and 4 guarantee that between multiple threads working on shared data, the relative ordering of synchronized blocks and the order of read/writes on volatile variables is preserved.

Rules 2 and 4 makes volatile very similar synchronized block. Prior to JSR 133, volatile still meant that a write to volatile variable is written directly to memory and a read is read from memory. But a compiler could reorder volatile read/writes with non volatle read/writes causing incorrect results. Not possible after JSR 133.

One additional notable point. This is related to final members that are initalized in the constructor of a class. As long as the constructor completes execution properly, the final members are visible to other threads without synchronization. If you however share the reference to the object from within the constructor, then all bets are off.

Saturday, March 27, 2010

REST in a nutshell

REST or Representational State Transfer is the architectural style of the world wide web. REST was defined by Roy Fielding in his PhD thesis sometime around 2000. But it has come to the foreground only recently with the realization and acceptance of REST as the preferred architecture for providing web services. In recent years, REST has replaced SOAP/WSDL as the preferred way to build web services.

Representational State Transfer is a term that sounds very academic. It is best explained using the web where it is put to use every time you use the web. A typical web interaction involves a user in front of a web browser ( the client ) making a request to say get some resource from a server by typing in a URL. The representation of the resource is a document. The location of the document is described by the URL. Getting the resource from the server and displaying in the browser causes the state transition in the browser.

The protocol used in the web browser-server communication is HTTP described in RFC2616.

When you type the URL http://www.xyz.com/path/mydoc.html, the browser requests the document from the host by sending the GET command.

Everyone has at some point filled out a form on a web page to may be open an account or to buy something online. When you click the submit button, the browser sends the data to the server using the http POST command.

Other common http commands are PUT which is used to update a document on the server and DELETE which is used to delete a document from the server.

This sort of interaction does not necessarily have to take place only between a browser and a server. It could very well be two servers or applications exchanging documents or data. The term web services is generally used to describe this kind of interaction. The two applications could have been written in different programming languages. But that does not matter since they are talking HTTP.  Ironically, web services were popularized by SOAP/WSDL, the technology that REST is displacing.

SOAP based web services use XML to describe the operation or method to be invoked on the server, the  data that is passed as input and the data that is output. In SOAP, describing the interface, operation and data has to done for each and every application. For example, for an application that manages users, you define methods like getUser, createUser, deleteUser and so on. For an application that manages accounts, you might define methods like getAccount, createAccount, unaccountably and etc.

Contrast this with REST , where the operations stay the same no matter what the application is - GET, PUT, POST, DELETE. The resources are identified by URLs such as http://www.xyz.com/user/userid or
http//www.xyx.com/account/accountid. REST is thus a lot simpler. Data is transferred as the body of the HTTP message. Popular formats are XML and JSON.

An example of a GET request is

GET /account/A12345 HTTP/1.1
Host: www.xyz.com/


and the server response could be

HTTP/1.1 200 OK
Content-Type: text/json; charset=utf-8
Content-Length: length

{"account":
  {"id":"A12345",
   "type":"checking",
    "balance":"150"
   }
}

Similarly to create a new account, the request would be

POST /account HTTP/1.1
Host: http://www.xyz.com/
Content-Length: length
Content-Type: text/json
{"account":
   {"id":"A12346",
     "type":"checking",
     "balance":"200"
    }
}

An important characteristic of RESTful applications is that they are stateless. Each request has all the information necessary for the server to respond to the request. This is a critical requirement for such web services applications to be highly scalable. By being stateless, each subsequent request can be serviced by any other peer server, which means you can use a cluster of servers to service requests and you can have high availability and failover.

In summary, REST is a simple and powerful architectural style for building web services. REST requires identifying the data or resources in your application using URLs. The HTTP commands such as GET,PUT,POST,DELETE are the verbs or operations exposed by the application. The client and server communicate using the HTTP protocol with the data carried in the body of the message.

Thursday, March 11, 2010

The wait/notify mechanism in JAVA

Consider the problem where one thread produces some data and other threads consume the data. This is the producer consumer pattern. If the producer is not producing data fast enough, the consumers might have to wait. The wrong way to handle this is for consumer threads to go to sleep and then check for available data at periodic intervals. This eats up cpu cycles. Fortunately JAVA provides a more elegant way.

The wait/notify mechanism allows one thread to wait for a notification from another thread. The consumer thread checks a variable that can indicate if data is available or not and if not , then call the wait() method. Calling the wait method puts the thread to sleep until it is notified. Whenever the producer thread produces data and updates the variable, it calls notify() or notifyAll() to notify all the waiting threads.

The class WorkQueue in Listing 1 shows how this works. The addWork method is called by producer threads to queue up work that needs to be done. The getWork method is called by consumer threads that will do the work. When there is no work, the consumer threads need to wait. 

public class WorkQueue {
     private final ArrayDeque workq  = new ArrayDeque();
    public void addWork(Object work) {
        synchronized(workq) {
             workq.addLast(work) ;
            workq.notifyAll() ;
            System.out.println("Thread producer added work notiying waiter") ;
        }
    }
    public Object getWork() {  
        Object ret = null ;
        synchronized(workq) {   
            while(((ret = workq.pollFirst()) == null) ){
                 try {
                    System.out.println("No Work Thread consumer going to block") ;
                    workq.wait() ;
                    System.out.println("Thread consumer woken") ;
                } catch(Exception e) {
                    System.out.println(e) ;
                }
            }
           
        }
      return ret ;
      }
Listing 1

wait(), notify() and notifyAll() are methods of java.lang.Object. The thread needs to accquire a lock on an object before calling any of these methods.  The getWork method gets a lock on the variable workq. The condition in the while loop checks if there is a work item. If there is an item, the condition is false, we break out of the loop, exit the synchronized block, there by releasing the lock on workq and return the work item. If workq.pollFirst() returns null, there is no work queued, we enter the loop and call the wait method. Calling the wait method causes the thread to give up the lock and go to sleep.

When a producer thread calls the addWork method, the statement synchronized(workq) ensures that it first accquires a lock. If another thread has a lock on workq, this thread will wait until it gets the lock. It then adds the work item to the workq. Lastly it calls notifyAll() on workq to notify all waiting threads that work is available.

When the consumer thread receives the notification, it wakes up.  However , to return from the wait() method, it needs to reaccquire the lock first. On accquiring the lock, it returns from wait() and continues to execute the loop. The condition workq.pollFirst() == null is checked again. If false, the thread got an item and it exits the loop and the method. If not, it calls wait again and sleeps till the next notification.

The code in listing 2 exercises the class in listing 1 with 2 threads.
public static void main(String[] args) {
        final WorkQueue wq = new WorkQueue() ;
        Runnable producer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    wq.addWork(Integer.toString(i)) ;
                    System.out.println("Thread producer created work :" + i) ;
                    try {
                        Thread.sleep(5000) ;
                    } catch(Exception e) {
                        System.out.println(e) ;
                    }
                }
               
            }

        } ;
       
        Runnable consumer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    String work = (String)wq.getWork() ;
                    System.out.println("Thread consumer Got work:" + work) ;
                }
            }
        } ;
       
        Thread tconsumer = new Thread(consumer,"tconsumer") ;
        tconsumer.start() ;
        Thread tproducer = new Thread(producer,"tproducer") ;
        tproducer.start() ;
   
    }
Listing 2

Running the code should produce output shown below.
No Work Thread consumer going to block
Thread producer added work notiying waiter
Thread producer created work :1
Thread consumer woken
Thread consumer Got work:1
No Work Thread consumer going to block
Thread producer added work notiying waiter
Thread producer created work :2
Thread consumer woken
Thread consumer Got work:2
.
.
.
.......

Note that the wait must always be called within a loop that checks the condition that the thread is waiting on. Testing the condition before the wait() ensures that the thread waits() only when the condition is true. Testing the condition after the wait(), that is after the thread is woken up, ensures that the thread continues to wait, if the condition is still true.

In summary, the wait/notify mechanism is a powerful mechanism for threads to communicate with each other. However, use with care and remember to call the wait() within a while loop.

Thursday, February 25, 2010

The dreaded double check pattern in java

The problem with double check locking in java is well documented. Yet even a seasoned programmer can get overzealous trying to optimize synchronization of code that creates singletons and fall prey to the trap. Consider the code
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      s = new Sample() ;
    }
  return s ;
  }
}
Listing 1
This code is not thread safe. If 2 threads t1 and t2 enter the getSample() method at the same time, they are likely to get different instances of sample. This can be fixed easily by adding the synchronized keyword to the getSample() method.
public class Sample {
  private static Sample s = null ;
  public static synchronized Sample getSample() {
    if (s == null) {
      s = new Sample() ;
    }
    return s ;
  }
}
Listing 2
Now the getSample method works correctly. Before entering the getSample method, thread t1 accquires a lock. Any other thread t2 that needs to enter the method will block until t1 exits the method and releases the lock. Code works. Life is good. This is where the smart programmer if not careful, can outsmart himself. He will notice that in reality only the first call to getSample, which creates the instance, needs to be synchronized and subsequent calls that merely return s are paying an unnecessary penalty. He decides to optimize the code to
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      synchronized(Sample.class) {
        s = new Sample() ;
      }
    }
    return s ;
  }
}
Listing 3
Our java guru quickly realizes that this code has the same problem that listing 1 has. So he fine tunes it further.
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      synchronized(Sample.class) {
        if (s == null) {
          s = new Sample() ;
        }
      }
    }
    return s ;
  }
}
Listing 4
By adding an additional check withing the synchronized block, he has ensured that only one thread will ever create an instance of the sample. This is the double check pattern. Our guru's friend, a java expert, buddy reviews the code. Code is checked in and product is shipped. Life is good right ?

Wrong !! Let us say thread t1 enters getSample. s is null. It gets a lock. Within the synchronized block, it checks that s is still null and then executes the constructor for Sample. Before the execution of the constructor completes t1 is swapped out and t2 gets control. Since the constructor did not complete, s is partially initialized. It is not null, but has some corrupt or incomplete value. When t2 enters getSample, it sees that s is not null and returns a corrupt value.

In summary, the double check pattern does not work. The options are to synchronize at a method level as in s listing 2 or to forego lazy initialization. Another option that works is to use a nested holder class that delays initialization until the get method is called.
public class Sample {
  private static class SampleHolder {
    public static Sample INSTANCE = new Sample() ;
  }
  public static Sample getSample()  {
    return SampleHolder.INSTANCE ;
  }
}

Listing 5

Sunday, February 7, 2010

Service Oriented Architecture

Service Oriented Architecture or SOA was a hot new buzz word 3 - 4 years ago. While still around, it has been pushed to the background by other hot paradigms like Software as a Service (Saas) and cloud computing. You don't see vendors highlighting the SOA qualities of the products that much. Is SOA dead ?

In this blog I will briefly describe the principles of SOA and where I think it stands today.

In SOA, you build complex software services by coupling together other coarse grained software services, each of which provides a distinct service. The coupling between services is loose coupling. The idea is that you should be able to replace a service or add a new one without lot of coding or rework.

The participating services do not have to be written in the same programming language. Some could have been developed in java, others in .net, some others in c++ and so on.

The interaction between services may be based on asynchronous messaging. Instead of using APIs, services use messages to exchange data between each other. This is in contrast to strongly typed APIs.

Each service provides the abstraction for 1 particular task such as computing the credit score for an application. Each service can be optimized to do the one task it does really well.

Bigger applications are composed by assembling other services that provide a specific function.

An example of a reusable service is a service for creating, updating, viewing customer data. Every other application in the enterprise can use this service for customer information. The service may be available via an end point such as http URL or a messaging destination. The customer data that the service accepts and returns may be described using XML.

Some of the technologies the enable SOA are:

1. Web Services
2. Service Component architecture (SCA)
3. Service Data Objects (SDO)
4. Enterprise messaging
5. XML
6. REST

Web Services and SCA are used to build services. The data or meta data can be exchanged between services as straight XML or using SDOs. The messaging transport can be SOAP over HTTP in the case of web services or HTTP in the case of REST. One can use asynchronous messaging protocols like JMS when persistent messages are required.

We know things like abstraction , encapsulation, res use from the early days of object oriented design and design patterns. So SOA is not telling us any thing radically new. However it sort of reminds us once more to apply these design principles in large enterprise application environments.

SOA is not without problems. Weakly typed message based protocols as encouraged by SOA are typically hard to debug. Describing metadata in XML and using it in runtime work well for simple interfaces, but is more problematic for complex interfaces. Parsing XML is expensive and adds a performance overhead. Propagating contexts from one service to another can be an issue when the services are controlled by different teams. Due to the complexity, significant testing needs to be done before rolling out such services to users.

In conclusion, SOA reminds us of some useful design principles. These same principles are applicable even if you are building a cloud and offering your software as service. However you want to pay close attention to the know pitfalls when they apply to your environment.

In the future blogs, I will provide an introduction to some of the technologies that enable you to build SOA services.