Wednesday, December 29, 2010

Performance tip #1 - Java Strings

The java.lang.String class is a very convenient class for handling text or character data, and it is used extensively in Java programs. Sometimes it is used excessively, to the point that it degrades performance.

The String class is immutable: once created, it cannot be changed. It has concatenation and substring methods that give the appearance of changing the string, but in reality the JVM copies data and creates new String objects.

// code 1 Inappropriate
String item = "" ;
String[] listofitems ;                    // assume this array is populated elsewhere
for (int i = 0 ; i < listofitems.length ; i++)
    item = item + listofitems[i] ;        // copies the entire string on every iteration

// code 2 Much better
StringBuffer item = new StringBuffer() ;
for (int i = 0 ; i < listofitems.length ; i++)
    item.append(listofitems[i]) ;         // appends in place, no copying

java.lang.StringBuffer is the mutable counterpart of String and should be used for manipulating strings. StringBuffer is thread safe. java.lang.StringBuilder provides methods similar to StringBuffer, but the methods are not synchronized, so StringBuilder can be used (and is faster) when thread safety is not an issue.

In code 1, a new String is created on every iteration of the loop and the contents of the previous iteration are copied into it. For n items, this performs O(n²) work. Code 2 scales linearly, O(n). For large values of n, the difference is significant.

A number of other methods such as substring, trim, toLowerCase and toUpperCase have the same characteristic: each call returns a new String rather than modifying the original.
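As a quick illustration, each of these calls returns a new String and leaves the original untouched:

String s = "Hello World" ;
String t = s.toUpperCase() ;    // a new String, "HELLO WORLD"
String u = s.substring(0, 5) ;  // another new String, "Hello"
System.out.println(s) ;         // still prints "Hello World" - s itself never changes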

However, there is one case where concatenation is better: when the String can be resolved at compile time, as in

// code 4
String greeting = "hello" + " new world" ;

At compile time, this gets compiled to
String greeting = "hello new world" ;

The alternative is
// code 5
String greeting = (new StringBuffer()).append("hello").append(" new world").toString() ;

Code 4 is more efficient as it avoids the temporary StringBuffer as well as the additional method calls.

Processing character arrays directly, as is done in the C programming language, is generally the fastest. But almost all public Java libraries use String in their interfaces, so most of the time the programmer has no choice, and proper care needs to be taken to ensure acceptable performance.

Lastly, converting Strings to numbers, whether int, long, double or float, can be expensive. Many programmers make the mistake of starting with String and then converting to a number type when they need to do a numerical computation. As a general rule, Strings should be used for processing text. For example, if you know that the data being read from a file is of type int or float, read it into the correct type and avoid the unnecessary conversion.
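For example, a rough sketch using java.util.Scanner (the file name is just a placeholder):

// assumes java.io.File and java.util.Scanner are imported; new Scanner(File) throws FileNotFoundException
Scanner scanner = new Scanner(new File("prices.txt")) ;   // "prices.txt" is a made-up file name
double total = 0.0 ;
while (scanner.hasNextDouble()) {
    total += scanner.nextDouble() ;    // read the value directly, no intermediate String or Double.parseDouble
}
scanner.close() ;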

Tuesday, November 30, 2010

Database isolation

When multiple threads are accessing data from the same table or tables in a relational database, care needs to be taken that updates from one thread or transaction do not interfere with others. Isolation is a property of database transactions that determines when changes made by one transaction are visible to other concurrent transactions.

Relational databases support different isolation levels. Isolation is typically achieved either by locking data or by serializing access to data, both of which lead to a loss of concurrency (and thus performance). It is thus important to pick the correct isolation level so you get optimal performance without loss of correctness.

To understand isolation levels, it is useful to first talk about the types of reads that can happen.

1. Dirty reads
A dirty read is one that reads uncommitted data. This is dangerous for obvious reasons: the data you just read may get rolled back and never exist in the database.

2. Non repeatable read
A non repeatable read reads committed data. But if you do the same read again within the transaction, you will see the effect of any changes to that data committed by other transactions in the meantime.

At time a1, say a transaction t1 executes the query:

select quantity from Order
where orderid = 1

It returns 2.

At time a2, transaction t2 updates the quantity to 3.

At time a3, t1 executes the same query which returns 3.

3. Phantom reads
A phantom read reads committed data as well. But a subsequent read in the same transaction may see new data added and committed by another transaction.

At time a1, a transaction t1 does

Select OrderId from Order
where itemid = '23'

The query returns

1
2

At time a2, another transaction t2 inserts a new order for the same item id.

At time a3, transaction t1 executes the same query and it might return

1
2
3

Now that we understand the types of reads, let us look at the isolation levels. There are 4 isolation levels defined by the ANSI SQL standard:

1. READ UNCOMMITTED
This is the least stringent isolation level. At this level, dirty reads are allowed. You are reading data that may or may not be committed and it is unreliable.

2. READ COMMITTED
This isolation level ensures that only committed data is read. Dirty reads are thus not allowed. But non repeatable reads and phantom reads are possible. This isolation level is sufficient if you just want a snapshot of the data at a particular time and do not care that it might be updated later.

3. REPEATABLE READ
This isolation level ensures that rows read within a transaction will not be updated by another transaction. New rows might be added, but rows already read will not change. The reads are thus repeatable. Non repeatable reads are not possible, but phantom reads are possible.

4. SERIALIZABLE
This is the most stringent isolation level. Access to data is serialized. This is very expensive but none of the problem reads are possible.

The default isolation level in SQL Server, Oracle and DB2 is READ COMMITTED. Most of the time this is sufficient. READ UNCOMMITTED is too dangerous, and SERIALIZABLE can lead to unacceptable performance.

REPEATABLE READ is necessary when you, say, read rows from a database and, based on the values, do an update or delete within the same transaction. You need repeatable reads because the conditions that led to the update following the read should not change within the transaction.
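With plain JDBC, for example, the isolation level can be raised on the connection before the transaction starts. The sketch below is illustrative only; the connection URL and credentials are placeholders.

// assumes java.sql.Connection and java.sql.DriverManager are imported; URL and credentials are placeholders
Connection con = DriverManager.getConnection("jdbc:yourdb:orders", "user", "password") ;
con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ) ;
con.setAutoCommit(false) ;
// ... read rows, and based on the values, update or delete within the same transaction ...
con.commit() ;
con.close() ;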

The default isolation level is sufficient for most routine database applications. However, problems due to isolation generally show up in large scale environments where thousands of transactions are hitting the same database tables at the same time. By choosing the right isolation level, you can ensure that performance stays acceptable while avoiding problems that are difficult to troubleshoot.

Sunday, October 24, 2010

Developing web applications with Spring

Spring MVC enables easy web application development with a framework based on the Model View Controller (MVC) architectural pattern. The MVC pattern requires the separation of the user interface (View), the data being processed (Model) and the Controller, which manages the interactions between the view and the model.

At the core of Spring MVC is a servlet, the DispatcherServlet, that handles every request. The DispatcherServlet routes the HTTP request to a Controller class authored by the application developer. The controller class handles the request and decides which view should be displayed to the user as part of the response.

Let us develop a simple web application that takes a request and sends some data back to the user. Before you proceed any further, I recommend you download the source code at springmvc.zip

For this tutorial you will also need

(1) A webserver like Tomcat
(2) Spring 3.0
(3) Eclipse is optional. I use eclipse as my IDE. Eclipse lets you export the war that can be deployed to Tomcat. But you can use other IDEs or command line tools as well.
(4) Some familiarity with JSPs and Servlets is required.

Step 1: If you were to develop a web application in J2EE, you would typically do it by developing servlets or JSPs that are packaged in a .war file. Also necessary is a deployment descriptor, web.xml, that contains configuration metadata. The war is deployed to a web server like Tomcat.

With Spring, the first thing to do is to wire Spring to this J2EE web infrastructure by defining org.springframework.web.servlet.DispatcherServlet as the servlet class for this application. You also need to define org.springframework.web.context.ContextLoaderListener as a listener. ContextLoaderListener is responsible for loading the Spring-specific application context, which holds the Spring metadata.

The web.xml setup ensures that every request to the application is routed by the servlet engine to the DispatcherServlet. The update to web.xml is shown below:
<listener>
    <listener-class>
        org.springframework.web.context.ContextLoaderListener
    </listener-class>
</listener>
<servlet>
    <servlet-name>springmvc</servlet-name>
    <servlet-class>
        org.springframework.web.servlet.DispatcherServlet
    </servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>springmvc</servlet-name>
    <url-pattern>*.htm</url-pattern>
</servlet-mapping>
Step 2: The heavy lifting in this web application is done by a controller class. This is an ordinary Java class, or bean, that extends org.springframework.web.servlet.mvc.AbstractController. We override the handleRequestInternal method; in this method, you do whatever is necessary to handle the request, which may include, for example, reading from a database.

The method returns an org.springframework.web.servlet.ModelAndView object, which encapsulates the name of the view and any data (model) that needs to be displayed by the view. ModelAndView holds data as name-value pairs. This data is later made available to the view. If the view is a JSP, you can access the data either using JSTL or by directly querying the request object. The code for our controller is shown below:
public class SpringMVCController extends AbstractController {
    protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) {
        ModelAndView mview = new ModelAndView("springmvc") ;
        mview.addObject("greeting", "Greetings from SpringMVC") ;
        mview.addObject("member1", new Member("Jonh","Doe", 
            "1234 Main St","Pleasanton","94588","kh@gmail.com","1234")) ;
        return mview ;
    }
}
The name of the view springmvc is passed in to the constructor of ModelAndView. The addObject methods add 2 model objects greeting and member1. Later you will see how the view can retrieve the objects and display them.

Step 3: Every Spring application needs metadata that defines the beans and their dependencies. For this application, we create a springmvc-servlet.xml. We help Spring find it by specifying its location in web.xml.
<context-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>/WEB-INF/springmvc-servlet.xml</param-value>
</context-param>
In springmvc-servlet.xml, the controller bean is defined as
<bean name="/*.htm" class="com.mj.spring.mvc.SpringMVCController"/>
Step 4: How does DispatcherServlet know which Controller should handle the request?

Spring uses handler mappings to associate controllers with requests. Two commonly used handler mappings are BeanNameUrlHandlerMapping and SimpleUrlHandlerMapping.

In BeanNameUrlHandlerMapping, when the request url matches the name of the bean, the class in the bean definition is the controller that will handle the request.

In our example, we use BeanNameUrlHandlerMapping as shown below. Every request url ending in .htm is handled by SpringMVCController.
<bean name="/*.htm" class="com.mj.spring.mvc.SpringMVCController"/>
In SimpleUrlHandlerMapping, the mapping is more explicit. You can specify a number of urls and each URL can be explicitly associated with a controller.

Step 5: How does the DispatcherServlet know what to return as the response?

As mentioned earlier, the handleRequestInternal method of the controller returns a ModelAndView object.

In the controller code shown above, the name of the view, "springmvc", is passed in the constructor to ModelAndView. At this point we have just given the name of the view. We have not said what file, classes or artifacts help produce the HTML, nor have we said whether the view technology is JSP, Velocity templates or XSLT. For this you need a ViewResolver, which provides the mapping between the view name and a concrete view. Spring lets you produce a concrete view using many different technologies, but for this example we shall use JSP.

Spring provides a class, InternalResourceViewResolver, that supports JSPs, and the declaration below in springmvc-servlet.xml tells Spring that we use this resolver. The prefix and suffix are added to the view name to produce the path to the JSP file that renders the view.
<bean id="viewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver">
    <property name="prefix" value="/WEB-INF/jsp/"></property>
    <property name="suffix" value=".jsp"></property>
</bean>

Step 6: In this example, the view resolves to springmvc.jsp, which uses JSTL to get the data and display it. Spring makes the model objects greeting and member1 available to the JSP as request scope objects. For educational purposes, the code below also gets the objects directly from the request.
// Using JSTL to get the model data
${greeting}
${member1.lastname}
// Using java to get the model directly from the request
Map props = request.getParameterMap() ;
System.out.println("PARAMS =" + props) ;
Enumeration em = request.getAttributeNames() ;
while (em.hasMoreElements()) {
    String name = (String) em.nextElement() ;
    System.out.println("name = "+name) ;
}
System.out.println("Attrs are "+request.getAttributeNames()) ;
System.out.println("greeting is "+ request.getAttribute("greeting")) ;
Member m = (Member)request.getAttribute("member1") ;
System.out.println("member is "+m.toString()) ;
Step 7: All the files we have developed so far should be packaged into a war file, as you would in any web application. The war may be deployed to Tomcat by copying it to tomcat_install\webapps. I built a war that you can download at springmvc.war

Step 8: Point your web browser to http://localhost:8080/springmvc/test.htm to run the application. The browser should display the data.


To summarize, Spring simplifies web application development by providing building blocks that you can assemble easily. We built a web application using Spring MVC. Spring provides an easy way to wire together our model, the controller SpringMVCController and the view springmvc.jsp. We did not have to explicitly code any request/response handling logic. By changing the metadata in springmvc-servlet.xml, you can switch to a different controller or a different view technology.

Sunday, September 19, 2010

Database access made simple with Spring

In "The promise of spring" I talked about some of the benefits of Spring, one of which is the simplification of the usage of JDBC.

Typical JDBC programming requires the programmer to repeatedly write the same code for basic things like loading the JDBC driver, getting a connection and closing the connection. After getting a connection, one has to create a PreparedStatement to execute the SQL. The PreparedStatement may return a ResultSet, over which the programmer needs to iterate to extract the data.

Spring addresses the above issues as follows:

1. The cornerstone of the Spring framework is dependency injection. With Spring, the connection information, such as the driver and connection URL, can be defined as metadata, and a DataSource object is injected into the application. This frees the programmer from the burden of writing code to manage connections.

2. Spring provides a class, JdbcTemplate, that abstracts away the repetitive code involving Statements and ResultSets.

3. Lastly, Spring maps JDBC checked exceptions to a runtime exception hierarchy. This unclutters application code, because it no longer needs database-specific try/catch logic.

Let us see how this works with a sample. Let us write a DAO (data access object) to insert and retrieve information from a database. Before you proceed any further, you can download the complete source code from springjdbc.zip.  In the blog below, for brevity, I show only code snippets. To run the sample, you will also need Spring 3.0, which you can download from Spring 3.0.x.

Step 1. For a database, I am going to use Apache Derby. We are going to store and retrieve user information as is typically required in any application. The schema is
create table puma_members ( 

  firstname  varchar(20), 
  lastname   varchar(30) not null, 
  street     varchar(40), 
  city       varchar(15), 
  zip        varchar(6), 
  email      varchar(30) not null primary key, 
  password   varchar(8) 

) ;
To download Derby and for help on creating a database and a table, see the Derby documentation

Step 2. The DAO interface to access this table is
public interface MemberDAO {

    public int insertMember(Member m) ;
    public int deleteMember(String email) ;
    public Member getMember(String email) ;
}
Member is a class that has get and set methods for every column in puma_members. For brevity, the code is not shown here.

Step 3. The class MemberSpringJDBCDAO shall provide an implementation for MemberDAO using Spring JDBC.

The heavy lifting in this class is done by an instance of org.springframework.jdbc.core.JdbcTemplate. However, we don't need to instantiate it explicitly. We will let Spring create an instance and inject it into this class.

To help Spring, however, we need to provide getter and setter methods for the JdbcTemplate.

So the class needs a private variable to hold the jdbcTemplate and setter/getter methods that Spring can call to set its value.
private JdbcTemplate jdbcTemplate ;
public JdbcTemplate getJdbcTemplate() {
    return jdbcTemplate ;
}
public void setJdbcTemplate(JdbcTemplate template) {
    jdbcTemplate = template ;
}
This is a form of dependency injection called setter injection. Spring calls the setJdbcTemplate method to provide our class with an instance of JdbcTemplate.

Step 4. How do we tell Spring we need a JdbcTemplate ? And how does Spring know what database driver to load and what database to connect to ?
All the bean definitions are in springjdbcdao.xml. The main bean, memberdao, has a property that references a jdbcTemplate bean.
<bean class="com.mj.spring.jdbc.MemberSpringJDBCDAO" id="memberdao">
    <property name="jdbcTemplate">
       <ref bean="jdbcTemplate"></ref>
    </property>
</bean>

<bean class="org.springframework.jdbc.core.JdbcTemplate" id="jdbcTemplate">
    <constructor-arg>
        <ref bean="dataSource"></ref>
    </constructor-arg>
</bean>
The jdbcTemplate bean has as its implementation org.springframework.jdbc.core.JdbcTemplate, which is the class we are interested in. It references another bean, dataSource, which has all the necessary JDBC configuration.
<bean id="dataSource"
  class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.apache.derby.jdbc.EmbeddedDriver"/>
    <property name="url" 
         value="jdbc:derby:/home/mdk/mjprojects/database/pumausers"/>
</bean>
This configuration has all the information necessary for Spring to create a JdbcTemplate with a dataSource and inject it into memberDAO.

Step 5. JdbcTemplate has a number of helper methods to execute SQL commands that make implementing MemberDAO methods easy.
private static final String insert_sql = "INSERT into puma_members VALUES(?,?,?,?,?,?,?)" ;
private static final String select_sql = "Select * from puma_members where email = ?" ;
public int insertMember(Member member) {
    JdbcTemplate jt = getJdbcTemplate() ;
    Object[] params = new Object[] {member.getFirstname(),member.getLastname(),
                           member.getStreet(),member.getCity(),member.getZip(),
                           member.getEmail(),member.getPassword()} ;
    int ret = jt.update(insert_sql, params) ;
    return ret ;
}
public Member getMember(String email) {
    JdbcTemplate jt = getJdbcTemplate() ;
    Object[] params = new Object[] {email} ;
    List result = jt.query(select_sql,params, new MemberRowMapper()) ;
    Member member = (Member)result.get(0) ;
    return member;
}
private class MemberRowMapper implements RowMapper {
    public Object mapRow(ResultSet rs, int arg1) throws SQLException {
        Member member = new Member(rs.getString("firstname"), rs.getString("lastname"), 
                                   rs.getString("street"), rs.getString("city"), rs.getString("zip"),
                                   rs.getString("email"), rs.getString("password"));
         
        return member ;
    }
}
In insertMember, the update method of JdbcTemplate takes 2 parameters: the SQL insert statement and an array that contains the data to be inserted. In getMember, the query method takes an additional parameter, a class that implements the RowMapper interface, which maps the JDBC ResultSet to the object we want, an instance of Member. The Spring javadocs state that JdbcTemplate is the star of the Spring JDBC package. It has several variations of query, update and execute methods. Too many, one might think.
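For completeness, the remaining deleteMember method of the interface could be implemented in the same style. This is just a sketch; the delete SQL below is my assumption based on the schema shown earlier.

private static final String delete_sql = "DELETE from puma_members where email = ?" ;

public int deleteMember(String email) {
    JdbcTemplate jt = getJdbcTemplate() ;
    Object[] params = new Object[] {email} ;
    // update() is used for INSERT, UPDATE and DELETE statements; it returns the number of rows affected
    return jt.update(delete_sql, params) ;
}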

Step 6. The class MemberSpringJDBCDAOTest has the JUnit tests that exercise MemberSpringJDBCDAO. A snippet is below:
public void insertMember() {
    ApplicationContext context = new ClassPathXmlApplicationContext(
                                                "springjdbcdao.xml");
    BeanFactory factory = (BeanFactory) context;
    MemberSpringJDBCDAO mDAO=(MemberSpringJDBCDAO) factory.getBean("memberdao");
    Member newMember = new Member("John","Doe","2121 FirstStreet","Doecity",
                                   "42345","jdoe@gmail.com","jondoe") ;
    int ret = mDAO.insertMember(newMember) ;
}
This is typical Spring client code. First we create a BeanFactory and load the metadata in springjdbcdao.xml. Then we ask the factory for the memberdao bean. Finally, we insert a record into the database by calling the insertMember method.

Clearly the code is a lot simpler than if you implemented MemberDAO in plain JDBC. If you are new to Spring and are intimidated by buzzwords like inversion of control (IoC), then using Spring for database access is a good way to start benefiting from Spring while learning to use it. Note the loose coupling between the interface MemberDAO and its implementation. Loose coupling is good design and a reason for the popularity of frameworks like Spring. In future blogs, I will implement MemberDAO using other persistence APIs like JPA and maybe Hibernate, and show how the implementation can be switched without having to change client code.

Tuesday, August 31, 2010

Security concerns with Cloud Computing

A few years ago, there was a news story about an online tax service mixing up people's tax returns. That scared the hell out of me. While I continue to e-file my returns, I use old fashioned desktop tax preparation software to prepare my return and then e-file it. I am not yet comfortable with someone else hosting all the personal information in a tax return.

Concerns about security are a primary barrier to businesses adopting cloud based services for critical work. As a cloud consumer, below are some issues to think about and to ask the provider about. As a cloud service provider, these are issues to think about and address.

1. Physical security

Are the premises that host the servers, databases, etc. physically secure? The buildings that host the systems need state-of-the-art security technology to restrict entry, monitor who goes in and out, and record who is doing what. Location is important as well. Who would want their business data hosted in a high crime area or in a country with a track record of wars?

2. Isolation

A cloud service provider is hosting data from multiple customers. That is something users should never have to care about. Any mixup, like the one described in the first paragraph, is completely unacceptable.

3. Authentication and Access control

When sensitive data is hosted outside your enterprise, are the people who manage or access the data properly authenticated? Is access limited to those who absolutely need it? Is the access control policy available for review, and is it reviewed periodically?

4. Data security

Is sensitive data encrypted? Operations staff such as system administrators manage files and databases. They need to move, back up and copy data, but they do not necessarily need to be able to read credit card numbers from a customer table.

5. Audit trail

As is the case in any business, things can go wrong. There will be the bad apple who happens to come across some sensitive information and decides to misuse it. To be able to investigate such issues, a detailed audit trail is required. Who entered or left the premises? Who logged on to the system? What actions did they perform?

As the saying goes "forewarned is forearmed". If you know the security practices of your provider, you can weigh risk versus benefit and decide what is appropriate to be hosted by the provider.

Sunday, August 15, 2010

The promise of Spring

The Spring framework  is a popular alternative to J2EE for enterprise application development.  To download Spring or read about it, visit http://www.springsource.org/

The top 5 documented benefits of Spring are:

1. Spring saves the developer the time and effort required to write boilerplate code like JNDI lookups, creating JDBC connections, API-specific exception handling and so on. Everyone has a horror story about a missing JNDI reference, and a framework that abstracts such plumbing code away from the application developer is certainly useful.

2. Spring is a lightweight framework. A lightweight framework should be small in size, conceptually simple and easy to use.

3. Spring simplifies database access. Most applications need to store and retrieve data from a relational database. There are many alternatives for database access such as JDBC, JPA, Hibernate, with varying levels of complexity. Spring provides a simpler abstraction over these APIs. Similarly it simplifies web development with its MVC framework.

4. Spring can be used in various environments: in standalone J2SE applications, with a web server such as Tomcat, or with a full blown application server like JBoss or WebSphere.

5. Spring lets you develop applications by assembling loosely coupled components. Loose coupling is a better design practice because it allows you to swap out moving parts without having to do major changes to the application.  Spring achieves this by what it calls "Inversion of Control" and "Dependency Injection".  Actually, it is mostly dependency injection.

Inversion of Control and Dependency Injection are a topic for a separate blog. But very briefly, in Spring, when you author a "bean", you specify the dependencies in an XML file and the Spring container makes them available. You don't have to explicitly create each dependency.
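As a rough sketch (the GreetingService interface and GreetingClient class below are made up purely for illustration), a bean simply exposes a setter for its collaborator, and Spring calls the setter with the dependency declared in the XML:

// hypothetical types, for illustration only
interface GreetingService {
    String sayHello() ;
}

public class GreetingClient {
    private GreetingService greetingService ;

    // Spring calls this setter to inject the dependency declared in the bean definition;
    // GreetingClient never constructs its own GreetingService
    public void setGreetingService(GreetingService greetingService) {
        this.greetingService = greetingService ;
    }

    public void greet() {
        System.out.println(greetingService.sayHello()) ;
    }
}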

In subsequent blogs, using some code examples, let us see if Spring delivers on its promise. I will dig deeper with examples and point out where Spring really simplifies development and where it just adds another layer of complexity. So stay tuned.

Sunday, July 25, 2010

Cloud computing: I can see the "cloud" clearly now the rain ....

There are many blogs and articles on the internet on cloud computing and perhaps there are too many. Yet the question keeps popping up. What is cloud computing ? Let us try to clear the fog around this topic.

Almost everyone uses online services like Gmail, Google docs, Yahoo mail and so on. To use these services, there is no investment for the consumer. There are no software licensing costs, and there is no time spent installing, managing or upgrading software - not even client software. The infrastructure is provided and managed behind the scenes by providers like Google. Behind the scenes, Google adds the necessary hardware and upgrades the software when necessary to ensure that you and I as end users get the same quality of service when we use Gmail. This is an example of what is called software as a service, and it is a form of cloud computing. Some companies do the same for enterprise software, the best example of which is SalesForce.com. Enterprise software is extremely hard to install and manage. With cloud computing, you can pay a fee and start using the software, and let the provider take care of installation, management, maintenance and customization.

Companies like Amazon offer storage and servers that an IT department can use on demand. If I am running an IT department and all of a sudden the enterprise needs several gigabytes of disk space or additional servers to run some extra jobs, I want to be able to rent the disk space or servers and use them temporarily - without an 18 wheeler bringing in boxes of hardware that I need to install, configure and manage. This is an example of the cloud offering computing services just the way utility companies offer electricity or water.

There are many more variations of this concept of providing some computing infrastructure whether hardware or software, over the internet for use by a customer.

Why is this stuff called the cloud? The services we are talking about are generally offered over the internet. In computing literature, the cloud drawing is a popular way to show the internet as a computer network. The idea is that as a user you do not care about what is in the cloud, but you can reliably get some computing service from it. The cloud is an abstraction for the underlying computing technology, which makes it easy to offer services in a highly dynamic and scalable manner. What distinguishes cloud computing from other forms of distributed computing is that one can view the cloud as a vast supply of computing power, some of which you can buy for a fee.

If you are providing a cloud based service, a key requirement is that the architecture needs to be dynamic. Depending on user demand, the provider should be able to scale seamlessly by adding hardware or software as necessary. This could be anything from adding more storage to starting more server processes to handle user requests.

The consumer of the service should not have to do any infrastructure setup related to the service. Ideally the consumer should be able to use the service with minimal help from the provider. When you move into a new apartment or house, you call up the utility company to turn on the electricity service. After that you have uninterrupted service as long as you are paying your monthly bill. Using a cloud service should be as easy as that.

A good cloud platform is self managed and self healing. Services and usage are monitored and resources allocated optimally. Problems should be detected automatically and fixed without interruption of service.

Security for applications as well as data is critical for obvious reasons. Customers would suffer severe financial damages if their private data falls into the wrong hands.

So just putting up a web site that is hosted from your garage does not necessarily make you a cloud service provider.

The biggest advantage of cloud based services for the customer is that they provide a low cost entry point to a variety of computing services. Whether it is storage, web hosting, or application services for email, documents or even things like payroll, being able to get them with no infrastructure investment is a huge plus. Obviously, as with any outsourcing, you give up some control and become dependent on the service provider. However, when you are starting out as a business, the lack of control might be tolerable compared to the cost of setting up a data center or a huge IT department. As the business grows and you have solid revenues, you can of course decide to move some functions in house.

There are advantages for the service provider as well, especially for software.

If you have developed commercial software, you know how expensive  it is to support several different versions. If your software is available as a cloud service, every user is on the same version.

Since users are not installing the software on their hardware, you do not have to support different platforms.

Since you control hardware and software, making updates/upgrades is easier. They can happen behind the scenes.

The application can be tuned to the hardware/OS platform of your choice and hence can deliver the best performance.

To conclude, it is clear that the cloud computing paradigm does provide value to both customers and providers and it is not just a rehash of old technology - cloud service providers have to solve some real technical problems to ensure the quality of service for their customers.

Sunday, June 27, 2010

What Java Map class should I use ?

java.util.Map is an interface for mapping keys to values. Implementations of Map provide in-memory maps of keys to values. The interface provides methods to insert (key, value) pairs into a map and methods to look up values based on the key.

The JDK provides several implementations of the Map interface, such as HashMap, LinkedHashMap, TreeMap and IdentityHashMap. Which implementation should you use? The answer depends on your requirements.

The HashMap is implemented using a hash table. Hash tables use a hash function to calculate the index into an array where the value is placed. Hence HashMap provides the fastest search and insertion times, O(1). HashMaps, however, take up more memory, and they do not guarantee any ordering of the keys: if you iterate over the keys, you will find them in no particular order.

The LinkedHashMap is implemented using both a hash table and a linked list. In addition to using a hash table for fast retrieval, it maintains a doubly linked list through the entries. The linked list preserves the order in which keys were inserted into the map. A search or get operation is still as fast as in a HashMap, but inserts and deletes are a little slower because of the additional work involved in maintaining the linked list. LinkedHashMap can be used if you need to iterate over the keys in, say, the order in which they were inserted into the map. LinkedHashMap also has a constructor parameter, accessOrder, which when set to true changes the ordering from insertion order to access order - from least recently accessed to most recently accessed. This is particularly useful if you are using the map to build a cache.
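For example, here is a small sketch of an LRU cache built on LinkedHashMap (assuming the usual java.util imports; the capacity of 3 is arbitrary):

final int capacity = 3 ;
// accessOrder = true orders entries from least recently to most recently accessed
Map<String, String> cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity ;    // evict the least recently accessed entry
    }
} ;
cache.put("a", "1") ;
cache.put("b", "2") ;
cache.put("c", "3") ;
cache.get("a") ;         // touch "a" so it becomes the most recently accessed
cache.put("d", "4") ;    // evicts "b", the least recently accessed key
System.out.println(cache.keySet()) ;   // prints [c, a, d]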

TreeMap is implemented using a red-black tree, which is a balanced tree. When items are inserted, they are placed at a position based on their natural ordering or on a provided Comparator, so the keys are kept in sorted order. Search and insertion for a TreeMap are O(log n). This is slower than HashMap and LinkedHashMap. You might use a TreeMap if you need to iterate over the keys in the map in sorted order.

IdentityHashMap is implemented using a hash table and is very fast - O(1) for both get and insert. But unlike other Map classes, which use the equals method when comparing keys, it uses object reference equality. For 2 keys key1 and key2, it compares key1 == key2 rather than key1.equals(key2). The JDK documentation is clear that this is not to be used as a general purpose Map. The specification of containsKey(Object key) in the Map interface states that it returns true if the map contains a key k such that key.equals(k); this does not hold for IdentityHashMap. IdentityHashMap is rarely used, but you might use it if you are doing something that needs to keep track of object references.
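A small sketch of the difference (again assuming the usual java.util imports):

String key1 = new String("order") ;
String key2 = new String("order") ;    // equal to key1 but a different object

Map<String, String> hashMap = new HashMap<String, String>() ;
hashMap.put(key1, "first") ;
hashMap.put(key2, "second") ;          // replaces the first entry because the keys are equals()
System.out.println(hashMap.size()) ;   // prints 1

Map<String, String> identityMap = new IdentityHashMap<String, String>() ;
identityMap.put(key1, "first") ;
identityMap.put(key2, "second") ;      // a separate entry because the references differ
System.out.println(identityMap.size()) ;   // prints 2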

In summary,
  • HashMap is the fastest map with O(1) search and insertion times.
  • LinkedHashMap is a little slower for inserts, but maintains the insertion order.
  • TreeMap is the slowest map, but lets you iterate over the keys in a sorted order.

Sunday, April 18, 2010

The Java Memory Model

The Java memory model describes the rules that define when values written to a variable by one thread become visible to other threads that read and write the same variable.

When a thread reads a variable, it is not necessarily getting the latest value from memory. The processor might return a cached value. Additionally, even though the programmer authored code where a variable is first written and later read, the compiler might reorder the statements as long as it does not change the program semantics. It is quite common for processors and compilers to do this for performance optimization. As a result, a thread might not see the values it expects to see. This can result in hard to fix bugs in concurrent programs.

The Java programming language provides the synchronized, volatile and final keywords to help write safe multithreaded code. However, earlier versions of Java had several issues because the memory model was underspecified. JSR 133 fixed some of the flaws in the earlier memory model.

Most programmers know that entering a synchronized block means obtaining a lock on a monitor, which ensures that no other thread can enter a block synchronized on the same monitor. Less familiar but equally important are the facts that
(1) acquiring the lock and entering the synchronized block forces the thread to refresh its view of data from memory, and
(2) on exiting the synchronized block, data written by the thread is flushed to memory.

This ensures that values written by a thread in a synchronized block are visible to other threads in synchronized blocks.

Ever heard of "happens before" in the context of Java? JSR 133 introduced the term "happens before" and provided some guarantees about the ordering of actions within a program. These guarantees are:

(1) Every action in a thread happens before every other action that comes after it in that thread.
(2) An unlock on a monitor happens before a subsequent lock on the same monitor.
(3) A volatile write to a variable happens before a subsequent volatile read of the same variable.
(4) A call to Thread.start() happens before any action in the started thread.
(5) All actions in a thread happen before any other thread returns from a join() on that thread.

Here an action, as defined in section 17.4.2 of the Java language specification, is a statement that can be detected or influenced by other threads. Normal reads/writes, volatile reads/writes and lock/unlock are some examples of actions.

Rules 1, 4 and 5 guarantee that within a single thread all actions appear to execute in the order in which they appear in the program. Rules 2 and 3 guarantee that between multiple threads working on shared data, the relative ordering of synchronized blocks and the order of reads and writes on volatile variables is preserved.

Rules 2 and 3 make volatile very similar to a synchronized block. Prior to JSR 133, volatile already meant that a write to a volatile variable went directly to memory and a read came from memory. But a compiler could reorder volatile reads/writes with non volatile reads/writes, causing incorrect results. That is no longer possible after JSR 133.
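A minimal sketch of rule 3 at work: because the write to the volatile field ready happens before a subsequent read of ready, a reader thread that sees ready == true is also guaranteed to see data == 42.

public class VolatileFlag {
    private int data = 0 ;
    private volatile boolean ready = false ;

    // writer thread: the write to data happens before the volatile write to ready
    public void publish() {
        data = 42 ;
        ready = true ;
    }

    // reader thread: once it observes ready == true, the happens before rules
    // guarantee that it also observes data == 42
    public void consume() {
        if (ready) {
            System.out.println(data) ;
        }
    }
}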

One additional point worth noting relates to final members that are initialized in the constructor of a class. As long as the constructor completes properly, the final members are visible to other threads without synchronization. If, however, you share the reference to the object from within the constructor, then all bets are off.

Saturday, March 27, 2010

REST in a nutshell

REST, or Representational State Transfer, is the architectural style of the world wide web. REST was defined by Roy Fielding in his PhD thesis around 2000, but it has come to the foreground only recently; in recent years it has replaced SOAP/WSDL as the preferred way to build web services.

Representational State Transfer is a term that sounds very academic, but it is best explained using the web, where it is put to use every time you browse. A typical web interaction involves a user in front of a web browser (the client) making a request to get some resource from a server by typing in a URL. The representation of the resource is a document, and the location of the document is described by the URL. Getting the resource from the server and displaying it in the browser causes a state transition in the browser.

The protocol used in the browser-server communication is HTTP, described in RFC 2616.

When you type the URL http://www.xyz.com/path/mydoc.html, the browser requests the document from the host by sending the GET command.

Everyone has at some point filled out a form on a web page, to maybe open an account or buy something online. When you click the submit button, the browser sends the data to the server using the HTTP POST command.

Other common HTTP commands are PUT, which is used to update a document on the server, and DELETE, which is used to delete a document from the server.

This sort of interaction does not necessarily have to take place only between a browser and a server. It could very well be two servers or applications exchanging documents or data. The term web services is generally used to describe this kind of interaction. The two applications could have been written in different programming languages. But that does not matter since they are talking HTTP.  Ironically, web services were popularized by SOAP/WSDL, the technology that REST is displacing.

SOAP based web services use XML to describe the operation or method to be invoked on the server, the data that is passed as input and the data that comes back as output. In SOAP, describing the interface, operations and data has to be done for each and every application. For example, for an application that manages users, you define methods like getUser, createUser, deleteUser and so on. For an application that manages accounts, you might define methods like getAccount, createAccount, deleteAccount and so on.

Contrast this with REST, where the operations stay the same no matter what the application is - GET, PUT, POST, DELETE. The resources are identified by URLs such as http://www.xyz.com/user/userid or http://www.xyz.com/account/accountid. REST is thus a lot simpler. Data is transferred in the body of the HTTP message; popular formats are XML and JSON.

An example of a GET request is

GET /account/A12345 HTTP/1.1
Host: www.xyz.com


and the server response could be

HTTP/1.1 200 OK
Content-Type: text/json; charset=utf-8
Content-Length: length

{"account":
  {"id":"A12345",
   "type":"checking",
    "balance":"150"
   }
}

Similarly, to create a new account, the request would be

POST /account HTTP/1.1
Host: www.xyz.com
Content-Length: length
Content-Type: text/json
{"account":
   {"id":"A12346",
     "type":"checking",
     "balance":"200"
    }
}
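As a sketch, the GET request shown earlier could be issued from a Java client using java.net.HttpURLConnection (the host name is the made-up example used in this post; this assumes the java.net and java.io imports and that IOException is handled by the caller):

URL url = new URL("http://www.xyz.com/account/A12345") ;
HttpURLConnection con = (HttpURLConnection) url.openConnection() ;
con.setRequestMethod("GET") ;                    // the REST verb
con.setRequestProperty("Accept", "text/json") ;

BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8")) ;
StringBuilder body = new StringBuilder() ;
String line ;
while ((line = in.readLine()) != null) {
    body.append(line) ;
}
in.close() ;
System.out.println(con.getResponseCode()) ;      // 200 if the account exists
System.out.println(body) ;                       // the JSON representation of the account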

An important characteristic of RESTful applications is that they are stateless. Each request has all the information necessary for the server to respond to the request. This is a critical requirement for such web services applications to be highly scalable. By being stateless, each subsequent request can be serviced by any other peer server, which means you can use a cluster of servers to service requests and you can have high availability and failover.

In summary, REST is a simple and powerful architectural style for building web services. REST requires identifying the data or resources in your application using URLs. The HTTP commands such as GET, PUT, POST and DELETE are the verbs or operations exposed by the application. The client and server communicate using the HTTP protocol with the data carried in the body of the message.

Thursday, March 11, 2010

The wait/notify mechanism in Java

Consider the problem where one thread produces some data and other threads consume the data. This is the producer consumer pattern. If the producer is not producing data fast enough, the consumers might have to wait. The wrong way to handle this is for consumer threads to go to sleep and then check for available data at periodic intervals - this eats up CPU cycles. Fortunately, Java provides a more elegant way.

The wait/notify mechanism allows one thread to wait for a notification from another thread. The consumer thread checks a condition that indicates whether data is available and, if not, calls the wait() method. Calling wait puts the thread to sleep until it is notified. Whenever the producer thread produces data and updates the condition, it calls notify() or notifyAll() to wake the waiting threads.

The class WorkQueue in Listing 1 shows how this works. The addWork method is called by producer threads to queue up work that needs to be done. The getWork method is called by consumer threads that will do the work. When there is no work, the consumer threads need to wait. 

import java.util.ArrayDeque;

public class WorkQueue {
    private final ArrayDeque<Object> workq = new ArrayDeque<Object>();

    // called by producer threads to queue up work
    public void addWork(Object work) {
        synchronized(workq) {
            workq.addLast(work) ;
            workq.notifyAll() ;
            System.out.println("Thread producer added work notifying waiter") ;
        }
    }

    // called by consumer threads; blocks until work is available
    public Object getWork() {
        Object ret = null ;
        synchronized(workq) {
            while ((ret = workq.pollFirst()) == null) {
                try {
                    System.out.println("No Work Thread consumer going to block") ;
                    workq.wait() ;
                    System.out.println("Thread consumer woken") ;
                } catch(Exception e) {
                    System.out.println(e) ;
                }
            }
        }
        return ret ;
    }
}
Listing 1

wait(), notify() and notifyAll() are methods of java.lang.Object. A thread needs to acquire a lock on an object before calling any of these methods. The getWork method gets a lock on the variable workq. The condition in the while loop checks if there is a work item. If there is an item, the condition is false: we break out of the loop, exit the synchronized block, thereby releasing the lock on workq, and return the work item. If workq.pollFirst() returns null, there is no work queued, so we enter the loop and call the wait method. Calling wait causes the thread to give up the lock and go to sleep.

When a producer thread calls the addWork method, the statement synchronized(workq) ensures that it first acquires the lock. If another thread holds the lock on workq, this thread will wait until it gets the lock. It then adds the work item to workq. Lastly, it calls notifyAll() on workq to notify all waiting threads that work is available.

When the consumer thread receives the notification, it wakes up. However, to return from the wait() method, it needs to reacquire the lock first. On acquiring the lock, it returns from wait() and continues to execute the loop. The condition workq.pollFirst() == null is checked again. If it is false, the thread got an item and exits the loop and the method. If not, it calls wait again and sleeps until the next notification.

The code in listing 2 exercises the class in listing 1 with 2 threads.
public static void main(String[] args) {
        final WorkQueue wq = new WorkQueue() ;
        Runnable producer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    wq.addWork(Integer.toString(i)) ;
                    System.out.println("Thread producer created work :" + i) ;
                    try {
                        Thread.sleep(5000) ;
                    } catch(Exception e) {
                        System.out.println(e) ;
                    }
                }
               
            }

        } ;
       
        Runnable consumer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    String work = (String)wq.getWork() ;
                    System.out.println("Thread consumer Got work:" + work) ;
                }
            }
        } ;
       
        Thread tconsumer = new Thread(consumer,"tconsumer") ;
        tconsumer.start() ;
        Thread tproducer = new Thread(producer,"tproducer") ;
        tproducer.start() ;
   
    }
Listing 2

Running the code should produce output shown below.
No Work Thread consumer going to block
Thread producer added work notifying waiter
Thread producer created work :1
Thread consumer woken
Thread consumer Got work:1
No Work Thread consumer going to block
Thread producer added work notifying waiter
Thread producer created work :2
Thread consumer woken
Thread consumer Got work:2
.
.
.
.......

Note that wait must always be called within a loop that checks the condition the thread is waiting on. Testing the condition before the wait() ensures that the thread waits only when the condition (no work available) holds. Testing the condition again after the wait(), that is after the thread is woken up, ensures that the thread goes back to waiting if the condition still holds.

In summary, wait/notify is a powerful mechanism for threads to communicate with each other. However, use it with care and remember to call wait() within a while loop.

Thursday, February 25, 2010

The dreaded double check pattern in java

The problem with double check locking in Java is well documented. Yet even a seasoned programmer can get overzealous trying to optimize the synchronization of code that creates singletons and fall prey to the trap. Consider the code
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      s = new Sample() ;
    }
  return s ;
  }
}
Listing 1
This code is not thread safe. If 2 threads t1 and t2 enter the getSample() method at the same time, they are likely to get different instances of Sample. This can be fixed easily by adding the synchronized keyword to the getSample() method.
public class Sample {
  private static Sample s = null ;
  public static synchronized Sample getSample() {
    if (s == null) {
      s = new Sample() ;
    }
    return s ;
  }
}
Listing 2
Now the getSample method works correctly. Before entering the getSample method, thread t1 acquires a lock. Any other thread t2 that needs to enter the method will block until t1 exits the method and releases the lock. Code works. Life is good. This is where the smart programmer, if not careful, can outsmart himself. He will notice that in reality only the first call to getSample, which creates the instance, needs to be synchronized, and that subsequent calls that merely return s are paying an unnecessary penalty. He decides to optimize the code to
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      synchronized(Sample.class) {
        s = new Sample() ;
      }
    }
    return s ;
  }
}
Listing 3
Our java guru quickly realizes that this code has the same problem that listing 1 has. So he fine tunes it further.
public class Sample {
  private static Sample s = null ;
  public static Sample getSample() {
    if (s == null) {
      synchronized(Sample.class) {
        if (s == null) {
          s = new Sample() ;
        }
      }
    }
    return s ;
  }
}
Listing 4
By adding an additional check within the synchronized block, he has ensured that only one thread will ever create an instance of Sample. This is the double check pattern. Our guru's friend, another Java expert, buddy-reviews the code. The code is checked in and the product is shipped. Life is good, right?

Wrong!! Let us say thread t1 enters getSample. s is null. It gets the lock. Within the synchronized block, it checks that s is still null and then executes the constructor for Sample. The compiler is allowed to assign the reference to s before the constructor finishes executing. If t1 is swapped out at that point and t2 gets control, s is not null but points to a partially initialized object. When t2 enters getSample, it sees that s is not null and returns the partially constructed, possibly corrupt object.

In summary, the double check pattern does not work. The options are to synchronize at the method level as in listing 2 or to forego lazy initialization. Another option that works is to use a nested holder class that delays initialization until the get method is called.
public class Sample {
  private static class SampleHolder {
    public static final Sample INSTANCE = new Sample() ;
  }
  public static Sample getSample()  {
    return SampleHolder.INSTANCE ;
  }
}

Listing 5

Sunday, February 7, 2010

Service Oriented Architecture

Service Oriented Architecture, or SOA, was a hot new buzzword 3-4 years ago. While still around, it has been pushed to the background by other hot paradigms like Software as a Service (SaaS) and cloud computing. You don't see vendors highlighting the SOA qualities of their products that much any more. Is SOA dead?

In this blog I will briefly describe the principles of SOA and where I think it stands today.

In SOA, you build complex software services by coupling together other coarse grained software services, each of which provides a distinct service. The coupling between services is loose. The idea is that you should be able to replace a service or add a new one without a lot of coding or rework.

The participating services do not have to be written in the same programming language. Some could have been developed in Java, others in .NET, still others in C++ and so on.

The interaction between services may be based on asynchronous messaging. Instead of strongly typed APIs, services use messages to exchange data with each other.

Each service provides the abstraction for one particular task, such as computing the credit score for an application, and each service can be optimized to do that one task really well.

Bigger applications are composed by assembling other services that provide a specific function.

An example of a reusable service is a service for creating, updating and viewing customer data. Every other application in the enterprise can use this service for customer information. The service may be available via an endpoint such as an HTTP URL or a messaging destination. The customer data that the service accepts and returns may be described using XML.

Some of the technologies that enable SOA are:

1. Web Services
2. Service Component architecture (SCA)
3. Service Data Objects (SDO)
4. Enterprise messaging
5. XML
6. REST

Web Services and SCA are used to build services. The data or meta data can be exchanged between services as straight XML or using SDOs. The messaging transport can be SOAP over HTTP in the case of web services or HTTP in the case of REST. One can use asynchronous messaging protocols like JMS when persistent messages are required.

We know things like abstraction, encapsulation and reuse from the early days of object oriented design and design patterns, so SOA is not telling us anything radically new. However, it reminds us once more to apply these design principles in large enterprise application environments.

SOA is not without problems. Weakly typed, message based protocols, as encouraged by SOA, are typically hard to debug. Describing metadata in XML and using it at runtime works well for simple interfaces, but is more problematic for complex ones. Parsing XML is expensive and adds performance overhead. Propagating contexts from one service to another can be an issue when the services are controlled by different teams. Due to the complexity, significant testing needs to be done before rolling out such services to users.

In conclusion, SOA reminds us of some useful design principles. These same principles are applicable even if you are building a cloud and offering your software as a service. However, you want to pay close attention to the known pitfalls when they apply to your environment.

In future blogs, I will provide an introduction to some of the technologies that enable you to build SOA services.