Tuesday, August 31, 2010

Security concerns with Cloud Computing

A few years ago, there was a news story about an online tax service mixing up peoples tax returns. That scared the hell out of me. While I continued to e-file my returns, I use old fashioned desktop tax preparation software to prepare my return and then e-file it. I am not yet comfortable with someone else hosting all the personal information in a tax return.

Concerns regarding security are a primary barrier in businesses adopting cloud based services for critical stuff. As a cloud consumer, below are some issues to think about and ask the provider. As a cloud service provider, these are issue to think about and address.

1. Physical security

Are the premises that host the servers, databases etc physically secure. The buildings that host the systems needs state of art security technology to restrict entry and monitor who goes in and who goes out, as well as record who is doing what. Location is important as well. Who would want their business data hosted in a high crime area or in a country with a track record of wars.

2. Isolation

A cloud service provider is hosting data from multiple customers. That is something users should never have to care about. Any mixup, like the one described in the first paragraph is completely unacceptable.

3. Authentication and Access control

When sensitive data is hosted outside your enterprise, are the people who manage or access the data properly authenticated ? Is the access limited to those that absolutely need it ? Is the access control policy available for review and reviewed periodically.

4. Data security

Is sensitive data encrypted ? Operations staff such as system administrators manage files and databases. They need to move , backup, copy stuff etc but they do not necessarily need to be able to read credit card numbers from a customer table.

5. Audit trail

As is the case in any business, things can go wrong. There will be the bad apple who happens to come across some sensitive information and decides to misuse it. To be able to investigate such issues, a detailed audit trail is required. Who entered or left the premises ? Who logged on to the system ? what actions did he perform ?

As the saying goes "forewarned is forearmed". If you know the security practices of your provider, you can weigh risk versus benefit and decide what is appropriate to be hosted by the provider.

Sunday, August 15, 2010

The promise of Spring

The Spring framework  is a popular alternative to J2EE for enterprise application development.  To download Spring or read about it, visit http://www.springsource.org/

The top 5 documented benefits of Spring are:

1. Spring saves the developer the time and effort required to write boiler plate code like JNDI lookups, creating JDBC connections, API specific exception handling etc. Every one has their horror story on not finding a JNDI reference and a framework that abstracts such plumbing code away from the application developer is certainly useful.

2. Spring is a lightweight framework. A lightweight framework should be small in size, conceptually simple and easy to use.

3. Spring simplifies database access. Most applications need to store and retrieve data from a relational database. There are many alternatives for database access such as JDBC, JPA, Hibernate, with varying levels of complexity. Spring provides a simpler abstraction over these APIs. Similarly it simplifies web development with its MVC framework.

4. Spring can used in various environments. It can be used in standalone J2SE applications. It can be used with a webserver such as tomcat. It can be used with a full blown application server like JBOSS or Websphere.

5. Spring lets you develop applications by assembling loosely coupled components. Loose coupling is a better design practice because it allows you to swap out moving parts without having to do major changes to the application.  Spring achieves this by what it calls "Inversion of Control" and "Dependency Injection".  Actually, it is mostly dependency injection.

Inversion of Control and Dependency Injection are a topic for a separate blog. But very briefly, in Spring, when you author a "bean", you specify the dependencies in an XML file and the Spring container makes them available. You don'nt have to explicitly create each dependency.

In subsequent blogs, using some code example, let us see if Spring delivers on its promise. I will dig deeper with examples and point out where Spring really simplifies development and where it just adds another layer of complexity. So stay tuned.

Sunday, July 25, 2010

Cloud computing: I can see the "cloud" clearly now the rain ....

There are many blogs and articles on the internet on cloud computing and perhaps there are too many. Yet the question keeps popping up. What is cloud computing ? Let us try to clear the fog around this topic.

Almost everyone uses online services like Gmail, Google docs,  yahoo mail etc. To use these services, there is no investment to the consumer. There are no software licensing costs, there is no time spent in installing, managing, upgrading and software - not even client software. The infrastructure is provided and managed behind the scenes by providers like Google. Behind the scenes, Google adds the necessary hardware, upgrades the software when necessary, to ensure that you and I as end users get the same quality of service when we use Gmail. This is an example of  what is called software as a service and is a form of cloud computing. Some companies do the same for enterprise software, the best example of which is SalesForce.com. Enterprise software is extremely hard to install and manage. With cloud computing, you can pay a fee and start using the software and let the provider take care of  installation, management, maintenance and customization.

Companies like Amazon offer storage and servers that an IT department can use on demand. If I am running an IT department and all of sudden the enterprise needs several gigabytes of disk space or additional servers to run some additional jobs, I want to be able to rent the disk space or servers and use it temporarily - without an 18 wheeler bringing in boxes of hardware that I need to install, configure and manage. This is an example of the cloud offering computing services just the way utility companies offer electricity or water.

There are many more variations of this concept of providing some computing infrastructure whether hardware or software, over the internet for use by a customer.

Why is this stuff called the cloud ? The services we are talking about are generally offered over the internet. In computing literature, the cloud drawing is a popular way to show the internet as a computer network. The idea is that as a user you do not care about what is in the cloud, but you can reliably get some computing service from it. Cloud is an abstraction for the underlying computing technology which makes it easy to offer services in a highly dynamic and scalable manner. What distinguishes cloud computing from other forms of distributed computing is that one can view the cloud as vast supply of computing power, some of which you can buy for a fee.

If you are providing a cloud based service, a key requirement is that the architecture needs to dynamic. Depending on user demand, the provider should be able to scale seamlessly by adding hardware or software as necessary. This could be be anything from adding more storage or starting more server processes to handle user requests.

The consumer of the service should not have to do any infrastructure setup related to the service. Ideally the consumer should be able to use the service with minimal help from the provider. When you move into a new apartment or house, you call up the utility company to turn on the electricity service. After that you have uninterrupted service as long as you are paying your monthly bill. Using a cloud service should be as easy as that.

A good cloud platform is self managed and self healing. Services and usage are monitored and resources allocated optimally. Problems should be detected automatically and fixed without interruption of service.

Security for applications as well as data is critical for obvious reasons. Customers would suffer severe financial damages if their private data falls into the wrong hands.

So just putting up a web site that is hosted from your garage does not necessarily make you a cloud service provider.

The biggest advantage of cloud based services for the customer is that it provides a low cost entry point to a variety of computing services. Whether it is storage, web hosting, application services for email, documents to even things like payroll, to be able to get them with no infrastructure investment is a huge plus. Obviously, as with any out sourcing, you give up some control and become dependent on the service provider. However when you are starting out as a business, the lack of control might be tolerable when compared to the cost of setting up a data center or huge IT department. As the business grows and you have solid revenues, you can of course decide to move some function in house.

There are advantages for the service provider as well, especially for software.

If you have developed commercial software, you know how expensive  it is to support several different versions. If your software is available as a cloud service, every user is on the same version.

Since users are not installing the software on their hardware, you do not have to support different platforms.

Since you control hardware and software, making updates/upgrades is easier. They can happen behind the scenes.

The application can to be tuned to the hardware/OS platform of your choice and hence can deliver the best performance.

To conclude, it is clear that the cloud computing paradigm does provide value to both customers and providers and it is not just a rehash of old technology - cloud service providers have to solve some real technical problems to ensure the quality of service for their customers.

Sunday, June 27, 2010

What Java Map class should I use ?

The interface java.util.Map is an interface for mapping keys to values. Implementations of Map provide in memory maps of keys to values. The interface provides methods to insert (key,value) pairs into a map and methods to look up values based on the key.

The JDK provides a few implementations of the map interface such as HashMap, LinkedHashMap, TreeMap and the IdentityMap. Which implementation should you use ? The answers depends on what your requirement is. 

The HashMap is implemented using a hash table. Hash tables are implemented using a hash function to calculate the index into an array where the value is placed.  Hence HashMap provides the fastest search and insertion times O(1). HashMaps however take up more memory. Also they do not guarantee any ordering for the keys. If you iterate over the keys, you will find them in no particular order.

The LinkedHashMap is implemented using both hash table and linked list. In addition to using a hash table for fast retrieval, it maintains a doubly linked list through the items. The linked list helps preserve the order in which keys were inserted into the map. Search or get operation is still as fast as an HashMap, but inserts or deletes are a little slower because of the additional work involved in maintaining the linked list. LinkedHashMap can be used if you are going to  need to iterate over the keys in say the order in which the keys were inserted into the Map. LinkedHashMap also has a constructor parameter accessorder, which when set to true, changes the ordering from insertion order to access order - from least recently accessed to most recently accessed. This is particularly useful if you are using the Map to build a cache.

TreeMap is implemented using the red black tree which is a balanced tree. When items are inserted into a tree, they are placed at a position based on their natural ordering or based on a provided comparator. The ordering is thus a sorted order. Search and insertion for TreeMap are of the order O(log(n)). This is slower than HashMap and LinkedHashMap. You might need to use a TreeMap if you are going to need to iterate over the keys in the map in a sorted order.

IdentityHashMap is implemented using hash table and is very fast - O(1) for both get and insert. But unlike other Map classes which use the equals method when comparing keys, it just uses object reference equality. For 2 keys key1 and key2, it will compare using key1 == key2 rather than key1.equals(key2). The JDK is clear that this is not be to used as general purpose Map where you need lookups based on values. The specification of the containsKey(object key) in the Map interface states that it returns true if key.equals(k). This is not true for the IdentityHashMap. IdentityHashMap is rarely used. But you might use it if you are doing something that needs keeping track of object references.

In summary,
  • HashMap is the fastest map with O(1) search and insertion times.
  • LinkedHashMap is a little slower for inserts, but maintains the insertion order.
  • TreeMap is the slowest map, but lets you iterate over the keys in a sorted order.

Sunday, April 18, 2010

The Java Memory Model

The Java memory model describes the rules that define how variables written to memory are seen, when such variables are written and read by multiple threads.

When a thread reads a variable, it is not necessarily getting the latest value from memory. The processor might return a cached value. Additionally, even though the programmer authored code where a variable is first written and later read, the compiler might reorder the statements as long as it does not change the program semantics. It is quite common for processors and compilers to do this for performance optimization. As a result, a thread might not see the values it expects to see. This can result in hard to fix bugs in concurrent programs.

The Java programming language provides synchronized, volatile, final to help write safe multithreaded code. However earlier versions of Java had several issues because the memory model was underspecified. JSR 133 fixed some of the flaws in the earlier memory model.

Most programmers are familiar that entering a synchronized block means obtaining a lock on a monitor that ensures that no other thread can enter the synchronized block. Less familiar but equally important are the facts that
(1) accquring a lock and entering a synchronized block forces the thread to refresh data from memory.
(2) On exiting the synchronized block, data written is flushed to memory.

This ensures that values written by a thread in a synchronized block are visible to other threads in synchronized blocks.

Ever heard of  "happens before" in the context of Java ? JSR 133 introduced the term "happens before" and provided some guarantees about the ordering of actions within a program.These guarantees are

(1) Every action in a thread happens before every other action that comes after it in the thread.
(2) An unlock on a monitor happens before a subsequent lock on the same monitor
(3) A volatile write on a variable happens before a subsequent volatile read on the same variable
(4) A call to  Thread.start() happens before any other statement in that thread
(5) All actions in thread happen before any other thread returns from a join() on that thread

Where action is defined in section 17.4.2 of the Java language specification as statements that can be detected or influenced by other threads. Normal read/write, volatile read/write, lock/unlock are some actions.

Rules 1 ,4 and 5 guarantee that within a single thread all actions will execute in the order in which they appear in the authored program. Rules 2 and 4 guarantee that between multiple threads working on shared data, the relative ordering of synchronized blocks and the order of read/writes on volatile variables is preserved.

Rules 2 and 4 makes volatile very similar synchronized block. Prior to JSR 133, volatile still meant that a write to volatile variable is written directly to memory and a read is read from memory. But a compiler could reorder volatile read/writes with non volatle read/writes causing incorrect results. Not possible after JSR 133.

One additional notable point. This is related to final members that are initalized in the constructor of a class. As long as the constructor completes execution properly, the final members are visible to other threads without synchronization. If you however share the reference to the object from within the constructor, then all bets are off.

Saturday, March 27, 2010

REST in a nutshell

REST or Representational State Transfer is the architectural style of the world wide web. REST was defined by Roy Fielding in his PhD thesis sometime around 2000. But it has come to the foreground only recently with the realization and acceptance of REST as the preferred architecture for providing web services. In recent years, REST has replaced SOAP/WSDL as the preferred way to build web services.

Representational State Transfer is a term that sounds very academic. It is best explained using the web where it is put to use every time you use the web. A typical web interaction involves a user in front of a web browser ( the client ) making a request to say get some resource from a server by typing in a URL. The representation of the resource is a document. The location of the document is described by the URL. Getting the resource from the server and displaying in the browser causes the state transition in the browser.

The protocol used in the web browser-server communication is HTTP described in RFC2616.

When you type the URL http://www.xyz.com/path/mydoc.html, the browser requests the document from the host by sending the GET command.

Everyone has at some point filled out a form on a web page to may be open an account or to buy something online. When you click the submit button, the browser sends the data to the server using the http POST command.

Other common http commands are PUT which is used to update a document on the server and DELETE which is used to delete a document from the server.

This sort of interaction does not necessarily have to take place only between a browser and a server. It could very well be two servers or applications exchanging documents or data. The term web services is generally used to describe this kind of interaction. The two applications could have been written in different programming languages. But that does not matter since they are talking HTTP.  Ironically, web services were popularized by SOAP/WSDL, the technology that REST is displacing.

SOAP based web services use XML to describe the operation or method to be invoked on the server, the  data that is passed as input and the data that is output. In SOAP, describing the interface, operation and data has to done for each and every application. For example, for an application that manages users, you define methods like getUser, createUser, deleteUser and so on. For an application that manages accounts, you might define methods like getAccount, createAccount, unaccountably and etc.

Contrast this with REST , where the operations stay the same no matter what the application is - GET, PUT, POST, DELETE. The resources are identified by URLs such as http://www.xyz.com/user/userid or
http//www.xyx.com/account/accountid. REST is thus a lot simpler. Data is transferred as the body of the HTTP message. Popular formats are XML and JSON.

An example of a GET request is

GET /account/A12345 HTTP/1.1
Host: www.xyz.com/


and the server response could be

HTTP/1.1 200 OK
Content-Type: text/json; charset=utf-8
Content-Length: length

{"account":
  {"id":"A12345",
   "type":"checking",
    "balance":"150"
   }
}

Similarly to create a new account, the request would be

POST /account HTTP/1.1
Host: http://www.xyz.com/
Content-Length: length
Content-Type: text/json
{"account":
   {"id":"A12346",
     "type":"checking",
     "balance":"200"
    }
}

An important characteristic of RESTful applications is that they are stateless. Each request has all the information necessary for the server to respond to the request. This is a critical requirement for such web services applications to be highly scalable. By being stateless, each subsequent request can be serviced by any other peer server, which means you can use a cluster of servers to service requests and you can have high availability and failover.

In summary, REST is a simple and powerful architectural style for building web services. REST requires identifying the data or resources in your application using URLs. The HTTP commands such as GET,PUT,POST,DELETE are the verbs or operations exposed by the application. The client and server communicate using the HTTP protocol with the data carried in the body of the message.

Thursday, March 11, 2010

The wait/notify mechanism in JAVA

Consider the problem where one thread produces some data and other threads consume the data. This is the producer consumer pattern. If the producer is not producing data fast enough, the consumers might have to wait. The wrong way to handle this is for consumer threads to go to sleep and then check for available data at periodic intervals. This eats up cpu cycles. Fortunately JAVA provides a more elegant way.

The wait/notify mechanism allows one thread to wait for a notification from another thread. The consumer thread checks a variable that can indicate if data is available or not and if not , then call the wait() method. Calling the wait method puts the thread to sleep until it is notified. Whenever the producer thread produces data and updates the variable, it calls notify() or notifyAll() to notify all the waiting threads.

The class WorkQueue in Listing 1 shows how this works. The addWork method is called by producer threads to queue up work that needs to be done. The getWork method is called by consumer threads that will do the work. When there is no work, the consumer threads need to wait. 

public class WorkQueue {
     private final ArrayDeque workq  = new ArrayDeque();
    public void addWork(Object work) {
        synchronized(workq) {
             workq.addLast(work) ;
            workq.notifyAll() ;
            System.out.println("Thread producer added work notiying waiter") ;
        }
    }
    public Object getWork() {  
        Object ret = null ;
        synchronized(workq) {   
            while(((ret = workq.pollFirst()) == null) ){
                 try {
                    System.out.println("No Work Thread consumer going to block") ;
                    workq.wait() ;
                    System.out.println("Thread consumer woken") ;
                } catch(Exception e) {
                    System.out.println(e) ;
                }
            }
           
        }
      return ret ;
      }
Listing 1

wait(), notify() and notifyAll() are methods of java.lang.Object. The thread needs to accquire a lock on an object before calling any of these methods.  The getWork method gets a lock on the variable workq. The condition in the while loop checks if there is a work item. If there is an item, the condition is false, we break out of the loop, exit the synchronized block, there by releasing the lock on workq and return the work item. If workq.pollFirst() returns null, there is no work queued, we enter the loop and call the wait method. Calling the wait method causes the thread to give up the lock and go to sleep.

When a producer thread calls the addWork method, the statement synchronized(workq) ensures that it first accquires a lock. If another thread has a lock on workq, this thread will wait until it gets the lock. It then adds the work item to the workq. Lastly it calls notifyAll() on workq to notify all waiting threads that work is available.

When the consumer thread receives the notification, it wakes up.  However , to return from the wait() method, it needs to reaccquire the lock first. On accquiring the lock, it returns from wait() and continues to execute the loop. The condition workq.pollFirst() == null is checked again. If false, the thread got an item and it exits the loop and the method. If not, it calls wait again and sleeps till the next notification.

The code in listing 2 exercises the class in listing 1 with 2 threads.
public static void main(String[] args) {
        final WorkQueue wq = new WorkQueue() ;
        Runnable producer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    wq.addWork(Integer.toString(i)) ;
                    System.out.println("Thread producer created work :" + i) ;
                    try {
                        Thread.sleep(5000) ;
                    } catch(Exception e) {
                        System.out.println(e) ;
                    }
                }
               
            }

        } ;
       
        Runnable consumer = new Runnable() {
            public void run() {
                for (int i = 1 ; i <=10 ; i++) {
                    String work = (String)wq.getWork() ;
                    System.out.println("Thread consumer Got work:" + work) ;
                }
            }
        } ;
       
        Thread tconsumer = new Thread(consumer,"tconsumer") ;
        tconsumer.start() ;
        Thread tproducer = new Thread(producer,"tproducer") ;
        tproducer.start() ;
   
    }
Listing 2

Running the code should produce output shown below.
No Work Thread consumer going to block
Thread producer added work notiying waiter
Thread producer created work :1
Thread consumer woken
Thread consumer Got work:1
No Work Thread consumer going to block
Thread producer added work notiying waiter
Thread producer created work :2
Thread consumer woken
Thread consumer Got work:2
.
.
.
.......

Note that the wait must always be called within a loop that checks the condition that the thread is waiting on. Testing the condition before the wait() ensures that the thread waits() only when the condition is true. Testing the condition after the wait(), that is after the thread is woken up, ensures that the thread continues to wait, if the condition is still true.

In summary, the wait/notify mechanism is a powerful mechanism for threads to communicate with each other. However, use with care and remember to call the wait() within a while loop.