Code & Cluster: java performance


Wednesday, December 19, 2012

JAVA Garbage Collection

Garbage collection (GC) is the process by which the Java virtual machine frees up memory by releasing the memory taken up by objects that are no longer referenced by any other objects. Garbage collection is automatic. For simple applications, the developer does not even need to be aware of garbage collection. But for applications that have a large memory footprint, are long running, or have low latency requirements, some understanding is necessary to ensure that garbage collection does not interfere with the application. A common symptom of such interference is that the application seems to stop responding or its response time goes up at random. This article lists a few important points every Java developer needs to know about garbage collection.

1.0 Generational GC

Since JDK 5, the garbage collectors are what are called generational collectors. The heap is divided into regions based on the age of the objects. The young generation holds objects that are short lived. The tenured generation holds objects that are long lived. All objects are first created in the young region, and if they are still alive after a while, they are moved to the tenured generation. Garbage collection of the young region happens frequently and is generally fast. GC for the tenured region happens less frequently. Since most objects are short lived, most garbage can be reclaimed by the cheap young collections, which makes the GC more efficient.

2.0 Types of collectors

Serial Collector: Garbage collection of both the young and tenured regions is done serially, and while this happens your application is paused. This is the default collector on single-CPU machines and on machines with little memory (less than 2 GB). This is fine if your application does not care about pauses.

Parallel Collector: This is the default collector on server-class machines (at least two CPUs and at least 2 GB of physical memory). Multiple threads/CPUs are used to do garbage collection in parallel for the young region. This makes collection faster. But the application is still paused when GC happens. For the tenured region, the GC is serial as in the serial collector.

Parallel Compacting Collector: GC for the young region is the same as parallel collector and uses multiple threads. However GC for tenured region happens in parallel using multiple CPUs. Application is paused when GC happens.

Concurrent Mark Sweep Collector (CMS): For the young region, it is the same as in the parallel collectors. But for the tenured region, GC runs concurrently with the application most of the time. The application pauses during GC are expected to be much shorter than with the other collectors. This is a good choice for applications that cannot tolerate long pauses.
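To check which collectors a running JVM has actually picked, one option is to ask the standard java.lang.management API. The following is a minimal sketch; the class name ActiveCollectors and the output format are illustrative choices, and the collector names reported vary by JVM version and by the options used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Returns one descriptive line per collector registered in this JVM.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Typically one bean covers the young generation and another covers the tenured generation, so the names hint at which of the collectors described above is in use.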

3.0 Understanding GC in your application

Before you try to tune your application's GC, it is important to understand when GC is happening, how much time it takes and how much memory it is reclaiming. The JVM provides the following options to log GC activity.

The -XX:+PrintGCDetails option prints the GC details described below. The -XX:+PrintGCTimeStamps option prints the time from the start of the JVM to when each GC happened. The -Xloggc:gcfilename.log option writes the log to gcfilename.log.

In the gc log, you will see a number of lines like

11.561: [GC [PSYoungGen: 868524K->294158K(1198848K)] 1303221K->728855K(4694144K), 0.3640750 secs] [Times: user=1.44 sys=0.02, real=0.37 secs]

This indicates that a GC of the young region occurred 11.561 secs after the JVM started. The occupancy of the young region was reduced from 868524K to 294158K (a 66% reduction). The number (1198848K) is the memory allocated to the young region. The occupancy of the total heap was reduced from 1303221K to 728855K, or by 44%. The number (4694144K) is the total heap size. This GC took 0.37 secs.
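The percentages above can be recomputed from the log line itself. The sketch below is only an illustration: the class GcLineMath and its regular expression are assumptions that handle the before->after(capacity) groups in the sample line shown here, not a general GC log parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLineMath {
    // Matches groups of the form "<before>K-><after>K(<capacity>K)".
    private static final Pattern SIZES =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Percentage of the "before" occupancy that the collection reclaimed.
    static double percentReclaimed(long beforeK, long afterK) {
        return 100.0 * (beforeK - afterK) / beforeK;
    }

    // One percentage per size group found in the line, in order of appearance.
    static List<Double> reclaimedPercentages(String line) {
        List<Double> result = new ArrayList<>();
        Matcher m = SIZES.matcher(line);
        while (m.find()) {
            long before = Long.parseLong(m.group(1));
            long after = Long.parseLong(m.group(2));
            result.add(percentReclaimed(before, after));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "11.561: [GC [PSYoungGen: 868524K->294158K(1198848K)] "
                + "1303221K->728855K(4694144K), 0.3640750 secs]";
        // First value is the young region, second is the total heap.
        for (double pct : reclaimedPercentages(line)) {
            System.out.println(Math.round(pct) + "% reclaimed");
        }
    }
}
```

For the sample line, the first group is the young region and the second is the whole heap, which gives the roughly 66% and 44% figures quoted above.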

You will see a few lines like

3602.170: [Full GC (System) [PSYoungGen: 16250K->0K(1662080K)] [PSOldGen: 1594630K->1578665K(3495296K)] 1610881K->1578665K(5157376K) [PSPermGen: 22314K->22314K(35904K)], 3.4836190 secs] [Times: user=3.45 sys=0.03, real=3.48 secs]

This indicates that a full GC occurred at 3602.17 secs from the start. The young region was reduced from 16250K to 0K. The old or tenured region was reduced from 1594630K to 1578665K. The total heap was reduced from 1610881K to 1578665K. The GC took 3.48 sec.

GCViewer is a free tool for viewing GC logs graphically.

GC log viewed in GCViewer

The very small black lines at the bottom indicate the small GCs. The tall black lines at the hourly mark are the Full GCs. The blue peaks are lines indicating how the used heap goes up and goes down after a GC. The ruby red line just below the blue spikes shows the growth of the tenured region. You can see that the tenured region drops after a full GC. Full GCs take a lot of time and you want to reduce the frequency with which they occur.

4.0 Tuning options

The JVM offers a few knobs that one can turn to tune the GC in a way most suitable to your machine and your application.

The -Xms and -Xmx options are used to set the initial and maximum size of the heap. The maximum heap size should be less than the physical memory on the machine to avoid paging, and one should also leave memory aside for the operating system and other applications running on the same machine. A bigger heap is good in that the GC has to collect less often, but when it does have to collect, it has more work to do and the GC pauses can be longer.
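A quick way to see what heap limits the JVM ended up with is to query java.lang.Runtime from inside the application. This is a small sketch; the class name HeapReport is hypothetical, and the value reported by maxMemory() is usually close to, but not always exactly, the -Xmx setting.

```java
class HeapReport {
    // Upper bound the JVM will try to use for the heap (roughly -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM (starts near -Xms and can grow).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Portion of the committed heap currently occupied by objects, live or not yet collected.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB):       " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
        System.out.println("used heap (MB):      " + usedHeapBytes() / mb);
    }
}
```

Running it with different -Xms and -Xmx values, for example java -Xms256m -Xmx1g HeapReport, shows how the options change the committed and maximum figures.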

-XX:+UseSerialGC
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:+UseConcMarkSweepGC

These options are used to select the GC. SerialGC and ParallelGC are selected by default depending on machine type as described earlier. Applications that have low latency requirements and cannot tolerate long GC pauses should consider switching to the Concurrent Mark Sweep GC.

-XX:NewSize=n is used to set the default initial size of the young generation. Most applications have many short lived objects and few long lived objects. The new size should be large enough that short lived objects fit into the young generation and are garbage collected in the small GCs. If the young generation is too small, short lived objects get moved to the tenured region, which leads to more frequent and longer Full GCs.

-XX:MaxGCPauseMillis=n is a hint to the GC as to the desired maximum pause time in milliseconds. This is just a hint and may or may not be honoured.

5.0 References

There are many other tuning options and the following documents from Oracle are good references on tuning options as well as garbage collection in general:

1. http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
2. http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf

Wednesday, December 29, 2010

Performance tip #1 - Java Strings

The java.lang.String class is a very convenient class for handling text or character data. It is used extensively in Java programs, sometimes so excessively that it degrades performance.

The String class is immutable. It cannot be changed. It has concatenation and substring methods that give the appearance of changing the string. In reality, the JVM copies data and creates new strings.

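// code 1 Inappropriate

String item = "" ;        // must be initialized before it can be concatenated
String[] listofitems ;    // assumed to be populated elsewhere with numitems entries
for (int i = 0 ; i < numitems ; i++)
    item = item + listofitems[i] ;   // allocates a new String on every iteration

// code 2 Much better
StringBuffer item = new StringBuffer() ;
for (int i = 0 ; i < numitems ; i++)
    item.append(listofitems[i]) ;    // appends into the same buffer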

java.lang.StringBuffer is the mutable counterpart of String and should be used for manipulating Strings. StringBuffer is thread safe. java.lang.StringBuilder provides methods similar to StringBuffer, but the methods are not synchronized. StringBuilder can be used when thread safety is not an issue.

In code 1, a new String is created for every iteration of the loop and the contents of the previous iteration are copied into it. For n items, the total amount of copying grows as O(n^2). Code 2 scales linearly, O(n). For large values of n, the difference is significant.
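The two approaches in code 1 and code 2 can be compared side by side with a self-contained sketch. The class JoinItems, its method names and the item count are made up for illustration, StringBuilder is used in place of StringBuffer because only one thread is involved, and the timings printed are rough and will vary from machine to machine.

```java
class JoinItems {
    // Code 1 style: every iteration copies the accumulated text into a new String.
    static String joinWithConcat(String[] items) {
        String result = "";
        for (String s : items) {
            result = result + s;
        }
        return result;
    }

    // Code 2 style: appends into one growing buffer and copies once at the end.
    static String joinWithBuilder(String[] items) {
        StringBuilder sb = new StringBuilder();
        for (String s : items) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] items = new String[20000];
        for (int i = 0; i < items.length; i++) {
            items[i] = "item" + i;
        }

        long t0 = System.nanoTime();
        String a = joinWithConcat(items);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(items);
        long t2 = System.nanoTime();

        System.out.println("same result: " + a.equals(b));
        System.out.println("concat ms:   " + (t1 - t0) / 1000000);
        System.out.println("builder ms:  " + (t2 - t1) / 1000000);
    }
}
```

Both methods return the same text; only the amount of copying differs, and the gap widens as the number of items grows.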

A number of other methods, such as substring, trim, toLowerCase and toUpperCase, have the same problem: they cannot modify the String in place, so any change they make produces a new String.

However there is one instance where concatenation is better. If the String can be resolved at compile time as in

// code 4
String greeting = "hello" + " new world" ;

At compile time, this gets compiled to
String greeting = "hello new world" ;

The alternative is
// code 5
String greeting = (new StringBuffer()).append("hello").append(" new world").toString() ;

Code 4 is more efficient as it avoids the temporary StringBuffer as well as the additional method calls.

Processing character arrays, as is done in the C programming language, is generally the fastest. But almost all public Java libraries use String in their interfaces, so most of the time the programmer has no choice. Proper care therefore needs to be taken to ensure acceptable performance.

Lastly, converting Strings to numbers whether it is int, long, double or float can be expensive. Many programmers make the mistake of starting with String and then converting to a number type when they need to do a numerical computation. As a general rule, Strings should be used when processing text. For example, if you know that data being read from a file is of type int or float, read it into the correct type and avoid unnecessary conversion.

Sunday, June 27, 2010

Which Java Map class should I use?

java.util.Map is an interface for mapping keys to values. Implementations of Map provide in-memory maps of keys to values. The interface provides methods to insert (key, value) pairs into a map and methods to look up values based on the key.

The JDK provides a few implementations of the Map interface such as HashMap, LinkedHashMap, TreeMap and IdentityHashMap. Which implementation should you use? The answer depends on what your requirements are.

The HashMap is implemented using a hash table. A hash table uses a hash function to calculate the index into an array where the value is placed. Hence HashMap provides the fastest search and insertion times, O(1) on average. HashMaps however take up more memory. Also they do not guarantee any ordering for the keys. If you iterate over the keys, you will find them in no particular order.

The LinkedHashMap is implemented using both a hash table and a linked list. In addition to using a hash table for fast retrieval, it maintains a doubly linked list through the items. The linked list preserves the order in which keys were inserted into the map. A search or get operation is still as fast as with a HashMap, but inserts and deletes are a little slower because of the additional work involved in maintaining the linked list. LinkedHashMap can be used if you need to iterate over the keys in the order in which they were inserted into the Map. LinkedHashMap also has a constructor parameter accessOrder, which when set to true changes the ordering from insertion order to access order, from least recently accessed to most recently accessed. This is particularly useful if you are using the Map to build a cache.
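As an illustration of the cache use case, here is a minimal sketch of a least-recently-used cache built on LinkedHashMap. The class name LruCache and the maxEntries parameter are assumptions made for this example; the three-argument constructor and the removeEldestEntry hook are part of the standard LinkedHashMap API. The sketch is not thread safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // 16 and 0.75f are the usual defaults; true selects access order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after an insert; returning true evicts the eldest entry,
    // which in access order is the least recently accessed one.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes the most recently accessed key
        cache.put("c", 3);  // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

With accessOrder left at false, the same class would evict the oldest inserted key instead, regardless of how recently it was read.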

TreeMap is implemented using the red black tree which is a balanced tree. When items are inserted into a tree, they are placed at a position based on their natural ordering or based on a provided comparator. The ordering is thus a sorted order. Search and insertion for TreeMap are of the order O(log(n)). This is slower than HashMap and LinkedHashMap. You might need to use a TreeMap if you are going to need to iterate over the keys in the map in a sorted order.
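The difference in iteration order is easy to see by inserting the same keys into each kind of map. The sketch below uses a hypothetical helper class MapOrderDemo and made-up keys; HashMap is left out of the comparison because its iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    // Inserts the same three keys, in the same order, and returns the iteration order.
    static List<String> keysAfterInserts(Map<String, Integer> map) {
        map.put("pear", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // TreeMap iterates in the natural (alphabetical) order of the String keys.
        System.out.println("TreeMap:       " + keysAfterInserts(new TreeMap<String, Integer>()));
        // LinkedHashMap iterates in the order the keys were inserted.
        System.out.println("LinkedHashMap: " + keysAfterInserts(new LinkedHashMap<String, Integer>()));
    }
}
```

A Comparator passed to the TreeMap constructor would replace the natural ordering with a custom one.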

IdentityHashMap is implemented using a hash table and is very fast, O(1) for both get and insert. But unlike other Map classes, which use the equals method when comparing keys, it uses object reference equality. For two keys key1 and key2, it compares using key1 == key2 rather than key1.equals(key2). The JDK documentation is clear that this is not to be used as a general purpose Map where you need lookups based on values. The specification of containsKey(Object key) in the Map interface states that it returns true if the map contains a key k such that key.equals(k). IdentityHashMap intentionally does not follow this contract. IdentityHashMap is rarely used. But you might use it if you are doing something that needs to keep track of object references.
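The following sketch shows the difference between equals-based and reference-based key comparison. The class IdentityDemo and its key values are invented for the example; new String(...) is used deliberately so that the two keys are equal but are distinct objects.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityDemo {
    // Puts two keys that are equal but not the same object, then reports the map size.
    static int sizeAfterTwoEqualKeys(Map<String, String> map) {
        String key1 = new String("id");
        String key2 = new String("id"); // key1.equals(key2) is true, key1 == key2 is false
        map.put(key1, "first");
        map.put(key2, "second");
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap treats the keys as the same key, so the second put overwrites the first.
        System.out.println("HashMap size:         "
                + sizeAfterTwoEqualKeys(new HashMap<String, String>()));
        // IdentityHashMap treats them as different keys and keeps both entries.
        System.out.println("IdentityHashMap size: "
                + sizeAfterTwoEqualKeys(new IdentityHashMap<String, String>()));
    }
}
```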

In summary,

HashMap is the fastest map with O(1) search and insertion times.
LinkedHashMap is a little slower for inserts, but maintains the insertion order.
TreeMap is the slowest map, but lets you iterate over the keys in a sorted order.