Wednesday, December 29, 2010

Performance tip #1 - Java Strings

The java.lang.String class is a very convenient class for handling text or character data. It is used extensively in java programs. Sometimes excessively to the point that it degrades performance.

The String class is immutable. It cannot be changed. It has concatenation and substring methods that give the appearance of changing the string. In realilty, the jvm copies data and creates new strings.

// code 1 Inappropriate

String item ;
String[] listofitems ;
for (int i = 0 ; i < numitems ; i++)
item = item + listofitems[i] ;

// code 2 Much better
StringBuffer item ;
for (int i = 0 ; i < numitems ; i++)
item.append(listofitems[i]) ;

java.lang.StringBuffer is the mutable counterpart of String, that should be used for manipulating Strings. StringBuffer is threadsafe. java.lang.StringBuilder provides methods similar to StringBuffer, but the methods are not synchronized. StringBuilder can used when thread safety is not an issue.

In 1 ,a new String is created for every iteration of the loop and the contents of the previous iteration are copied into it. For n items, this is going to perform O(n square). Code 2 scales linearly O(n). For large values of n, the difference is significant.

There are a number of methods such as substring, trim, toLowerCase, toUpperCase that have this problem.

However there is one instance where concatenation is better. If the String can be resolved at compile time as in

// code 4
String greeting = "hello" + " new world" ;

At compile time , this gets compiled to
String greeting = "hello new world" ;

The alternative is
// code 5
String greeting = (new StringBuffer()).append("hello").append(" new world").toString() ;

Code 4 is more efficient as it avoid the temporary StringBuffer as well as the additional method calls.

Processing character arrays as is done in the C programming language is generally the fastest. But almost all public java libraries use String in their interfaces. So most of the time the programmer has no choice. So proper care needs to be take to ensure acceptable performance.

Lastly, converting Strings to numbers whether it is int, long, double or float can be expensive. Many programmers make the mistake of starting with String and then converting to a number type when they need to do a numerical computation. As a general rule, Strings should be used when processing text. For example, if you know that data being read from a file is of type int or float, read it into the correct type and avoid unnecessary conversion.