Wednesday, December 29, 2010

Performance tip #1 - Java Strings

The java.lang.String class is a very convenient class for handling text or character data. It is used extensively in Java programs, sometimes so excessively that it degrades performance.

The String class is immutable: once a String is created, it cannot be changed. It has concatenation and substring methods that give the appearance of changing the string, but in reality the JVM copies the data and creates new strings.
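
A small sketch of what this means in practice (the class name ImmutableDemo is only for illustration): concat, toUpperCase and substring each hand back a new String, and the original is left exactly as it was. Calling one of these methods without keeping its result therefore changes nothing.

```java
public class ImmutableDemo {

    public static void main(String[] args) {
        String original = "hello";

        // Each call returns a new String; 'original' itself is never modified.
        String joined = original.concat(" world");
        String upper = original.toUpperCase();
        String part = joined.substring(6);

        System.out.println(original); // hello
        System.out.println(joined);   // hello world
        System.out.println(upper);    // HELLO
        System.out.println(part);     // world

        // The result is discarded here, so 'original' is unchanged.
        original.toUpperCase();
        System.out.println(original); // hello
    }
}
```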

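// code 1 Inappropriate
// assumes listofitems is a filled String[] and numitems == listofitems.length
String item = "" ;
for (int i = 0 ; i < numitems ; i++)
    item = item + listofitems[i] ;   // builds a new String and copies everything so far

// code 2 Much better
StringBuffer item = new StringBuffer() ;
for (int i = 0 ; i < numitems ; i++)
    item.append(listofitems[i]) ;    // appends in place to one growable buffer
String result = item.toString() ;    // one final String when building is done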

java.lang.StringBuffer is the mutable counterpart of String and should be used when a string is built up or modified piece by piece. StringBuffer is thread-safe: its methods are synchronized. java.lang.StringBuilder (added in Java 5) provides the same methods as StringBuffer, but they are not synchronized. StringBuilder can be used when thread safety is not an issue.
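
To illustrate the thread-safety difference, here is a minimal sketch (written for a current JDK, Java 8 or later; the class name and the thread and iteration counts are arbitrary) in which several threads append to one shared StringBuffer. Because each StringBuffer.append call is synchronized, no characters are lost. A StringBuilder shared the same way could lose characters or even throw an exception, while in ordinary single-threaded code, such as the loop in code 2, a StringBuilder is sufficient.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedBufferDemo {

    // Several threads append to one shared StringBuffer. StringBuffer's methods are
    // synchronized, so every append is applied and the final length is exact.
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x');
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    // Confined to one thread, so the unsynchronized StringBuilder is enough.
    static String joinLocally(String[] items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append(item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 10_000));             // 40000
        System.out.println(joinLocally(new String[] {"a", "b", "c"})); // abc
    }
}
```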

In code 1, a new String is created on every iteration of the loop, and the contents from the previous iteration are copied into it. For n items, this takes O(n²) time. Code 2 scales linearly, O(n), because append adds to the same buffer, which only occasionally has to grow. For large values of n, the difference is significant.
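
Where the O(n²) comes from: with n items of roughly equal length k, iteration i of code 1 produces a String of length i·k, so the loop copies about k + 2k + ... + nk = k·n(n+1)/2 characters in total. The rough sketch below runs both styles side by side; the class name and item strings are made up for illustration, and it is not a rigorous benchmark (there is no JIT warm-up, so the timings are only indicative; a harness such as JMH is the usual tool for real measurements).

```java
public class ConcatScaling {

    // Code 1 style: each concatenation builds a brand-new String and copies everything so far.
    static String concatNaive(String[] items) {
        String result = "";
        for (String item : items) {
            result = result + item;
        }
        return result;
    }

    // Code 2 style: one growable buffer; the total copying is proportional to the output size.
    static String concatWithBuilder(String[] items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append(item);
        }
        return sb.toString();
    }

    // Hypothetical test data: "item0;", "item1;", ...
    static String[] makeItems(int n) {
        String[] items = new String[n];
        for (int i = 0; i < n; i++) {
            items[i] = "item" + i + ";";
        }
        return items;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 5_000, 10_000}) {
            String[] items = makeItems(n);

            long t0 = System.nanoTime();
            String a = concatNaive(items);
            long t1 = System.nanoTime();
            String b = concatWithBuilder(items);
            long t2 = System.nanoTime();

            // Same result, very different cost; timings vary by machine and JVM.
            System.out.printf("n=%d equal=%b naive=%d ms builder=%d ms%n",
                    n, a.equals(b), (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```

Doubling n should roughly quadruple the naive time but only roughly double the builder time, though small n values are dominated by noise.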

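Methods such as substring, trim, toLowerCase and toUpperCase have the same issue: they never modify the original String but return a new one whenever the result differs, so calling them repeatedly, especially inside loops, creates many temporary objects.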
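
A common case is normalizing a value inside a loop. In the sketch below (class and method names are hypothetical), the naive version re-trims and re-lowercases the keyword on every iteration; the hoisted version normalizes it once and uses equalsIgnoreCase, which compares without building a lower-cased copy of each item. Note that equalsIgnoreCase works character by character and can disagree with toLowerCase() for some locale-sensitive non-ASCII text, so the swap assumes ordinary ASCII-like data.

```java
public class HoistNormalization {

    // Wasteful: trim() and toLowerCase() on the keyword run again on every iteration,
    // and each call may allocate a new String.
    static int countMatchesNaive(String[] items, String keyword) {
        int count = 0;
        for (String item : items) {
            if (item.trim().toLowerCase().equals(keyword.trim().toLowerCase())) {
                count++;
            }
        }
        return count;
    }

    // Better: normalize the keyword once, outside the loop, and compare case-insensitively
    // without creating a lower-cased copy of each item.
    static int countMatchesHoisted(String[] items, String keyword) {
        String wanted = keyword.trim();
        int count = 0;
        for (String item : items) {
            if (item.trim().equalsIgnoreCase(wanted)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] items = {" Apple", "banana ", "APPLE", "cherry"};
        System.out.println(countMatchesNaive(items, " apple "));   // 2
        System.out.println(countMatchesHoisted(items, " apple ")); // 2
    }
}
```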

However, there is one case where plain concatenation is better: when the String can be resolved at compile time, as in

// code 4
String greeting = "hello" + " new world" ;

At compile time, this is compiled to
String greeting = "hello new world" ;

The alternative is
// code 5
String greeting = (new StringBuffer()).append("hello").append(" new world").toString() ;

Code 4 is more efficient, as it avoids the temporary StringBuffer as well as the additional method calls.
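
One way to observe this folding (a small illustrative class, not from the original post): a compile-time constant expression is interned just like a literal, so comparing it with == against the literal is true, whereas a concatenation evaluated at run time creates a new String object. The == comparisons are only there to show object identity; real code should compare contents with equals.

```java
public class ConstantFolding {

    public static void main(String[] args) {
        // Code 4: both operands are literals, so the compiler folds them into one
        // constant, which is interned like the literal "hello new world".
        String folded = "hello" + " new world";
        System.out.println(folded == "hello new world"); // true

        // A final local String initialized with a literal is a constant variable,
        // so this concatenation is folded at compile time too.
        final String hello = "hello";
        String alsoFolded = hello + " new world";
        System.out.println(alsoFolded == "hello new world"); // true

        // A non-final variable is not a constant, so the concatenation happens
        // at run time and produces a new String object with the same contents.
        String helloVar = "hello";
        String runtime = helloVar + " new world";
        System.out.println(runtime == "hello new world");      // false
        System.out.println(runtime.equals("hello new world")); // true
    }
}
```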

Processing character arrays, as is done in the C programming language, is generally the fastest. But almost all public Java libraries use String in their interfaces, so most of the time the programmer has no choice, and care needs to be taken to ensure acceptable performance.
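
As a rough illustration of working on characters rather than Strings (method names are hypothetical): the first method creates a one-character String per position just to inspect it, while the others convert once with toCharArray, do plain char work, and, where a String is needed at the interface, build a single one at the end with new String(char[]). The upper-casing helper only handles ASCII letters.

```java
public class CharArrayDemo {

    // Typically allocates a one-character String on every iteration just to inspect it.
    static int countDigitsWithSubstring(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            String ch = s.substring(i, i + 1);
            if ("0123456789".contains(ch)) {
                count++;
            }
        }
        return count;
    }

    // One copy of the characters up front, then no String is created inside the loop.
    static int countDigitsWithCharArray(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                count++;
            }
        }
        return count;
    }

    // In-place edits on a char[] (ASCII letters only), then one String at the end.
    static String asciiUpper(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= 'a' && chars[i] <= 'z') {
                chars[i] = (char) (chars[i] - ('a' - 'A'));
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String input = "order 42, qty 7";
        System.out.println(countDigitsWithSubstring(input)); // 3
        System.out.println(countDigitsWithCharArray(input)); // 3
        System.out.println(asciiUpper(input));               // ORDER 42, QTY 7
    }
}
```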

Lastly, converting Strings to numbers, whether int, long, double or float, can be expensive. Many programmers make the mistake of starting with a String and converting it to a numeric type whenever they need to do a numerical computation. As a general rule, Strings should be used only for processing text. For example, if you know that the data being read from a file is of type int or float, read it into the correct type and avoid the unnecessary conversion.
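
A minimal sketch of reading typed data directly, assuming the file was written in binary form (for example with DataOutputStream); here a byte array stands in for the file, and the class and method names are made up for illustration. DataInputStream.readInt returns an int with no String in between, whereas the same numbers stored as text have to be split into Strings and parsed. For a genuine text file some parsing is unavoidable, but it is still better to parse each value once, on input, than to carry Strings around and convert them repeatedly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TypedRead {

    // Writes ints in binary form (4 bytes each); the byte array stands in for a data file.
    static byte[] writeInts(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back directly as ints: no String is created and nothing is parsed.
    static long sumInts(byte[] data) {
        long sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < data.length / 4; i++) {
                sum += in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    // For comparison: the same numbers kept as text must be split into Strings and parsed.
    static long sumText(String csv) {
        long sum = 0;
        for (String token : csv.split(",")) {
            sum += Integer.parseInt(token.trim());
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30, -5};
        System.out.println(sumInts(writeInts(values))); // 55
        System.out.println(sumText("10, 20, 30, -5"));  // 55
    }
}
```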

2 comments:

  1. I have seen String overused, with most of the garbage coming from temporary Strings. It's worth using StringBuffer and StringBuilder whenever possible. It's also good to know why String is immutable in Java.

    Thanks
    Javin
    FIX Protocol tutorial

  2. Thread safety with strings is rarely an issue, so you should be talking about StringBuilder and mention StringBuffer only for special cases.
    Also, simple string concatenations are converted to StringBuilder appends by every reasonable Java compiler, so, as you would have found had you done your homework and tested it, writing them with StringBuilder by hand makes no difference at all. I would even guess that appends using StringBuffer, which you used, might be much slower because of the unnecessary synchronization (which is actually one of the main performance bottlenecks in Java apps)!
    If you write a blog post and even want to post it somewhere like DZone, do some real work. This one reads like a junior's learning log. Performance optimization is definitely something you should not be writing about yet.
