Java Performance Optimizing
Abstract: We give some advice on improving the execution of Java programs by reducing their time and space consumption. There are no magic tricks, just advice on common problems to avoid.
Instead, compute the loop bound only once and bind it to a local variable, like this:
for (int i=0, stop=size()*2; i<stop; i++) { ... }
Instead, compute the subexpression once, bind the result to a variable, and reuse it:
Bird bird = birds.elementAt(i);
if (bird.isGrower()) ...
if (bird.isPullet()) ...
Every array access requires an index check, so it is worthwhile to reduce the number of array accesses. Moreover, usually the Java compiler cannot automatically optimize indexing into multidimensional arrays. For instance, every iteration of the inner (j) loop below recomputes the indexing rowsum[i] as well as the indexing arr[i] into the first dimension of arr:
double[] rowsum = new double[n];
for (int i=0; i<n; i++)
  for (int j=0; j<m; j++)
    rowsum[i] += arr[i][j];
Instead, compute these indexings only once for each iteration of the outer loop:
double[] rowsum = new double[n];
for (int i=0; i<n; i++) {
  double[] arri = arr[i];
  double sum = 0.0;
  for (int j=0; j<m; j++)
    sum += arri[j];
  rowsum[i] = sum;
}
Note that the initialization arri = arr[i] does not copy row i of the array; it simply assigns an array reference (typically four or eight bytes) to arri.

Declare constant fields as final static so that the compiler can inline them and precompute constant expressions.

Declare constant variables as final so that the compiler can inline them and precompute constant expressions.

Replace a long if-else-if chain by a switch if possible; this is much faster.

If a long if-else-if chain cannot be replaced by a switch (because it tests a String in a Java version before 7, for instance), and if it is executed many times, it is often worthwhile to replace it by a final static HashMap or similar.

Nothing (except obscurity) is achieved by using clever C idioms such as performing the entire computation of a while-loop in the loop condition:
int year = 0;
double sum = 200.0;
double[] balance = new double[100];
while ((balance[year++] = sum *= 1.05) < 1000.0);
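For comparison, the same computation can be written as an ordinary loop with one statement per step. This is only a sketch: the class name Balance, the method name yearsToReach and its parameters are hypothetical and were introduced for illustration.

```java
final class Balance {
    // Grows 'start' by the factor 'rate' once per year, recording each
    // year's balance, until the balance reaches 'target'.
    // Returns the number of years recorded in 'balance'.
    static int yearsToReach(double start, double rate, double target,
                            double[] balance) {
        int year = 0;
        double sum = start;
        do {
            sum *= rate;           // add one year's interest
            balance[year] = sum;   // record the balance for this year
            year++;
        } while (sum < target);
        return year;
    }

    public static void main(String[] args) {
        double[] balance = new double[100];
        int years = yearsToReach(200.0, 1.05, 1000.0, balance);
        System.out.println(years + " years, final balance " + balance[years - 1]);
    }
}
```

The loop performs the same assignments in the same order as the condensed version, so there is no reason to expect it to run slower, and the termination condition is visible at a glance.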
Instead, use a StringBuilder object and its append method. This takes time linear in the number of iterations, and may be several orders of magnitude faster:
StringBuilder sbuf = new StringBuilder();
for (int i=0; i<n; i++) {
  sbuf.append("#").append(i);
}
String s = sbuf.toString();
On the other hand, an expression containing a sequence of string concatenations automatically gets compiled to use StringBuilder.append(...), so this is OK:
String s = "(" + x + ", " + y + ")";
Do not process strings by repeatedly searching or modifying a String or StringBuilder. Repeated use of methods substring and indexOf from String may be legitimate but should be looked upon with suspicion.
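To illustrate the concern, here is a sketch that counts space-separated words in two ways. The class and method names are hypothetical. The first method repeatedly calls indexOf and substring; on JDKs where substring copies its result, the total work can grow quadratically with the length of the text. The second method inspects each character once and copies nothing.

```java
final class WordCount {
    // Suspicious style: every substring call may copy the rest of the text.
    static int countBySubstring(String text) {
        int count = 0;
        String rest = text.trim();
        while (!rest.isEmpty()) {
            count++;
            int space = rest.indexOf(' ');
            rest = (space < 0) ? "" : rest.substring(space + 1).trim();
        }
        return count;
    }

    // Single pass: a word starts at each non-space character that
    // follows a space (or the beginning of the text).
    static int countBySinglePass(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0, stop = text.length(); i < stop; i++) {
            boolean isSpace = text.charAt(i) == ' ';
            if (!isSpace && !inWord) {
                count++;
            }
            inWord = !isSpace;
        }
        return count;
    }
}
```

Both methods assume that words are separated by space characters only.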
Instead, an initialized array variable or similar table should be declared and allocated once and for all as a final static field in the enclosing class:
private final static int[] monthlengths =
  { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
public static int monthdays(int y, int m) {
  return m == 2 && leapyear(y) ? 29 : monthlengths[m-1];
}
More complicated initializations can use a static initializer block static { ... } to precompute the contents of an array like this:
private final static double[] logFac = new double[100];
static {
  double logRes = 0.0;
  for (int i=1, stop=logFac.length; i<stop; i++)
    logFac[i] = logRes += Math.log(i);
}
public static double logBinom(int n, int k) {
  return logFac[n] - logFac[n-k] - logFac[k];
}
The static initializer is executed when the enclosing class is loaded. In this example it precomputes a table logFac of logarithms of the factorial function $n! = 1 \cdot 2 \cdots (n-1) \cdot n$, so that method logBinom(n,k) can efficiently compute the logarithm of a binomial coefficient. For instance, the number of ways to choose 7 cards out of 52 is Math.exp(logBinom(52, 7)), which equals 133 784 560 up to floating-point rounding.
1.5 Methods
Declaring a method as private, final, or static makes calls to it faster. Of course, you should only do this when it makes sense in the application. For instance, often an accessor method such as getSize can reasonably be made final in a class, when there would be no point in overriding it in a subclass:
class Foo {
  private int size;
  ...
  public final int getSize() { return size; }
}
This can make a call o.getSize() just as fast as a direct access to a public field o.size. There need not be any performance penalty for proper encapsulation (making fields private).

Virtual method calls (to instance methods) are fast and should be used instead of instanceof tests and casts.

In modern Java Virtual Machine implementations, such as Sun's HotSpot JVM and IBM's JVM, interface method calls are just as fast as virtual method calls to instance methods. Hence there is no performance penalty for maintenance-friendly programming, using interfaces instead of their implementing classes for method parameters and so on.
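The advice to prefer virtual method calls over instanceof tests and casts can be sketched as follows. The Shape, Circle and Rect types are hypothetical and exist only for this illustration.

```java
interface Shape {
    double area();
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

final class Rect implements Shape {
    final double width, height;
    Rect(double width, double height) { this.width = width; this.height = height; }
    public double area() { return width * height; }
}

final class Shapes {
    // Discouraged style: test the run-time class, then cast.
    static double areaByInstanceof(Object o) {
        if (o instanceof Circle) {
            Circle c = (Circle) o;
            return Math.PI * c.radius * c.radius;
        } else if (o instanceof Rect) {
            Rect r = (Rect) o;
            return r.width * r.height;
        } else {
            throw new IllegalArgumentException("unknown shape");
        }
    }

    // Preferred style: one interface method call per element.
    static double totalArea(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}
```

Besides avoiding the chain of tests, the second style keeps working unchanged when a new Shape implementation is added.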
For arrays, use java.util.Arrays.sort. There are overloaded versions for all primitive types and for objects. The versions for primitive types use an improved quicksort, which uses no additional memory but is not stable (does not preserve the order of equal elements); the versions for object arrays use a stable merge sort.

For ArrayList<T> and LinkedList<T>, implementing interface java.util.List<T>, use java.util.Collections.sort, which is stable (preserves the order of equal elements) and smooth (near-linear time for nearly sorted lists) but uses additional memory.

Avoid linear search in arrays and lists, except when you know that they are very short. If your program needs to look up something frequently, use one of these approaches:

Binary search on sorted data: For arrays, use java.util.Arrays.binarySearch. The array must be sorted, as if by java.util.Arrays.sort. There are overloaded versions for all primitive types and for objects. For ArrayList<T>, use java.util.Collections.binarySearch. The array list must be sorted, as if by java.util.Collections.sort. If you also need to insert or remove elements from the set or map, use one of the approaches below instead.

Hashing: Use HashSet<T> or HashMap<K,V> from package java.util if your key objects have a good hash function hashCode. This is the case for String and for the wrapper classes Integer, Double, ..., for the primitive types.

Binary search trees: Use TreeSet<T> or TreeMap<K,V> from package java.util if your key objects have a good comparison function compareTo. This is the case for String and for the wrapper classes Integer, Double, ..., for the primitive types.
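The three lookup approaches above can be sketched as follows. The class LookupDemo, its method names and the sample data are hypothetical; only the java.util calls are real library methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class LookupDemo {
    // Binary search: sort a copy once, then each lookup takes O(log n) time.
    static int[] sortedCopy(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy;
    }

    static boolean containsSorted(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    // Hashing: map each name to its position in the array.
    static Map<String, Integer> positionsByName(String[] names) {
        Map<String, Integer> positions = new HashMap<String, Integer>();
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i], i);
        }
        return positions;
    }

    // Binary search tree: the keys are kept in compareTo order.
    static String smallestName(String[] names) {
        TreeMap<String, Integer> ordered =
            new TreeMap<String, Integer>(positionsByName(names));
        return ordered.firstKey();
    }

    public static void main(String[] args) {
        int[] sorted = sortedCopy(new int[] { 42, 7, 19, 3, 25 });
        System.out.println(containsSorted(sorted, 19));
        String[] names = { "pullet", "grower", "layer" };
        System.out.println(positionsByName(names).get("layer"));
        System.out.println(smallestName(names));
    }
}
```

The sorted array suits data that is built once and then only queried; the hash map and the tree map also support insertion and removal between lookups.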
1.7 Exceptions
Creating an exception object with new Exception(...) builds a stack trace, which is costly in time and space, and especially so in deeply recursive method calls. The creation of an object of class Exception or a subclass of Exception may be between 30 and 100 times slower than creation of an ordinary object. On the other hand, using a try-catch block or throwing an exception is fast. You can prevent the generation of this stack trace by overriding method fillInStackTrace in subclasses of Exception, as shown below. This makes creation of exception instances roughly 10 times faster.
class MyException extends Exception {
  public Throwable fillInStackTrace() { return this; }
}
Thus you should create an exception object only if you actually intend to throw it. Also, do not use exceptions to implement control flow (end of data, termination of loops); use exceptions only to signal errors and exceptional circumstances (file not found, illegal input format, and so on). If your program does need to throw exceptions very frequently, reuse a single pre-created exception object.
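Reusing a single pre-created exception object might look like the sketch below. The names BadDigitException and Digits are hypothetical. Because the shared instance is created only once and carries no stack trace, it gives no information about where it was thrown, so this is suitable only when the catcher does not need that information.

```java
final class BadDigitException extends Exception {
    // The single shared instance, created when the class is initialized.
    static final BadDigitException INSTANCE = new BadDigitException();

    private BadDigitException() {
        super("not a decimal digit");
    }

    // Skip building the stack trace; it would be meaningless for a
    // shared instance anyway.
    @Override
    public Throwable fillInStackTrace() {
        return this;
    }
}

final class Digits {
    // Throws the shared exception object instead of allocating a new one.
    static int parseDigit(char c) throws BadDigitException {
        if (c < '0' || c > '9') {
            throw BadDigitException.INSTANCE;
        }
        return c - '0';
    }

    // Counts the characters of s that are not decimal digits.
    static int countBad(String s) {
        int bad = 0;
        for (int i = 0, stop = s.length(); i < stop; i++) {
            try {
                parseDigit(s.charAt(i));
            } catch (BadDigitException e) {
                bad++;
            }
        }
        return bad;
    }
}
```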
Instead, use the enhanced for statement to iterate over the elements. It implicitly uses the collection's iterator, so the traversal takes linear time:
for (T x : lst) System.out.println(x);
Repeated calls to remove(Object o) on LinkedList<T> or ArrayList<T> should be avoided; it performs a linear search.

Repeated calls to add(int i, T x) or remove(int i) on LinkedList<T> should be avoided, except when i is at the end or beginning of the linked list; both perform a linear traversal to get to the ith element.

Repeated calls to add(int i, T x) or remove(int i) on ArrayList<T> should be avoided, except when i is at the end of the ArrayList<T>; it needs to move all elements after i.

Preferably avoid the legacy collection classes Vector, Hashtable and Stack, in which all methods are synchronized, so that every method call has a runtime overhead for obtaining a lock on the collection. If you do need a synchronized collection, use synchronizedCollection and similar methods from class java.util.Collections to create it.

The collection classes can store only reference type data, so a value of primitive type such as int, double, ... must be wrapped as an Integer, Double, ... object before it can be stored or used as a key in a collection. This takes time and space and may be unacceptable in memory-constrained embedded applications. Note that strings and arrays are reference type data and need not be wrapped.

If you need to use collections that have primitive type elements or keys, consider using the Trove library, which provides special-case collections such as hash set of int and so on. As a result it is faster and uses less memory than the general Java collection classes. Trove can be found at <http://trove4j.sourceforge.net/>.
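As a sketch of the advice on synchronized collections, the code below wraps an ArrayList using Collections.synchronizedList instead of using Vector. The class SyncListDemo and its method are hypothetical. Note that the wrapper locks each individual call but not a whole traversal, so iteration must be synchronized explicitly on the wrapper object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SyncListDemo {
    // Lets several threads add to one shared list, then counts the elements.
    static int fillFromThreads(int threads, final int perThread) {
        final List<Integer> shared =
            Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        shared.add(i);   // each add takes the wrapper's lock
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The wrapper does not lock across an iteration, so lock manually.
        int count = 0;
        synchronized (shared) {
            for (Integer ignored : shared) {
                count++;
            }
        }
        return count;
    }
}
```

Code that never shares the list between threads can use the plain ArrayList and pay no locking cost at all.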
1.13 Reflection
A reflective method call, reflective field access, and reflective object creation (using package java.lang.reflect) are far slower than ordinary method call, field access, and object creation. Access checks may further slow down such reflective calls; some of this cost may be avoided by declaring the class of the called method to be public. This has been seen to speed up reflective calls by a factor of 8.
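For readers unfamiliar with the three reflective operations mentioned above, the sketch below shows each one next to its ordinary equivalent. The class ReflectionDemo and its method names are hypothetical; the targets (String.length, Integer.MAX_VALUE and the StringBuilder(String) constructor) were chosen only because they are public members of public JDK classes.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ReflectionDemo {
    // Reflective method call; the ordinary equivalent is s.length().
    static int reflectiveLength(String s) {
        try {
            Method length = String.class.getMethod("length");
            return (Integer) length.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reflective field access; the ordinary equivalent is Integer.MAX_VALUE.
    static int reflectiveMaxInt() {
        try {
            Field max = Integer.class.getField("MAX_VALUE");
            return max.getInt(null);   // null receiver: the field is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reflective object creation; the ordinary equivalent is
    // new StringBuilder(content).
    static String reflectiveBuilder(String content) {
        try {
            Constructor<StringBuilder> ctor =
                StringBuilder.class.getConstructor(String.class);
            return ctor.newInstance(content).toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each reflective version has to look up the member by name, check access and box or unbox values at run time, which is work the ordinary version avoids.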
1.15 Profiling
If a Java program appears to be too slow, try to profile some runs of the program. Assume that the example that performs repeated string concatenation in Section 1.3 is in file MyExample.java. Then one can compile and profile it using Sun's HotSpot JVM as follows:
javac -g MyExample.java
java -Xprof MyExample 10000
Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)

Global summary of 19.01 seconds:
100.0%   929             Received ticks
 74.6%   693             Received GC ticks
  0.8%     7             Other VM operations
It says that 51.3 per cent of the computation time was spent in native method expandCapacity and a further 29.5 per cent was spent in method append, both from class AbstractStringBuilder. This makes it plausible that the culprits are + and += on String, which are compiled into append calls. But what is even more significant is the bottom section, which says that 74.6 per cent of the total time was spent in garbage collection, and hence less than 25 per cent was spent in actual computation. This indicates a serious problem with allocation of too much data that almost immediately becomes garbage.
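A program with the behaviour described above might look like the following sketch. It is an assumption about the shape of MyExample.java, not the original file: the method name concatSlowly and the default iteration count are hypothetical.

```java
final class MyExample {
    // Builds "#0#1#2..." by repeated += on a String. Each iteration
    // copies the whole string built so far, and the old copy
    // immediately becomes garbage.
    static String concatSlowly(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "#" + i;
        }
        return s;
    }

    public static void main(String[] args) {
        // The iteration count is taken from the command line.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        System.out.println(concatSlowly(n).length());
    }
}
```

Each discarded intermediate string is short-lived garbage, which is consistent with a profile dominated by garbage collection ticks.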
Instead, do this:
public class Car {
  final static ImageIcon symbol = new ImageIcon("porsche.gif");
  ...
}
When you are not sure that an object will actually be needed, then allocate it lazily: postpone its allocation until needed, but allocate it only once. The code below unconditionally creates a JButton for every Car object, although the JButton may never be requested by a call to the getButton method:
public class Car {
  private JButton button = new JButton();
  public Car() { ... initialize button ... }
  public final JButton getButton() { return button; }
}
Lazy allocation saves space (for the JButton object) as well as time (for allocating and initializing it). On the other hand, if the button is known to be needed, it is more efficient to allocate and initialize it early and avoid the test in getButton.
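Lazy allocation as described above can be sketched as follows. To keep the example self-contained, the hypothetical class CostlyButton stands in for JButton, and its counter exists only to make the number of allocations visible; LazyCar is likewise a hypothetical name.

```java
// Hypothetical stand-in for an expensive object such as a JButton.
final class CostlyButton {
    static int created = 0;   // counts allocations, for demonstration only

    CostlyButton() {
        created++;
    }
}

final class LazyCar {
    private CostlyButton button;   // stays null until first requested

    public final CostlyButton getButton() {
        if (button == null) {      // the test that early allocation avoids
            button = new CostlyButton();
        }
        return button;
    }

    public static void main(String[] args) {
        LazyCar car = new LazyCar();
        System.out.println("after construction: " + CostlyButton.created);
        car.getButton();
        car.getButton();
        System.out.println("after two requests: " + CostlyButton.created);
    }
}
```

This simple form is not thread-safe: if several threads may call getButton concurrently, the method needs synchronization or another safe initialization scheme.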
3 Other resources
The book J. Noble and C. Weir: Small Memory Software, Addison-Wesley 2001, presents a number of design patterns for systems with limited memory. Not all of the advice is applicable to Java (for instance, because it requires pointer arithmetic), but most of it is useful, albeit somewhat marred by pattern-speak.
4 Acknowledgements
Thanks to Morten Larsen, Jyrki Katajainen and Eirik Maus for useful suggestions.