Profiling and Optimization
By Adrian Sutton
Recently there has been a very interesting and at times heated discussion about optimization on the HUMBUG mailing list. It’s worth reading the archives (look for the “Which is better?” thread) to get the whole picture.
There were two main topics being discussed:
- Whether you should look for optimizations as you code and implement them if they don’t affect readability and
- Whether or not memory allocation in Java is slow and thus efforts should be taken to avoid it.
Issue 1 – Premature Optimization
Inside The Loop or Outside The Loop?
Most people argued that premature optimization is the root of all evil, and that in Java in particular it's a waste of time because you will almost always predict incorrectly what the compiler and the JIT compiler are going to do. There were a couple of people (one in particular) who thought that optimizations that don't affect readability are a good idea. The example:
for (int i = 0; i < 1000; i++) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    format.format(new Date());
}
was bandied around as an example of something you should optimize by moving the creation of the SimpleDateFormat out of the loop (as sketched below). The speed increase from doing so is very significant, so you should move it out of the loop; however, there are two possible reasons why you would move it:
- Because it’s faster outside the loop.
- Because the scope of the formatter extends across all iterations of the loop and not just a single iteration.
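Hoisted out of the loop, the example looks something like this (just a sketch, assuming every iteration really does want the same formatter):

SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd"); // created once, reused by every iteration
for (int i = 0; i < 1000; i++) {
    format.format(new Date());
}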
It’s for the second reason that I would move the SimpleDateFormat outside of the loop and not the first. This is highlighted nicely by one of the other examples that floated around:
for (int i = 0; i < 1000; i++) {
    Dimension d = new Dimension();
}
In this case creating a Dimension is a much cheaper operation than creating a SimpleDateFormat and parsing a pattern string (once the first instance has been created, since that first instance causes AWT to be initialized), so in performance terms it really doesn't matter whether it's inside the loop or outside it. But if it were the exact same Dimension object, unchanged throughout the iterations, I'd still move it outside the loop because it is global to the loop and unaffected by the current value of the loop counter. Similarly with a class like:
public class Format {
    public String formatDate(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        return format.format(date);
    }
}
I would move the SimpleDateFormat to at least an instance variable and, if every instance of the class used the same date format, a static variable. I'd do that simply because it better reflects the behaviour and intent of the class, not because it improves speed (if the formatDate method were only called once, or were never called on the performance-critical path, the change wouldn't improve speed at all).
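That refactoring might look something like the sketch below. It uses a static field on the assumption that every instance uses the same format, and it assumes single-threaded callers, since SimpleDateFormat is not thread-safe:

import java.text.SimpleDateFormat;
import java.util.Date;

public class Format {
    // One formatter shared by all instances; its scope now matches its usage.
    // SimpleDateFormat is not thread-safe, so this sketch assumes single-threaded use.
    private static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    public String formatDate(Date date) {
        return FORMAT.format(date);
    }
}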
So out of this we come to a simple rule that all programmers should know and follow pretty much automatically:
Scope variables to reflect their usage. No more than required, no less.
Forwards Or Backwards?
The original question posted was about loops and whether you should evaluate the termination value (e.g. array.length) before the loop and store it in a local variable to avoid recalculating it on every iteration. Obviously, in Java array.length happens to be a field anyway, so it's unlikely to make any difference regardless. The question did lead to a bunch of different ways people might try to optimize array iteration, including looping from the end back to the start to take advantage of the fact that CPUs can compare against zero faster than against other numbers (apparently). It was pointed out many times, by many people, that it's not worth worrying about because of how small a difference it would make; however, one or two people thought that if you know in advance techniques that make things run faster, why not use them? A benchmark was put forward to determine the fastest way to iterate over an array. It showed that the code:
static Test array_unopt = new Test() {
    public int[] array = new int[10000];
    public void test() {
        for (int i = 0; i < array.length; i += 1)
            continue;
    }
};

static Test array_harry = new Test() {
    public int[] array = new int[10000];
    public void test() {
        int l = array.length;
        for (int i = 0; i < l; i += 1)
            continue;
    }
};

static Test array_aj = new Test() {
    public int[] array = new int[10000];
    public void test() {
        for (int i = array.length - 1; i >= 0; i -= 1)
            continue;
    }
};
took 1570, 1439 and 1456 milliseconds respectively. I thought the entire benchmark was critically flawed, but even taking it as valid, it showed no noticeable difference between the various ways of iterating over an array. Most developers, however, would expect to see the "for (int i = 0; i < array.length; i++)" version and would thus recognise it more easily. Out of this I would derive the rule (but others didn't):
Benchmarks are a waste of time. Profile the real code instead.
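To give a rough idea of what "profile the real code" meant in practice at the time: the JDK shipped with the hprof agent, which could be attached to a real run of the application. The exact option syntax varies between JVM versions, and MyApplication is just a placeholder here:

java -Xrunhprof:cpu=samples,depth=10 MyApplication

When the JVM exits, the CPU sample results are written to java.hprof.txt, which shows where the application actually spent its time.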
Memory Allocation In Java
The SimpleDateFormat-in-a-loop example sparked off a significant debate about how expensive memory allocation is in Java, primarily because it is heap based rather than stack based. Obviously using SimpleDateFormat as a test class is a really bad idea because of the amount of pattern parsing it does in its constructor. The other benchmark that was put forward was creating a lot of Dimension objects in a loop. What really killed that benchmark is the fact that creating a Dimension object requires AWT to be initialized, so when the benchmark was run it appeared that Java was exceptionally slow at creating a whole bunch of objects that effectively did nothing. The "equivalent" benchmark in C (which is actually completely different, but never mind that for now):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    long t = time(0);
    int i;
    for (i = 0; i < 100000000; i += 1)
        free(malloc(16));
    printf("elapsed = %ld\n", (long) (time(0) - t));
    return 0;
}
It turned out to run 30% slower than the Java benchmark (the one creating Dimension objects). That certainly surprised me. So even taking into account the fact that benchmarks are a horrible waste of time, I think it's safe to say that memory allocation in Java is a lot faster than even I thought it was.
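For comparison, the Java side of such a benchmark might look roughly like the sketch below. This is my own reconstruction rather than the code from the thread, and it allocates a trivial two-field class instead of Dimension so that AWT initialization stays out of the picture:

public class AllocBenchmark {
    // Stand-in for Dimension: two int fields, no AWT involvement.
    static class Pair {
        int width;
        int height;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            new Pair(); // allocate and immediately discard; the garbage collector reclaims it
        }
        // Note: a modern JIT may eliminate the unused allocation entirely,
        // which is one more reason micro-benchmarks like this can mislead.
        System.out.println("elapsed = " + (System.currentTimeMillis() - start));
    }
}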
From this, and a bunch of other surprises in the thread, I'd derive the rule:
If you haven’t profiled it, you don’t know how well it performs.
Learning
Throughout the thread there were a lot of cases where someone didn't fully understand a part of the language or library that they thought they knew well. This lack of knowledge really stood out when it came to talking about performance, because the performance of an application can be greatly affected by seemingly trivial details. Everyone involved in the thread is intelligent and most of them have significant experience in the computing field, but nearly everyone (myself included) managed to come out with a statement that was at least partially wrong.
From this I’d derive the rule:
You always need to learn more.
Summary
While I’ve learnt a lot of things out of that thread, as well as becoming very frustrated at times, the main lessons I’ve either learnt from it or had reaffirmed by it were:
- Scope variables to reflect their usage. No more than required, no less.
- Benchmarks are a waste of time. Profile the real code instead.
- If you haven't profiled it, you don't know how well it performs.
- You always need to learn more.

Most importantly, many people pointed out the following, but I think Andrae Muys said it best:
Keep the code simple.
Get the code correct.
Measure.