String Interning and Threads
By Adrian Sutton
Anton Tagunov added an excellent comment to yesterday’s entry:
The one thing that has always stopped me from doing this was: interning must use some global lock. So, while interning would cause little to no harm for desktop application it is likely to introduce extra synchronization bottleneck for server applications. Thus potentially degrading perfromance on some multi-cpu beast. Would you agree with this, Adrian? Firstly, let me make something very clear: string interning will cause bugs in your code and they will be hard to track down – they will however be easy to fix once found. At some point, someone, somewhere will forget to intern a string and let it pass into your code that assumes strings are interned. Also, string interning is an extremely situational optimization. In most cases, it will worsen performance because the overhead of interning the strings will not be made up for by the reduced complexity of comparisons. Even of the cases where it does help, most of the time the difference will not be noticeable. As always, don’t bother optimizing until you know that you need to and that it will help – this is an optimization that causes the code to become less maintainable. Having said that, lets go back to the original question. Firstly, does string interning require synchronization? Probably, but not in terms of Java. The
String.intern()
method is a native method and works via JNI. It would be difficult to imagine a way of achieving the behavior without at least some synchronization though. The synchronized block however would be very small, and very rarely encountered. There are two situations to consider, either the string is already in the interned list or the string is not. If it is, then no synchronization needs to occur because the list is only being read. So multiple strings can be interned at once so long as all of them are already in the interned list. Synchronization will be needed however whenever a string is interned for the first time (ie: it doesn’t match any String constant that has been loaded or any previously interned string). So on a multiple CPU system, it would be very bad to intern a lot of strings that are only ever used once or twice as they would require a lot of synchronization for no benefit. Of course on a single CPU system, doing this would be a bad thing anyway because it would incur the extra cost of comparing strings to check if they match an interned string without gaining any real benefit. My theory would then be (and only real world application profiling will confirm this in any particular situation) that the string interning technique is slightly less likely to pay off on multiple CPU systems, however because the situations in which string interning is useful require that the vast majority ofString.intern()
calls match something already in the cache (most likely one of the string constants they’re to be compared against) the question of how many CPUs will be in use isn’t going to have any significant impact. I can’t stress enough though that if you don’t have specific profiling data that shows String comparisons as the biggest bottle neck in your application, you shouldn’t apply this optimization. Great question. UPDATE: Here’s an interesting discussion of interning relating specifically to this question. The automatic google search on the side (if you actually click through to this blog entry) is very handy at times.