Debugging Deadlocks – Print All Stack Traces
By Adrian Sutton
One of the hardest types of bugs to track down is a deadlock. Deadlocks are very timing dependent, which makes them intermittent, specific to a particular computer and configuration, and generally impossible to reproduce in a debugger. Fortunately, at least in Java, it’s fairly easy to spot most of the situations where a deadlock is possible:
- Calls to SwingUtilities.invokeAndWait
- Calls to Object.wait()
- synchronized blocks
There might be a few others depending on the libraries you’re using, but starting with those three, in that order, is very likely to lead you to at least one point in the deadlock. Just put an old-fashioned System.err.println before and after each of those calls and you’ll quite quickly see where things are waiting forever.
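As a rough illustration of that before-and-after tracing, here is a minimal sketch around an Object.wait() call. The class TracedWait and its methods are hypothetical names made up for this example, not code from the application discussed in this post. If the wait is part of a deadlock, you'll see the "before" line for a thread but never the matching "after" line.

```java
// Hypothetical example class: names are illustrative only.
final class TracedWait {
    private final Object lock = new Object();
    private boolean done;

    /** Prints a marker tagged with the current thread's name. */
    private static void trace(String message) {
        System.err.println("[" + Thread.currentThread().getName() + "] " + message);
    }

    /** Blocks until markDone() has been called, tracing either side of the wait. */
    void waitUntilDone() throws InterruptedException {
        synchronized (lock) {
            trace("before wait");
            while (!done) {
                lock.wait();
            }
            trace("after wait");
        }
    }

    /** Releases any thread blocked in waitUntilDone(). */
    void markDone() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }
}
```

Including the thread name in each marker makes it much easier to match the output up with the stack traces later on.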
Deadlocks have to have at least two threads involved. To find the other one, you need to print the stack traces of all the other threads immediately before you actually hit the deadlock. So once you’ve found one call that’s part of the deadlock, add a call to the method below just before it.
    import java.util.Iterator;
    import java.util.Map;

    // Prints the name and current stack trace of every live thread to System.err.
    private static void printAllStackTraces() {
        Map liveThreads = Thread.getAllStackTraces();
        for (Iterator i = liveThreads.keySet().iterator(); i.hasNext(); ) {
            Thread key = (Thread) i.next();
            System.err.println("Thread " + key.getName());
            StackTraceElement[] trace = (StackTraceElement[]) liveThreads.get(key);
            for (int j = 0; j < trace.length; j++) {
                System.err.println("\tat " + trace[j]);
            }
        }
    }
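If you'd rather avoid the casts, the same idea can be written with generics. The variant below is only a sketch: the class name StackDump and the method dumpAllStackTraces are hypothetical. It builds the dump as a String instead of printing directly, and it adds each thread's state via Thread.getState(), which is an addition to the method above. A thread parked in Object.wait() reports WAITING, and one stuck entering a synchronized block reports BLOCKED.

```java
import java.util.Map;

// Hypothetical variant of the method above; names are illustrative only.
final class StackDump {
    /** Returns the name, state and stack trace of every live thread. */
    static String dumpAllStackTraces() {
        StringBuilder out = new StringBuilder();
        Map<Thread, StackTraceElement[]> liveThreads = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : liveThreads.entrySet()) {
            Thread thread = entry.getKey();
            out.append("Thread ").append(thread.getName())
               .append(" (").append(thread.getState()).append(")\n");
            for (StackTraceElement frame : entry.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.err.print(dumpAllStackTraces());
    }
}
```

Calling System.err.print(StackDump.dumpAllStackTraces()) just before the suspect call gives the same kind of output as the original method, with the state added to each header line.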
You’ll get a whole bunch of threads, and with a bit of digging around to see exactly what the code is doing in each thread you’ll probably find a pair of threads like the ones below that are waiting on each other (package names removed to make things shorter):
    Thread Thread-14
        at Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at EventQueue.invokeAndWait(EventQueue.java:848)
        at SwingUtilities.invokeAndWait(SwingUtilities.java:1257)
        at CachingFileDownloader.run(CachingFileDownloader.java:204)
        at WorkerThread.run(WorkerThread.java:49)
    Thread AWT-EventQueue-2
        at Object.wait(Native Method)
        at Object.wait(Object.java:474)
        at CachingFileDownloader.waitUntilDownloadComplete(CachingFileDownloader.java:290)
        <snip>
Yep, the Swing thread is waiting on the download and the download thread is trying to do something on the Swing thread via invokeAndWait. It’s actually trying to close the download progress dialog, which can most likely just use an invokeLater, thus avoiding the deadlock.
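To make the invokeLater idea concrete, here is a minimal sketch of a worker that hands the UI update to the Swing thread without blocking on it. DownloadWorker and closeProgressDialog are hypothetical names for this example, and the real CachingFileDownloader may well be structured differently.

```java
import javax.swing.SwingUtilities;

// Hypothetical worker: names and structure are assumptions for illustration.
final class DownloadWorker implements Runnable {
    // Assumed UI callback, e.g. something that disposes the progress dialog.
    private final Runnable closeProgressDialog;

    DownloadWorker(Runnable closeProgressDialog) {
        this.closeProgressDialog = closeProgressDialog;
    }

    @Override
    public void run() {
        // ... the actual download work would happen here ...

        // Queue the UI update and return immediately. Unlike invokeAndWait,
        // invokeLater never blocks this thread, so it cannot get stuck if
        // the Swing thread is itself waiting for the download to finish.
        SwingUtilities.invokeLater(closeProgressDialog);
    }
}
```

The trade-off is that the worker can no longer assume the dialog has closed by the time run() returns, so this only fits when nothing later in the worker depends on the UI update having happened.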
This is also a really good example of why I put SwingUtilities.invokeAndWait at the top of my hit list. Also, the call to invokeAndWait that I put the debugging code in front of wasn’t actually involved in the deadlock. It just happened to need the Swing thread as well, and wound up waiting forever because the other two threads were deadlocked with each other.
As a side note, I think this is the first time since we started automatically publishing the twice-daily builds to my blog that I’ve had a catastrophic failure where I couldn’t post at all. That’s not a bad run considering it’s been at least a couple of years living on the edge.
Sigh, and upon hitting save I find that the debugging code I added above can cause security exceptions when triggered as part of our form submission process in Safari (but not Firefox) on OS X. So if you originally saw this post with no content, that would be why…
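One way to stop a debugging aid like this from breaking the application is to treat a SecurityException as "no dump available". The sketch below assumes the exception comes from Thread.getAllStackTraces() itself, which is documented to throw SecurityException when a security manager denies access to stack traces. I haven't confirmed that this is the exact call that failed here. GuardedStackDump and tryPrintAllStackTraces are hypothetical names.

```java
import java.io.PrintStream;
import java.util.Map;

// Hypothetical guarded variant; assumes getAllStackTraces() is what throws.
final class GuardedStackDump {
    /**
     * Writes all live threads' stack traces to out.
     * Returns false, instead of propagating the exception, if a security
     * manager refuses access to the stack traces.
     */
    static boolean tryPrintAllStackTraces(PrintStream out) {
        Map<Thread, StackTraceElement[]> liveThreads;
        try {
            liveThreads = Thread.getAllStackTraces();
        } catch (SecurityException e) {
            out.println("Stack dump skipped: " + e);
            return false;
        }
        for (Map.Entry<Thread, StackTraceElement[]> entry : liveThreads.entrySet()) {
            out.println("Thread " + entry.getKey().getName());
            for (StackTraceElement frame : entry.getValue()) {
                out.println("\tat " + frame);
            }
        }
        return true;
    }
}
```

Calling GuardedStackDump.tryPrintAllStackTraces(System.err) then degrades to a single log line in restricted environments rather than aborting whatever operation triggered it.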