JCR Woes

By Adrian Sutton

June 27, 2007

So we’ve got a new internal system that we’ve built on top of JCR. Currently we’re using Jackrabbit as the repository, but eventually it will be ported over to something like IBM Portal. Unfortunately, right now we’re deploying the app to a pretty limited server, in terms of both CPU and RAM.

It turns out that using Jackrabbit with the Derby persistence manager in that kind of situation is a horrible, horrible idea. Everything works great on systems with reasonable amounts of CPU and RAM, but once we deploy to that poor little virtual server in the sky, page load times skyrocket and the whole thing becomes unusable.

Profiling showed a few things we could fix, but fixing them didn’t solve the problem, so we set out to do some testing with different persistence managers. We already have MySQL running, so why not try that? Well, mostly because moving the data across means exporting it and importing it again. The JCR export produces XML, and sadly with Jackrabbit (and possibly with any compliant repository, I’m not yet sure) the result is an invalid XML file which can’t then be imported. It turns out that putting binary data in XML files doesn’t really work, particularly if that data includes characters which aren’t valid in XML even if you entity encode them.

At this point I could just reimport the data from scratch, but since I know we’re going to need to migrate repositories again in the future, I need a reliable way to export and reimport. Not to mention that it would be nice to be able to back up to a stable format instead of the random binary formats that form Jackrabbit’s native configuration.
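To make the XML problem concrete, here is a minimal sketch in Python rather than anything taken from Jackrabbit itself. It checks code points against the character range that XML 1.0 allows and shows that a standard parser rejects a control character whether it appears raw or as a numeric character reference. The function names (is_xml_char, find_invalid, parses) are made up for illustration, and I’m assuming the failing export trips over characters of this kind.

```python
import xml.etree.ElementTree as ET


def is_xml_char(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid(text: str) -> list:
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ord(ch))]


def parses(xml_text: str) -> bool:
    """True if the standard library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # An ordinary character reference is fine.
    print(parses("<v>&#65;</v>"))
    # A control character is rejected, raw or entity encoded.
    print(parses("<v>\x01</v>"))
    print(parses("<v>&#1;</v>"))
    # Scanning a value before export shows where the trouble is.
    print(find_invalid("ok\x00\tx\x0b"))
```

Running something like find_invalid over property values before an export would at least point at the offending nodes, though it doesn’t fix the export itself.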

Sigh.