JavaScript Testing in the Cloud
By Adrian Sutton
One of the things Ephox is contributing to TinyMCE is a build farm that runs the automated tests in various browsers and winds up publishing its results for all to see. This has been pretty interesting to set up and there is a range of different approaches. Matt Raible posted recently about his experiences using Selenium with Sauce Labs. I had initially looked into that as well, but was worried about a few of the issues Matt hit, and TinyMCE had already written a lot of tests using QUnit rather than Selenium.
Instead I’ve wound up with quite a neat little setup based around Hudson. The master Hudson server is running on an EC2 instance, so the configuration and control interface is easily available from anywhere. However, EC2 can only run specific Windows server versions, so it can’t provide all the browsers we need. Instead, the slaves are run as VMWare instances back behind the firewall in our UK office. They use Hudson’s webstart slave support so they connect out to the master, avoiding the need to punch a hole in the firewall. At this stage we have Windows XP, Vista and Windows 7 each running a suite of browsers (roughly grouped as “old”, “previous version” and “latest” on each of those VMs). We’re also using the master server to run the Linux browser tests.
The next challenge was to get QUnit working with continuous integration so its results are reported back correctly. Unlike JSUnit or Selenium, QUnit doesn’t really have anything like this built in, though it does provide some hooks to make it possible. I simply took the JSUnit server, which is completely agnostic about what actually runs in the browser, and wrote a simple HTML and JavaScript harness to marshal the QUnit test results and submit them back to the server.
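A harness along those lines might look something like the sketch below. This is not the actual TinyMCE harness: the payload field names and the `/jsunit/acceptor` endpoint are illustrative assumptions, and only the QUnit `log`/`done` callbacks are real API. It collects per-assertion results as QUnit reports them, then POSTs a summary back to the server when the run finishes.

```javascript
// Hypothetical sketch of a QUnit-to-server reporting harness.
// Real API: QUnit.log() and QUnit.done() callback registration.
// Assumptions: the payload field names and the server endpoint.

// Accumulate per-assertion results as QUnit reports them.
var results = [];

function recordAssertion(details) {
  results.push({
    module: details.module,
    test: details.name,
    passed: details.result,
    message: details.message
  });
}

// Flatten the collected results into a form-encoded payload for the
// server to parse; "total", "failed" and "details" are assumed names.
function buildPayload(summary, assertions) {
  var failures = assertions.filter(function (a) { return !a.passed; });
  return [
    "total=" + encodeURIComponent(summary.total),
    "failed=" + encodeURIComponent(summary.failed),
    "details=" + encodeURIComponent(JSON.stringify(failures))
  ].join("&");
}

// Wire up the hooks when running under QUnit in a browser.
if (typeof QUnit !== "undefined") {
  QUnit.log(recordAssertion);
  QUnit.done(function (summary) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/jsunit/acceptor", true); // hypothetical endpoint
    xhr.setRequestHeader("Content-Type",
        "application/x-www-form-urlencoded");
    xhr.send(buildPayload(summary, results));
  });
}
```

The key point is that the server never needs to know QUnit is involved; it just receives a result submission in whatever shape it already understands.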
Finally we want to set up a workflow so tests on different VMs can run in parallel but builds are only published if they actually pass all the tests. To achieve that, I’ve split the build into three parts:
- The build itself. Minifies the JavaScript and does various other work to build the zip package that would be distributed. At this stage, that zip is just left on the Hudson server and not published anywhere. It may be possible to do this using the “touchstone build” option for the configuration matrix type, but I haven’t investigated that yet.
- The test project is set up as a configuration matrix type, so Hudson automatically duplicates the build on any slaves that are chosen (in this case, all of them plus the master to cover Linux). Each slave then downloads the zip package from the previous build phase and runs the tests. If any of the VM builds fail the test project is considered to fail and the process stops.
- Finally if the tests all pass the packaging project starts which simply publishes the zips to various places. It uses the workspace clone plugin to effectively pick up where the build step left off.
With this approach we are essentially building a full release candidate, testing it and then finally releasing it. While the tests don’t yet exercise it fully, this has the advantage of actually testing that the intended files are making it into the zip. Most unit test setups run the tests against just the compiled classes, with no guarantee that they’ll come out the same way after being packaged up. While I was just grabbing the full zip out of pure convenience, it is nice to know that catastrophic packaging problems would be picked up by the tests now, and I can quite easily build out more tests to pick up smaller errors as well.
Current issues:
- The slaves are extremely good at reconnecting after the server reboots, but if the slaves themselves reboot, sometimes their authentication has timed out and they can’t re-download the JNLP file without help. It would be nice if I could specify authentication credentials for Java WebStart on the command line.
- The test project is mostly triggered by the build project completing, but it also polls for changes in the git repository of the test harness. That way, if we change the harness, it verifies that all the projects using that test harness will still build and pass successfully, rather than giving us a surprise failure on a later, unrelated change. Unfortunately you can’t choose which Hudson instance does the polling, and Hudson winds up distributing it out to the remote slaves (in itself a pretty stupid thing to do in this particular setup). When you restart the server, though, it takes a few moments for the slaves to reconnect, and in the meantime Hudson moves the polling back to the master server, which has an out-of-date workspace for that project, so it almost always detects a change. The net result is that restarting the server causes all the tests to run for no reason.
- The matrix project type basically makes a set of sub-projects for each VM, so things like Twitter notifications report on multiple cryptic projects rather than just reporting on the overall project (hence I’ve given up on the Twitter notifications).
Things to improve:
- Add a Mac into the mix – Safari is covered by the Windows builds anyway, but there can be subtle differences between versions on different OSes, and since it’s all automated, we may as well be thorough.
- Make the slaves redundant. I’d like to effectively duplicate each of the slave VMs so that I have a “latest browser” VM pool rather than a specific VM. That way, if one of the VMs fails to reconnect or isn’t available for some reason, there is a reasonable chance that the other one will be available. If they’re both up and running, it gives extra throughput, which is always good too.
Overall, I’m quite pleased with the setup and look forward to growing the test coverage for TinyMCE.