End To End Testing And The 10 Minute Build
By Adrian Sutton
At least in my mind, there seems to be a clash of aims in XP. You want to make sure that you have complete confidence in your tests so that you can go faster and reduce the cost of change. To achieve this you write lots and lots of tests – until your fear of something breaking turns to boredom from writing tests you know will pass. Most of those tests are atomic and test a particular component, but fear lies in the gaps between components too, so you regularly get recommendations like Ola Ellnestam’s comment on my previous post, Testing Your Setup Code:
Note: When you’re TDDing and getting the very loose coupling every one is longing for ;-) you must be aware that integration tests and acceptance tests are an absolute necessity. Since this is the only way to really test your configuration.
On the other hand, to be able to get rapid feedback you want a fast build – under 10 minutes. From James Shore’s draft of the 10 Minute Build chapter of his upcoming book, The Art of Agile:
For most teams, their tests are the source of a slow build. Usually it’s because their tests aren’t focused enough. Look for common problems: are you writing end-to-end tests when you should be writing unit tests and integration tests? Do your unit tests talk to a database, network, or file system?
You should be able to run about 100 unit tests per second (see Test-Driven Development). Unit tests should comprise the majority of your tests. A fraction (less than 10%) should be integration tests, checking that two components synchronize properly. Only a handful, if any at all, should be end-to-end tests (see Testing).
The problem is, if you want to test your software comprehensively and have confidence that your tests will tell you when you’ve broken something, I can’t see how you can avoid writing a lot of integration tests. I also don’t see why you would avoid automating end to end tests and running them very regularly. The reality is that you need a QA process that tests the application the way users actually use it – not from the point of view of this bit of the system or that bit. You need to verify that when a user clicks on a menu item, the event is sent over to the editor pane, which interprets it as a bold action and instructs the document to apply bold, and finally that when the document serializes, the text comes out wrapped in a STRONG or B tag (depending on the user’s preferences).
If you don’t have a test that verifies that the message from the menu bar actually gets to the editor pane, how can you have confidence that bold works? How can you have confidence that the complex changes you’re making to the document result in the right end effect when the document is serialized?
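To make that concrete, here’s a rough sketch of the kind of end-to-end test I mean, written against JUnit 4. The EditorFixture and its methods are hypothetical stand-ins rather than Ephox’s actual API – the point is simply that the test drives the whole chain, from the menu click down to the serialized output, the way a user would.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class BoldEndToEndTest {

        @Test
        public void boldMenuItemWrapsSelectionInStrongTag() {
            // Launch the editor and interact with it the way a user would.
            EditorFixture editor = EditorFixture.launch();
            editor.typeText("hello world");
            editor.selectText(0, 5);

            // The click has to travel from the menu bar, through the editor
            // pane that interprets it as a bold action, down to the document.
            editor.clickMenuItem("Format", "Bold");

            // Assert on the end of the chain: what the document serializes to.
            assertEquals("<strong>hello</strong> world",
                         editor.document().serialize());
        }
    }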
I suspect there are a couple of contributing factors to my confusion around this issue. Firstly, I work with text, and frankly there is nothing less predictable and safe in software than text. It seems simple on the face of it, but there is a huge amount of complexity behind the scenes, and users absolutely demand that the editing experience is completely seamless and intuitive. There are few other environments where the number of possible program states is so incomprehensibly huge in a practical sense, and where the differences really matter. On top of that, there’s a ridiculous number of possible user actions that are all available at the same time, and all of them interact with the program state in subtly different ways to try to best match the user’s expectations.
In short, if there were any environment where you should be afraid of making changes, it’s code that deals with text. That fear turns into a desire to write lots of tests and make them as close as possible to what the user is actually doing. Capturing all the subtleties of the state and embedding them in an atomic test is difficult to get precisely right – you tend to cover the most important details but miss one or two bits of state that come back to bite you when you least expect it. Having end to end tests resolves that sense of fear, because you know the program is operating just like it does when the user actually uses it – you can’t have missed a detail somewhere, it’s all the real deal.
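For contrast, the equivalent atomic test looks something like the sketch below – again with hypothetical Document and Serializer classes rather than our real ones. It hand-builds the state it thinks matters and, in doing so, quietly bakes in the default preference for STRONG tags; a user who has switched to B tags never passes through it, and neither does the menu-to-editor wiring.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class DocumentBoldTest {

        @Test
        public void applyingBoldWrapsRangeInStrongTag() {
            // Build the document state by hand, skipping the menu bar and
            // editor pane entirely.
            Document document = new Document("hello world");
            document.applyBold(0, 5);

            // Relies on the serializer's default preferences: the B-tag
            // configuration is one of those bits of state that gets missed.
            assertEquals("<strong>hello</strong> world",
                         new Serializer().serialize(document));
        }
    }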
The other contributing factor is that I haven’t had the opportunity to work on a high quality code base with very comprehensive atomic tests. Ephox’s code base is quite old; it’s mostly high quality code, but it has some back alleys where ambushes lie in wait, and it’s well tested – mostly with integration-level tests, not atomic tests. It’s no surprise then that I don’t have complete confidence in the atomic tests – they just don’t cover enough of the application. That said, my confidence in them is definitely growing as we add more tests and get better at knowing what to test and how to test it.
The bottom line is that now, and for the foreseeable future, I’m not going to have enough confidence in the atomic tests to get rid of the slow end to end tests. However, I do see it as important to improve our atomic tests and the confidence we have in them. Being able to verify that your changes haven’t caused problems in 5-10 seconds by running the appropriate tests is a huge boost to productivity, and being able to have confidence that everything will work in a minute or two by running all the atomic tests is extremely powerful too. Despite that, knowing that Bob the Builder is going to come along behind you and run a comprehensive suite of end to end tests as well is priceless.
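One way to keep both feedback loops is to split the suites, so a developer can run the fast atomic tests locally while the build server runs everything. The sketch below uses JUnit 4 suites; the listed test classes are hypothetical placeholders, not our actual tests.

    // FastTests.java: atomic tests only, with no UI, no disk and no network.
    // This is the suite a developer runs every few minutes for 5-10 second
    // feedback.
    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    @RunWith(Suite.class)
    @Suite.SuiteClasses({
        DocumentBoldTest.class,
        SerializerTest.class
    })
    public class FastTests {}

    // FullBuildTests.java: everything, including the slow end to end tests
    // that drive the real editor. This is what the build server runs after
    // every check-in.
    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    @RunWith(Suite.class)
    @Suite.SuiteClasses({
        FastTests.class,
        BoldEndToEndTest.class
    })
    public class FullBuildTests {}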