The Model Doesn’t Have To Match The Output

By Adrian Sutton

September 29, 2006

There is an interesting tendency in software development to keep the internal model as closely in sync with the output format as possible. When you control both, that’s probably not a bad idea – it simplifies serialization. What matters more, though, is that the user interface matches the user’s mental model of how to use your software. When the user’s model and the output format disagree, trying to keep your internal model close to both at once gives you contradictory goals.

I had a recent experience that really highlighted the power of knowing the difference between the user’s model and the output format: determining when to use a space and when to use a non-breaking space so that users can have adjacent white space in HTML. The HTML standard treats runs of white space as insignificant, so two adjacent spaces render as if there were only one. This is very useful for making your HTML readable, but it doesn’t match the user’s expectations when they use a WYSIWYG HTML editor. When a user presses space twice, they want two spaces – ignoring the second press of the spacebar would be frustrating.
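To make the collapsing rule concrete, here is a rough Java sketch of what a renderer effectively does to the text of a single paragraph. It is a deliberate simplification: the class and method names are made up for illustration, and real browsers apply more detailed rules than this.

```java
// Hypothetical helper, not part of any real API: approximates how a browser
// collapses white space in the text of one ordinary paragraph.
final class HtmlWhitespace {
    static String collapse(String source) {
        // Any run of spaces, tabs and line breaks renders as a single space.
        // U+00A0 (non-breaking space) is not in this set, so it survives.
        String collapsed = source.replaceAll("[ \\t\\n\\r\\f]+", " ");
        // White space at the very start or end of the paragraph is dropped.
        return collapsed.replaceAll("^ | $", "");
    }
}
```

With this sketch, `HtmlWhitespace.collapse("two  spaces")` gives back `"two spaces"`, which is exactly the behaviour that surprises someone who pressed the spacebar twice.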

Our original attempt at meeting this user expectation was to insert a non-breaking space whenever the user pressed space with the caret directly after an existing space character. This worked pretty well, except that when the user typed a single space at the start of a paragraph, that space was lost. So we added code for that case too. Then the user would delete a word, leaving two spaces together. Or they would type text between two spaces, removing the need for the second space to be a non-breaking space. Everywhere we looked there was a new case that we had to handle. All these special cases were hard to keep track of, and our internal builds became more and more confused about when to insert a non-breaking space[1]. It drove us nuts trying to use our internal wiki and constantly discovering that the new page we were creating had a non-breaking space entity in the title because we had confused this poor, overly complicated code for managing white space.

It turns out the solution is very simple. We’d been trying to keep the internal model in sync with the output format, instead of with the user’s mental model. In the user’s mind, white space is important and two adjacent spaces mean two adjacent spaces. In fact, even our internal model was happy with that because the Swing text APIs work with a variety of formats and render two adjacent spaces as two spaces. The only problem was that when we serialized, those adjacent spaces suddenly became insignificant. So we changed the serialization code. Now, when you hit space twice, internally we get two adjacent spaces. When we serialize the model though, we detect those two spaces and change the second one to a non-breaking space[3].
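A minimal sketch of that serialization step might look like the following. This is not our actual code: the class and method names are invented for illustration, it handles the text of a single paragraph only, and it assumes a space at the start of a paragraph needs the same treatment as a space that follows another space.

```java
// Illustrative only: the model keeps ordinary spaces, and the serializer
// decides which of them must be written out as non-breaking spaces.
final class WhitespaceSerializer {
    private static final String NBSP = "&nbsp;";

    static String serialize(String modelText) {
        StringBuilder html = new StringBuilder();
        // Assumption: the start of the paragraph behaves like a preceding
        // space, because HTML would drop a leading plain space.
        boolean previousWasPlainSpace = true;
        for (int i = 0; i < modelText.length(); i++) {
            char c = modelText.charAt(i);
            if (c == ' ' && previousWasPlainSpace) {
                // A plain space here would be collapsed, so emit the entity.
                html.append(NBSP);
                previousWasPlainSpace = false;
            } else {
                html.append(c);
                previousWasPlainSpace = (c == ' ');
            }
        }
        return html.toString();
    }
}
```

With this sketch, `WhitespaceSerializer.serialize("two  spaces")` produces `two &nbsp;spaces`. All of the decisions about which spaces become entities happen in one pass at output time, so editing operations such as deleting a word or typing between two spaces need no special cases of their own.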

1 – You can see this in the source code for many of my older posts[2], since this blog runs the latest internal build as its post editor.

2 – and you can see it in some recent posts too because somewhere along the line a client requested that shift-space should insert a non-breaking space and I keep accidentally hitting shift-space instead of just space. It's going to become a configuration option soon.

3 – the parser will make sure that when loading in a document, the white space is interpreted according to the normal HTML rules, and we still add extra white space when serializing to make the HTML source readable; it's just the user's actual data that gets non-breaking spaces as needed.