Structure In An Unstructured World

By Adrian Sutton

August 28, 2007

There’s a constant argument over whether data should be structured or unstructured in content management and knowledge management systems. The key advantage of structured data is that it’s easier to process and manage – the system can manipulate and report on the data far more accurately. The downside is that users find it difficult and frustrating to be limited to the specified structure, so less data tends to get captured and adoption can be harder to achieve.

So how do you get the best of both worlds? A number of approaches have been tried, from Google refining search techniques for unstructured data to other systems attempting to parse natural language to identify dates, appointments and other items. Meanwhile the structured crowd invest in ways to improve user interfaces and define more flexible types of structures.

I think the middle ground looks something like microformats: little bits of structure within a generally unstructured system. The downfall of microformats is that they tend to be way too complex to apply, but the concept is sound. There needs to be a focus on making it easy and natural for users to create the right structure. With tool support and appropriate feedback to users, the experience can be really smooth while still capturing the vital information.

We’ve been playing around with ways to make creating links in our internal wiki easier and I think it’s a reasonably good start towards finding the magical balance. The wiki uses HTML and a WYSIWYG editor so that users don’t have to think about markup, but having to open up the hyperlink dialog just to link to another page gets in the way on a wiki, so we’ve preserved the ability to use wiki markup and put the page name in square brackets. This is good, but it looks ugly and doesn’t provide feedback on whether or not the user got the pattern right.

The second iteration added a plugin to the editor that automatically identifies correctly formatted wiki links and converts them to real hyperlinks, so when the user hits ‘]’, the link switches over, giving them a clear indication that they got the pattern right and showing them the results. The main problem at the moment is that if they enter a valid pattern that doesn’t do what they want (say they get the display text and the link target mixed up), it’s not simple enough to correct the mistake because what they’ve typed has now completely changed. Undo works, but the backspace key doesn’t yet and it probably should. We also need to identify what happens when they edit the text of the link after it has been converted – in many cases they intend the link target to change as well, but not always.
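To make the conversion step concrete, here is a minimal sketch of the pattern matching involved. It is written in Python for illustration only and is not our editor plugin. The `[display text|Page Name]` form, the `/wiki/` URL prefix and the function name `convert_wiki_links` are all assumptions made for the example.

```python
import html
import re
from urllib.parse import quote

# Assumed syntax (hypothetical): [Page Name] or [display text|Page Name].
# Brackets and pipes are not allowed inside either part.
WIKI_LINK = re.compile(r"\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]")


def convert_wiki_links(text: str) -> str:
    """Replace every complete wiki link in text with an HTML anchor."""

    def to_anchor(match: "re.Match[str]") -> str:
        display = match.group(1).strip()
        # With no explicit target, the display text doubles as the page name.
        target = (match.group(2) or match.group(1)).strip()
        # The "/wiki/" prefix is an assumed URL scheme, not a real one.
        href = "/wiki/" + quote(target.replace(" ", "_"))
        return '<a href="{}">{}</a>'.format(
            html.escape(href, quote=True), html.escape(display)
        )

    return WIKI_LINK.sub(to_anchor, text)
```

An editor could run something like this each time the user types ‘]’. Text such as `See [Home Page` is left untouched until the closing bracket arrives, which is what gives the user the signal that the pattern was completed correctly.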

You can imagine this kind of system being extended to task lists and appointments as well – when the system recognizes a date it should mark it appropriately so the user can tell they got the format right and what the results are. For some things it may be better to use a standard convention like the square brackets – for instance, tasks to be completed might start with an exclamation mark. Simple to type and, if appropriate feedback is given, simple to use.
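As a rough sketch of what that recognition might look like, the Python below finds dates and exclamation-mark tasks and reports where each one sits in the text, so an editor could highlight it in place. The ISO `YYYY-MM-DD` date format, the `Span` type and the `find_annotations` name are assumptions for the example; a real system would want to accept far more date formats.

```python
import re
from datetime import date
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # offset of the first character of the recognized text
    end: int    # offset just past the last character
    kind: str   # "date" or "task"
    text: str   # the recognized text itself


# Assumption: dates are written as YYYY-MM-DD.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# Assumption: a task is any line that starts with an exclamation mark.
TASK_LINE = re.compile(r"^![ \t]*(\S.*)$", re.MULTILINE)


def find_annotations(text: str) -> List[Span]:
    """Return the spans an editor could mark up inline, ordered by position."""
    spans = []
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(part) for part in match.groups())
        try:
            date(year, month, day)  # rejects impossible dates such as 2007-13-40
        except ValueError:
            continue
        spans.append(Span(match.start(), match.end(), "date", match.group(0)))
    for match in TASK_LINE.finditer(text):
        spans.append(Span(match.start(1), match.end(1), "task", match.group(1)))
    return sorted(spans)
```

Returning character offsets rather than a separate list of items is the design choice that matters here: it lets the feedback be drawn on the text the user is already looking at.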

The key element is providing clear feedback right inline with the text. Most of the existing systems I’ve seen provide a plain text area and build up a list of things they recognized over to the side. The problem is that the user is focussing on the text they’re writing and not on the list of items the system is building up, so they have to keep stopping and looking over to check that it all worked correctly.

There’s a lot of really cool stuff to come out of this area in the future, and we’re seeing the start of it in existing systems, but I suspect there’s a whole new level to be reached as systems begin to act more and more as intelligent agents that assist the user in getting their work done. The next major challenge that I see is to get the feedback system right so it is clear but not intrusive.