Conversion for the Web
By Adrian Sutton
Andrew Shebanow in Open Government and PDF:
The issue at hand is not whether governments should pick HTML or PDF. The issue at hand is whether governments are capable of publishing information at all. Show me an HTML creation tool that creates high quality, standards conformant markup from a Word document or any of the zillions of editing tools that government employees use. Now add in all the tools used by people who submit documents to the government. And all the versions of those tools released in the last 20 years. Now make sure that the HTML/XML works correctly even when the user doesn’t have the right browser or the right fonts installed.

I’ve actually worked with a number of government departments that were looking to move more content online, and the content conversion problem is definitely a time-consuming and challenging part of the problem. That’s precisely why I wind up getting involved: EditLive! lets you easily copy and paste content from Word documents and produce clean, compliant XHTML. It can even (optionally) strip out inline formatting and leave just the structure, such as headings, tables and lists.
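The snippet below is a minimal sketch of the structure-only cleanup described above. It uses Python's standard `html.parser` and keeps a small whitelist of structural elements. Everything else, such as `span`, `font`, `b` and Word's `o:p`, is unwrapped so that only its text survives. This is an illustration under stated assumptions, not how EditLive! works internally. The whitelist, the `StructureOnlyCleaner` class and the `clean()` helper are hypothetical. Real Word output also has quirks this sketch ignores, such as lists exported as specially styled paragraphs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of structural elements to keep; any other tag
# (span, font, b, o:p, ...) is dropped but its text content is kept.
STRUCTURAL_TAGS = {
    "h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
    "table", "thead", "tbody", "tr", "th", "td", "a",
}
# Elements whose content (CSS, scripts) should be discarded entirely.
SKIP_CONTENT_TAGS = {"style", "script"}


class StructureOnlyCleaner(HTMLParser):
    """Strip inline formatting from pasted HTML, keeping only structure."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag == "br":
            self.out.append("<br />")  # XHTML-style empty element
        elif self.skip_depth == 0 and tag in STRUCTURAL_TAGS:
            # Drop every attribute (style, class, lang, ...) except a link's href.
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.out.append(f'<a href="{escape(href)}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in STRUCTURAL_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape decoded text so the output stays well-formed.
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean(html_text):
    """Return html_text reduced to whitelisted structural markup."""
    parser = StructureOnlyCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    word_html = ('<p class="MsoNormal" style="margin:0"><b><span '
                 'style="font-family:Arial">Hello</span></b> &amp; welcome'
                 '<o:p></o:p></p>')
    print(clean(word_html))  # <p>Hello &amp; welcome</p>
```

A whitelist is used rather than a blacklist because the set of inline-formatting tags and attributes that editors emit is open-ended. The set of structural elements worth keeping is small and known in advance.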
Furthermore, EditLive! is actually quite good at making sure the HTML works correctly even when the user doesn’t have the right browser or the right fonts installed, especially when it’s been configured to suit the particular content needs. Even with non-technical business authors this can work very well, and it is doing so for a significant number of government departments.
That’s not to say it’s the whole solution; there are systems out there where it’s hard to convert the content to HTML and where HTML may not be the best format anyway. Some of those cases may work better with PDF, but certainly not all of them. To suggest that PDF is a complete and simple solution to publishing information on the web misses quite a lot of the picture. For example:
- How do web site visitors navigate around and get to that PDF data? How do they search and find it? As much time is spent working out navigation structures as is spent converting content.
- How do you expose information from databases with regularly changing information? Wouldn’t an HTML representation be easier to generate than PDF in most of these cases?
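To make the second point concrete, rendering query results as HTML can be a few lines of templating regenerated on every request. Python's standard library, by contrast, has no PDF writer at all. The following is a minimal sketch using Python's `sqlite3` and `html` modules. The `notices` table, its columns, the sample rows and the `render_table`/`notices_page` helpers are all hypothetical.

```python
import sqlite3
from html import escape


def render_table(rows, headers):
    """Render rows as an XHTML table, escaping every cell."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def notices_page(conn):
    # Hypothetical schema: notices(title TEXT, published TEXT), newest first.
    rows = conn.execute(
        "SELECT title, published FROM notices ORDER BY published DESC"
    ).fetchall()
    return render_table(rows, ["Title", "Published"])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notices (title TEXT, published TEXT)")
    conn.executemany(
        "INSERT INTO notices VALUES (?, ?)",
        [("Road works & closures", "2020-03-01"), ("Budget papers", "2020-05-08")],
    )
    print(notices_page(conn))
```

Because the page is rebuilt from the query each time, a change in the database shows up on the next request without re-exporting any document.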
Putting information on the web is not simple, and no single technology is going to make it simple. PDF definitely has its place on the web, but so do HTML and a number of other formats. PDF doesn’t eliminate compatibility concerns either. Not all users have a recent enough PDF reader. Not all PDFs embed all their fonts, and those that do make the download much larger. And not all PDFs are standards compliant. Putting non-web content on the web is always a big, challenging project, so review the available technologies carefully and pick the ones that best achieve your goals. Very few companies have success just dumping a whole heap of PDFs on a web server.