On Ampersands And Standards
By Adrian Sutton
Byron commented on ampersand redux:
“Yes, an ampersand is valid as part of an attribute value (as represented in an HTML document) where that ampersand is part of an entity reference. An ampersand that is not part of an entity reference is not valid in an attribute value in an HTML document. Serialization has nothing to do with it, since an HTML document is not the serialization of a DOM tree, although it can be viewed as such. I did not mean to say anything about serializing attribute values; I meant to say that an attribute value in an HTML document cannot legally have an ampersand that is not part of an entity reference. If your document does have such an ampersand, it will not validate. It might work in current browsers, but down the road it might not. Don’t do it. If a browser gets it wrong, file a bug against the browser or avoid ampersands entirely; don’t force every other author of HTML parsers to work around your markup’s faults.”
I still disagree with the first part: ampersands are perfectly valid in HTML comments, but when serialized they must be escaped as entities. It is critical to consider entities as equivalent to the characters they represent; otherwise &eacute; wouldn’t be the same as é, which is clearly ludicrous. Regardless, the point is entirely academic, so I’ll leave it at that.
The last part, however, is crazy. If a browser has a bug and you need to support that browser, you should do whatever it takes to make your application work with that browser, standards be damned. It is in no way acceptable for a software developer to skip requirements just because meeting them would mean conflicting with a standard. If adhering to the standard is also a requirement, then the higher-priority requirement should be implemented and the other one revised so that it no longer conflicts. If you can get the browser vendor to fix the issue and you consider it acceptable to make all your clients upgrade to the fixed version, then by all means follow the standard; otherwise throw it out.
Standards are designed to enhance interoperability. If they reduce interoperability in areas that are important to your project, they are completely worthless and should be ignored. The comment about forcing every other HTML parser to work around the markup problems is a red herring as well: HTML parsers already have to deal with that kind of thing, and that’s not going to change. XML parsers, on the other hand, do not have to handle invalid markup, and most don’t, which is precisely why I pointed out that you should always escape ampersands correctly in XHTML, despite the fact that most if not all browsers will get it right either way.
Software development is about achieving the project’s requirements. It’s not about politics, it’s not about standards and it’s not about making yourself feel good. If you can meet your requirements and do any of that, then great, but the requirements are the only thing that has to be achieved, and they override anything else. That said, any of those things could be made a requirement of the project, but it’s quite rare that they would actually be requirements, let alone high-priority ones.
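
For anyone who wants to see the escaping point in practice, here is a minimal Python sketch using only the standard library. The helper names (`link_tag`, `is_well_formed_xml`) and the sample URL are made up for illustration, and Python's XML parser is only standing in for "an XML parser" in general; it says nothing about how any particular browser behaves.

```python
import html
import xml.etree.ElementTree as ET


def link_tag(url: str) -> str:
    """Build an anchor tag, escaping the URL for use as an attribute value.

    Hypothetical helper: html.escape turns & into &amp; (and, with
    quote=True, also escapes quote characters).
    """
    return '<a href="{}">link</a>'.format(html.escape(url, quote=True))


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw_url = "search?q=tea&lang=en"  # sample value, not from any real site

escaped = link_tag(raw_url)
unescaped = '<a href="{}">link</a>'.format(raw_url)  # bare ampersand left in

print(escaped)                        # <a href="search?q=tea&amp;lang=en">link</a>
print(is_well_formed_xml(escaped))    # True
print(is_well_formed_xml(unescaped))  # False: the bare & is a parse error

# Once parsed, the attribute holds the original character, not the entity.
print(ET.fromstring(escaped).get("href") == raw_url)  # True

# An entity and the character it represents are the same thing after decoding.
print(html.unescape("&eacute;") == "é")  # True
```

The strict parser rejects the unescaped version outright, which is the reason to escape in XHTML even when a forgiving HTML parser would cope with either form.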