The Default Namespace
By Adrian Sutton
Byron complains about what he calls a limitation of XPath. It’s not actually a limitation of XPath at all, but rather a very common mistake people make when working with XML namespaces. Let’s take a tour into the dark depths of XML namespaces to discover what’s really going on.

Originally, XML didn’t have namespaces at all; every element was identified purely by its name. So the element <html> was known as html and the world was simple. Then people discovered that they wanted to combine XML documents, and that quite often they’d wind up with two elements called html that had completely different meanings and uses, i.e. they were actually two different elements.

To solve this problem, the clever folk over at the W3C changed the way XML elements and attributes were named. Now, instead of a simple string, elements would be named using a QName (short for Qualified Name). A QName is a compound data object consisting of a URI and the regular name we’re used to. The important thing to note is that it is impossible to refer to an element only by its local-name; you have to use a QName, and that QName must have a namespace component (all QNames do). It is, however, possible to assign nil to the namespace component of a QName, which gives you an element in the nil namespace.

One of the important things about adding namespaces to XML was backwards compatibility, so there had to be some way to assign a namespace to all those elements which were previously referred to only by their local-name. This is where the default namespace comes in (it also happens to be quite convenient). In an XML document, any element that doesn’t have a prefix to declare which namespace it is in is assigned the “default namespace”. The default namespace, however, isn’t an actual namespace; it’s just a default value for the namespace component. Unless you say otherwise, the default namespace is the nil namespace.

In XML you can specify what the default namespace is by adding an xmlns attribute, e.g. xmlns="http://www.w3.org/1999/xhtml". The new value for the default namespace is then in effect for that element and any element under it (unless it’s changed again). It is important to note here that an XML document doesn’t have a default namespace; rather, each element has a value which it inherits (attributes never, ever use the default namespace). There can therefore be as many different values for the default namespace as there are elements.

So if we were to add Byron’s idea of matching any element in the default namespace, it would never match anything, because every element (and attribute) already has an explicit namespace. Worse still, the default namespace would change depending on which element we were in and what the specific representation of the XML was (maybe the default namespace was left as nil and every element used a prefix, or maybe no elements were prefixed and the default namespace was changed all through the document). Depending on the representation of XML instead of the actual data it represents is very bad practice and will cause problems.

This leads right into the other common mistake in Byron’s post: //*[name() = 'foo']
will do nothing particularly useful. What it will do is match any element with a local-name of foo, in any namespace, as long as the element was represented without a prefix. It will not match <my:foo xmlns:my="http://www.intencha.com/my" /> because name() for my:foo will return my:foo. Elements are the same if they have the same QName, regardless of whether a prefix was used or whether different prefixes were used.

The correct way to select any element with a local-name of foo in any namespace, regardless of what prefix was used, is: //*[local-name() = 'foo']
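
To make the difference between name() and local-name() concrete, here is a minimal Python sketch. It uses the standard library’s xml.dom.minidom rather than a real XPath engine, so treat it as an approximation: tagName stands in for name(), namespaceURI for namespace-uri(), and the sample document plus the helpers local_name and elements are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample: three "foo" elements written three different ways,
# all in the same namespace, plus an unrelated element.
SAMPLE = """<root>
  <foo xmlns="http://www.intencha.com/my"/>
  <my:foo xmlns:my="http://www.intencha.com/my"/>
  <other:foo xmlns:other="http://www.intencha.com/my"/>
  <bar/>
</root>"""


def local_name(element):
    """Strip any prefix from the name as written, roughly XPath's local-name()."""
    return element.tagName.split(":")[-1]


def elements(document):
    """All elements in document order, roughly XPath's //*."""
    return document.getElementsByTagName("*")


doc = minidom.parseString(SAMPLE)

# Comparing the name as written (like name()) only finds the unprefixed element.
by_name = [e.tagName for e in elements(doc) if e.tagName == "foo"]

# Comparing the local name finds all three, whatever prefix was used.
by_local_name = [e.tagName for e in elements(doc) if local_name(e) == "foo"]

# All three share one namespace URI, so they carry the same QName.
uris = {e.namespaceURI for e in elements(doc) if local_name(e) == "foo"}

print(by_name)
print(by_local_name)
print(uris)
```

Only the unprefixed foo passes the first check, while all three pass the second and turn out to share a single namespace URI.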
If I were to take a shot at selecting any element whose namespace-uri is the same as the default namespace in effect on the document element in the original representation of the XML, it would be: //*[string(/*/namespace::*[name() = ""]) = namespace-uri()]

Which is to say:

string(/*/namespace::*[name() = ""])
- Get the namespace nodes applicable to the document element and filter them for just the ones that have no name (i.e. the default namespace), then convert the result to a string (the string value of a namespace node is its URI).

//*[... = namespace-uri()]
- Select any element which has a namespace-uri matching the result of the above expression.

No idea how standards compliant that is, but it does work with Jaxen. It will, however, also select elements which used a prefix to be put in the same namespace as the default namespace in effect for the document element. To avoid this, we would have to add not(contains(name(), ":")) to get: //*[string(/*/namespace::*[name() = ""]) = namespace-uri() and not(contains(name(), ":"))]
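
The same two-step selection can be sketched in Python with the standard library’s xml.dom.minidom. This is not an XPath evaluation, just an approximation of what the expression does. The sample document is invented, and reading the xmlns attribute is an assumption that only holds for the document element, where nothing can be inherited.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<page xmlns="http://www.w3.org/1999/xhtml"
      xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:my="http://www.intencha.com/my">
  <title/>
  <h:body/>
  <my:foo/>
  <note xmlns="http://www.intencha.com/my"/>
</page>"""

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Stand-in for string(/*/namespace::*[name() = ""]): the default namespace
# declared on the document element ("" if none is declared there).
root_default = root.getAttribute("xmlns")

everything = doc.getElementsByTagName("*")

# namespace-uri() is "" for an element in the nil namespace; minidom uses None.
same_namespace = [
    e.tagName for e in everything if (e.namespaceURI or "") == root_default
]

# Stand-in for not(contains(name(), ":")): drop elements that reached the
# same namespace through a prefix.
same_and_unprefixed = [name for name in same_namespace if ":" not in name]

print(same_namespace)
print(same_and_unprefixed)
```

Here h:body lands in the first list because its prefix maps to the same URI, and the extra prefix check is what removes it from the second.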
It would, however, be much simpler if we just wanted to select any element that used the default namespace. We just need to check that it doesn’t have a prefix: //*[not(contains(name(), ":"))]
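
A quick Python sketch (standard library xml.dom.minidom again, with an invented sample document) shows what that prefix check actually picks up. It approximates the expression rather than evaluating it, but it illustrates why “the default namespace” is a per-element value: the unprefixed elements it finds do not all end up in the same namespace.

```python
from xml.dom import minidom

# Hypothetical sample: the default namespace changes twice on the way down.
SAMPLE = """<a>
  <b xmlns="http://www.w3.org/1999/xhtml">
    <c/>
    <my:d xmlns:my="http://www.intencha.com/my"/>
    <e xmlns="http://www.intencha.com/my"/>
  </b>
</a>"""

doc = minidom.parseString(SAMPLE)

# Stand-in for //*[not(contains(name(), ":"))]: every element written without
# a prefix, paired with the namespace it inherited (None is the nil namespace).
unprefixed = [
    (e.tagName, e.namespaceURI)
    for e in doc.getElementsByTagName("*")
    if ":" not in e.tagName
]

for name, uri in unprefixed:
    print(name, uri)
```

Four elements have no prefix, yet between them they sit in three different namespaces, including the nil one.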
Most people would have run away screaming by now, but fortunately, since this is what I play with all day, every day (writing an XML editor is not the world’s simplest task), I rather enjoy it.