Pet Hate In HTTP Servers
By Adrian Sutton
Pages created on .Mac however are the source of a never-ending headache. Indeed, whenever one requests a page on an account that no longer exists (such as the former .Mac FJZone account), the Apple servers dutifully serve a tri-lingual error page… all the while returning a “200 Found” code. In other words, as far as robots are concerned, .Mac pages live forever.
This has to be my single most hated server misconfiguration. The problem is much more serious than unwanted pages turning up in Google searches – any program that tries to download resources from the server without explicit user interaction gets bitten, because the server silently delivers an error page in place of the expected file while reporting success. The client-side program can then only assume the file is corrupt on the server and give up.
This bites us from time to time with our spelling definitions file, which is downloaded from a URL specified in our configuration file. Being a configuration file, it’s obviously possible for the system administrator to enter the URL incorrectly, in which case we are sent the server’s 404 page. Unfortunately, it’s also possible for the system administrator to configure us not to check for updates (after all, how often do you change dictionaries?), so we end up with a corrupt file and have been told never to check whether it changes. We then never notice when the system administrator later puts the file in the right place.
The other option is worse: we detect that the file is corrupt and assume it was corrupted in transit, so we try downloading again – only to get another corrupt file. You can put an upper limit on the number of attempts, but it would be so much easier if the server didn’t lie to you.
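To make the retry problem concrete, here is a minimal Python sketch of a defensive download. It is not our actual client code: the function names, the three-attempt limit and the "does the body look like HTML?" check are all assumptions made for illustration, and a real client would validate the file against its own format instead.

```python
import urllib.error
import urllib.request

MAX_ATTEMPTS = 3  # assumed upper limit on download attempts


def looks_like_html(body: bytes) -> bool:
    """Crude, hypothetical check: an error page is usually HTML, a definitions file is not."""
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def http_get(url: str, timeout: float = 10.0):
    """Fetch a URL and return (status_code, body) without raising on HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code, err.read()


def download_definitions(url: str, http_get=http_get, max_attempts: int = MAX_ATTEMPTS) -> bytes:
    """Download a file, refusing to accept an error page delivered with a 200 status."""
    for _ in range(max_attempts):
        status, body = http_get(url)
        if status == 404:
            # An honest server: the URL is wrong, so retrying is pointless.
            raise FileNotFoundError(f"{url} does not exist on the server")
        if status == 200 and not looks_like_html(body):
            return body
        # Either a transient error or a "200" that carries an error page: try again.
    raise ValueError(f"no usable file from {url} after {max_attempts} attempts")
```

With an honest server the wrong URL fails on the first request with a clear reason. With a lying server the best the client can do is burn all of its attempts and report that the file is unusable, without being able to say why.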
The biggest culprit for this misconfiguration is IIS – any time you set it up with a URL to use as a 404 page, it sends that page with a 200 OK status instead of a 404. If you instead select a local file (which can still be an ASP script etc.) as your 404 page, it will correctly report a 404 error. I can only assume that it exhibits the same behavior if you set a custom URL for any of the other error codes.
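For comparison, this is roughly what the correct behaviour looks like from the server side: a friendly custom error page, still sent with a 404 status. The sketch uses Python's standard http.server module as a stand-in, not IIS, and the file table, path and page text are made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content table standing in for files on disk.
FILES = {"/spelling.dic": b"colour: a property of light\n"}

# A custom, human-friendly error page.
ERROR_PAGE = b"<html><body><h1>Sorry, that page does not exist.</h1></body></html>"


def respond(path: str):
    """Return (status, content_type, body) for a requested path."""
    body = FILES.get(path)
    if body is None:
        # Custom page for humans, honest status code for programs.
        return 404, "text/html; charset=utf-8", ERROR_PAGE
    return 200, "application/octet-stream", body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve on localhost:8000 until interrupted.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

A browser user sees the same error page either way; the only difference is the status line, and that is the part a program relies on.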