Mar 17
The problem:
- Debian has configured Apache to add Content-Type: … charset=iso-8859-1 to the HTTP response headers of all files with unknown types. The header takes precedence over the <meta http-equiv…charset=utf-8> line in my pages, which declares UTF-8, so browsers decode non-ASCII characters with the wrong character set (making résumé appear incorrectly). I would consider disabling it, but it does exist for a reason. It is also the default for Apache 2.0.
- My XSLTs are configured NOT to include the <?xml version="1.0" encoding="…"?> stanza at the beginning of my webpages (more below). When I make my XSLTs produce ISO-8859-1 output without this stanza, my output validation stage fails because the document is not UTF-8. The validator suggests using the <?xml…?> stanza to specify the character set.
- When I output the <?xml…?> stanza, Opera and IE do not display the page correctly. IE also has a bug where it stays in quirks mode instead of turning on strict conformance mode (which eliminates CSS bugs) if the <?xml…?> stanza is present.
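To make the first problem concrete, here is a minimal Python sketch of what a browser effectively does when the HTTP header says ISO-8859-1 but the file is really UTF-8. The variable names are mine, and this only imitates the decoding step, not a real browser:

```python
# The page as authored: UTF-8 bytes on disk.
text = "résumé"
utf8_bytes = text.encode("utf-8")

# The HTTP header claims ISO-8859-1, so the same bytes are decoded
# with the wrong character set. Each two-byte "é" turns into two characters.
misread = utf8_bytes.decode("iso-8859-1")

print(misread)  # rÃ©sumÃ©
```

The bytes never change; only the label on them does, which is why fixing the header (or the label the browser trusts) is enough.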
The solutions seem to be:
- Eliminate all extended ASCII from the output, and replace it with character references (such as &#233; for é). This is obviously evil.
- Disable Apache’s Content-Type HTTP header crap. This is evil: see above.
- Forget about the output validation stage. Evil.
- Generate ISO-8859-1 with the XML stanza and add an extra stage after output validation that strips off the <?xml…?> stanza. Evil.
- Try to find a way to set the content-type of files so Apache sends the proper content-type in the HTTP headers. If done with .htaccess files, it will be a big PITA.
- Eliminate non-7bit ASCII altogether. Ugh.
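For what it's worth, two of these options (the character references and the stripping stage) are small enough to sketch in Python. This is only an illustration under my own assumptions: the function names and the regular expression are hypothetical, and my actual pipeline is XSLT, not Python.

```python
import re


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical pattern: an XML declaration at the very start of the document,
# plus any whitespace that follows it.
XML_DECL = re.compile(r"\A<\?xml[^>]*\?>\s*")


def strip_xml_declaration(page: str) -> str:
    """Remove a leading <?xml ...?> declaration, leaving the rest untouched."""
    return XML_DECL.sub("", page, count=1)


print(to_char_refs("résumé"))  # r&#233;sum&#233;
print(strip_xml_declaration(
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n<!DOCTYPE html>'
))  # <!DOCTYPE html>
```

Neither makes the option any less evil: the first makes the source unreadable, and the second means the file that gets validated is not the file that gets served.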
What a mess.