Escaping Strings in XPath 1.0

C++, XPath

XPath is a language for selecting nodes from an XML document. XPath is used extensively in XSLT and other XML technologies. I also vastly prefer using XPath (e.g. with XPathNavigator) over the XML DOM when manipulating XML in a non-streaming fashion.

In XPath, strings must be delimited by either single or double quotes. XPath 1.0 has no escape sequence, so given a quote character used to delimit a string, one can’t represent that same quote character within the string. This means that if you decide to use single quotes to delimit your XPath string, you can’t represent the string O'Reilly; if you use double quotes, you can’t represent "Hello".

However, given a quote delimiter, you can represent the other quote character. We can use this observation along with the concat XPath function to devise a general quoting rule for XPath strings. It’s easiest to show this via a series of examples:

Original String          Quoted XPath String
a                        'a' (or "a")
O'Reilly                 "O'Reilly"
"Hello"                  '"Hello"'
"Hello, Mr. O'Reilly"    concat('"Hello, Mr. O', "'Reilly", '"')

Below is a piece of C++ code which implements these quotation rules:

#include <sstream>
#include <string>

std::string
QuoteXPathString(const std::string& xpath)
{
    // If we don't have any single or double-quote characters, quote the
    // expression in single quotes.
    std::string::size_type pos = xpath.find_first_of("'\"");
    if (pos == std::string::npos)
        return "'" + xpath + "'";

    // If we cannot find the alternate quotation character, quote the
    // expression in the alternate quotation character.
    char chOther = (xpath[pos] == '"' ? '\'' : '"');
    pos = xpath.find(chOther, pos + 1);
    if (pos == std::string::npos)
        return chOther + xpath + chOther;

    // The string has both quotation characters.  We need to use concat()
    // to form the string.  The first piece runs up to the first occurrence
    // of the alternate quotation character.
    std::stringstream ss;
    ss << "concat("
       << chOther
       << xpath.substr(0, pos)
       << chOther;
    do {
        // Each later piece starts at a quote character and is delimited by
        // the other one; it ends just before that delimiter next appears.
        chOther = (xpath[pos] == '"' ? '\'' : '"');
        std::string::size_type pos2 = xpath.find(chOther, pos + 1);
        ss << ','
           << chOther
           << xpath.substr(pos, pos2 - pos)
           << chOther;
        pos = pos2;
    } while (pos != std::string::npos);
    ss << ")";

    return ss.str();
}

Usage looks like:

std::string lastName = …; // May come from user input
std::string xpath = "//Customer[LastName = " +
    QuoteXPathString(lastName) + "]";
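
For comparison, here is a minimal sketch of a simpler variant of the same idea. It is not the function above: the name QuoteXPathStringSimple is hypothetical, and it always splits at double quotes instead of alternating delimiters, which should give a more verbose but equally valid concat() call.

```cpp
#include <string>

// Hypothetical alternative to QuoteXPathString (not from the original post).
std::string QuoteXPathStringSimple(const std::string& value)
{
    // With at most one kind of quote present, a plain literal is enough.
    if (value.find('"') == std::string::npos)
        return '"' + value + '"';
    if (value.find('\'') == std::string::npos)
        return '\'' + value + '\'';

    // Both kinds are present: emit the text between double quotes as
    // double-quoted literals and each double quote itself as '"'.
    std::string result = "concat(";
    bool first = true;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = value.find('"', start);
        const std::string piece = value.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        if (!piece.empty()) {
            result += (first ? "" : ",");
            result += '"' + piece + '"';
            first = false;
        }
        if (end == std::string::npos)
            break;
        result += (first ? "" : ",");
        result += "'\"'";
        first = false;
        start = end + 1;
    }
    return result + ")";
}
```

For the last row of the table this variant produces concat('"',"Hello, Mr. O'Reilly",'"'), which has the same string value as the three-argument form shown there.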

Microsoft’s XmlLite

Win32, XML

Microsoft has created a new, lightweight C++ XML processing library called XmlLite. It includes a streaming XML writing class patterned after .NET’s System.Xml.XmlWriter.

This library makes the IXmlWriter from my Implementing IXmlWriter series obsolete for Windows developers.

MSXML4 Is Going To Be Kill-Bitted

Win32, XML

In an effort to EOL MSXML 4 and encourage developers to use MSXML 6, the MSXML team is going to kill-bit MSXML4 in Q4 2007. This means that you will no longer be able to create instances of MSXML 4 from within Internet Explorer.

Be sure to update your applications accordingly. For this, the MSXML team’s post Upgrading to MSXML 6.0 may prove quite useful.

XSLT Variable Scoping Differences Across MSXML Versions

XSLT

Subtle differences in variable scoping in XSLTs between MSXML 3.0 and 4.0 can result in XSLT files breaking if you upgrade your version of MSXML. Consider the following XSLT:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml"
                version="1.0"
                encoding="UTF-8"
                indent="yes" />

    <xsl:template match="/">
        <root>
            <elem>
                <xsl:variable name="foo">Value</xsl:variable>
                <xsl:value-of select="$foo" />
            </elem>
            <elem>
                <!-- This refers to the variable defined in
                     the previous sibling elem node -->
                <xsl:value-of select="$foo" />
            </elem>
        </root>
    </xsl:template>
</xsl:stylesheet>

This stylesheet (which does not depend on the input XML) works on MSXML 3.0 but fails on MSXML 4.0 with the error message

A reference to variable or parameter ‘foo’ cannot be resolved. The variable or parameter may not be defined, or it may not be in scope.

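Clearly, MSXML 4.0 limits the scope of the foo variable to the first elem node, whereas MSXML 3.0 does not. I suspect MSXML 3.0 scopes a variable to its enclosing template. MSXML 4.0’s behavior is the one the XSLT 1.0 Recommendation describes: a variable binding is visible only to its following siblings and their descendants.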

These scoping differences cut both ways. Consider this attempt to fix the XSLT:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml"
                version="1.0"
                encoding="UTF-8"
                indent="yes" />

    <xsl:template match="/">
        <root>
            <elem>
                <xsl:variable name="foo">Value</xsl:variable>
                <xsl:value-of select="$foo" />
            </elem>
            <elem>
                <xsl:variable name="foo">Value</xsl:variable>
                <xsl:value-of select="$foo" />
            </elem>
        </root>
    </xsl:template>
</xsl:stylesheet>

This stylesheet works on MSXML 4.0 but fails on MSXML 3.0 with the error message

Variable or parameter ‘foo’ cannot be defined twice within the same template.

If you want the stylesheet to work on both processors, you must push up the variable declaration as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml"
                version="1.0"
                encoding="UTF-8"
                indent="yes" />

    <xsl:template match="/">
        <root>
            <xsl:variable name="foo">Value</xsl:variable>
            <elem>
                <xsl:value-of select="$foo" />
            </elem>
            <elem>
                <xsl:value-of select="$foo" />
            </elem>
        </root>
    </xsl:template>
</xsl:stylesheet>

Be careful. Even the smallest of changes can break your software.

Writing Streaming XML Using MSXML

C++, COM, Win32, XML

In my Implementing IXmlWriter series of posts, I wrote a streaming XML writing class whose interface is based on .NET’s XmlWriter. I recently discovered that MSXML provides its own method to write streaming XML through the class MXXMLWriter.

MXXMLWriter supports a large set of functionality including encoding, indentation, disabling output escaping, and writing XML fragments. The generated XML can be written to an IStream, a BSTR, or a DOMDocument object. However, its interface leaves much to be desired. Usage looks like this:

#import <msxml3.dll>

// I'm using the #import-generated _com_ptr_t-based smart pointers
HRESULT hr = S_OK;
MSXML2::IMXWriterPtr spMXWriter;
hr = spMXWriter.CreateInstance(__uuidof(MSXML2::MXXMLWriter30));
_ASSERT(SUCCEEDED(hr)); // TODO

// Configure the IMXWriter as appropriate.  We will be using the default of
// writing to a BSTR which can be retrieved using spMXWriter->get_output().

MSXML2::ISAXContentHandlerPtr spSAXContentHandler(spMXWriter);
_ASSERT(spSAXContentHandler != NULL); // TODO

// Be sure to check the hrs below

hr = spSAXContentHandler->startDocument();
hr = spSAXContentHandler->startElement(L"", 0, L"root", 4, L"root", 4, NULL);
hr = spSAXContentHandler->characters(L"text", 4);
// endElement also takes the element name.  This means we may need to
// maintain our own open element stack.
hr = spSAXContentHandler->endElement(L"", 0, L"root", 4, L"root", 4);
hr = spSAXContentHandler->endDocument();

The rough IXmlWriter equivalent is:

#include "StringXmlWriter.h"

StringXmlWriter xw;
xw.WriteStartDocument();
  xw.WriteStartElement("root");
    xw.WriteString("text");
  xw.WriteEndElement(); // /root
xw.WriteEndDocument();

However, there might be a case to change IXmlWriter to use MXXMLWriter internally.
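
The comment in the MXXMLWriter listing notes that endElement needs the element name, so a wrapper with a parameterless WriteEndElement() has to remember which elements are open. Below is a minimal sketch of such an open element stack. ElementSink, RecordingSink and ElementStackWriter are hypothetical names used only for illustration; a real wrapper would forward to ISAXContentHandler rather than to this stand-in interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX-style sink that, like ISAXContentHandler,
// wants the element name on both the start and the end call.
struct ElementSink {
    virtual ~ElementSink() {}
    virtual void StartElement(const std::wstring& name) = 0;
    virtual void EndElement(const std::wstring& name) = 0;
};

// Test double that records the calls it receives.
struct RecordingSink : ElementSink {
    std::vector<std::wstring> log;
    void StartElement(const std::wstring& name) { log.push_back(name); }
    void EndElement(const std::wstring& name) { log.push_back(L"/" + name); }
};

// Remembers open element names so callers can close the innermost element
// without naming it.
class ElementStackWriter {
public:
    explicit ElementStackWriter(ElementSink& sink) : sink_(sink) {}

    void WriteStartElement(const std::wstring& name) {
        sink_.StartElement(name);
        open_.push_back(name);
    }

    void WriteEndElement() {
        if (open_.empty())
            throw std::logic_error("WriteEndElement called with no open element");
        const std::wstring name = open_.back();
        open_.pop_back();
        sink_.EndElement(name);
    }

    std::size_t Depth() const { return open_.size(); }

private:
    ElementSink& sink_;
    std::vector<std::wstring> open_;
};
```

The stack costs one string copy per open element, which seems a small price for keeping the caller-facing interface as simple as the IXmlWriter one.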

MSXML Versions

Win32, XML

The Microsoft XML team recently posted in their blog a set of recommendations on which version of MSXML to use. In short, they recommend using MSXML 6.0 but falling back to 3.0 if it isn’t available.

My product currently tries MSXML 4.0 first — but only SP2 and above. We ran into bugs with previous versions of MSXML 4.0. I will have to evaluate moving to MSXML 6.0.

More XPath Tricks

XPath

Consider, once again, the XML file from yesterday’s post Selecting A Maximum Value Using XPath:

<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>

If you cannot assume the data are sorted, the types of XPath 1.0 expressions you can write are deeply limited; as we saw last time, we couldn’t even write an expression which selects the Price element with the latest Date. However, if you can assume the data are sorted in latest-first order, you can write some fairly useful expressions:

Select the Price element with the latest Date:

This is the first Price element, so the expression is:

/Prices/Price[1]

Select the Price element with the earliest Date:

This is (obviously) the last Price element, so the expression is:

/Prices/Price[last()]

Select all Price elements from the last trading day of the month:

Normally this is fairly complicated because stock markets aren’t usually open on weekends or holidays. However, because the Price elements are assumed to be in latest-first order, we can simply look for the (physically) first Price element for a month. While this may not work for all XPath parsers, the following expression works on my machine:

/Prices/Price[substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]

This reads “Select all Price elements whose Date element has a different year and month from the previous Price element’s Date.”

The portability concern is that this expression assumes the first Price element in the preceding-sibling axis is the immediate predecessor; in other words, that the preceding-sibling axis iterates backwards. XPath 1.0 does guarantee this: preceding-sibling is a reverse axis, and positions in a predicate on a reverse axis are counted in reverse document order, so preceding-sibling::Price[1] is the nearest preceding Price on a conforming processor.

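Furthermore, while this expression correctly selects the Price element containing <Date>2006-09-01</Date> as part of the result, it does so for subtle reasons: that element has no preceding sibling, so the first substring() call receives an empty node-set and returns the empty string, which differs from 2006-09. It might be clearer to explicitly test for the number of elements in the preceding-sibling axis using count().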
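
To make the logic of the month-boundary expression concrete, here is a minimal C++ sketch of the same selection over a plain list of dates. The function name LastTradingDays is hypothetical, and the sketch assumes the dates are yyyy-mm-dd strings already in latest-first order, just as the XPath expression does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every date whose year-month prefix differs from that of the entry
// just before it. With latest-first input, these are the last listed
// trading days of each month.
std::vector<std::string> LastTradingDays(const std::vector<std::string>& dates)
{
    std::vector<std::string> result;
    for (std::size_t i = 0; i < dates.size(); ++i) {
        // Mirrors substring(preceding-sibling::Price[1]/Date, 1, 7): the
        // first entry has no predecessor, so its "previous month" is empty.
        const std::string previousMonth =
            (i == 0) ? std::string() : dates[i - 1].substr(0, 7);
        if (dates[i].substr(0, 7) != previousMonth)
            result.push_back(dates[i]);
    }
    return result;
}
```

The explicit i == 0 branch plays the role that the empty node-set plays in the XPath version, which is the subtle case mentioned above.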

Selecting A Maximum Value Using XPath

XPath, XSLT

Let’s say you have an XML file which contains daily stock prices, such as the following:

<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>

Excerpt from MSFT.xml generated on 2006-09-05 from Yahoo Finance’s MSFT Historical Prices and YahooCsvToXml.py

Now let’s write an XSLT fragment which displays the Price element with the latest Date:

<xsl:for-each select="/Prices/Price">
  <xsl:sort select="Date" order="descending" />

  <xsl:if test="position() = 1">
    <xsl:copy-of select="." />
  </xsl:if>
</xsl:for-each>

What if you wanted to do this in pure XPath 1.0? Well, normally one would use something akin to Jeni Tennison’s XPath maximum ‘trick’ and write the following XPath expression:

/Prices/Price[not(preceding-sibling::Price/Date > Date or
                  following-sibling::Price/Date > Date)]

This expression reads “Select the Price element that doesn’t have a sibling Price element with a Date greater than this one.” (By the way, you should be careful with this XPath expression and large node sets: it is highly likely it runs in O(n^2) time.)
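
To see where the quadratic cost would come from, here is a minimal C++ sketch of what a naive evaluator might do for this predicate over numeric values. It is an illustration only: SelectMaxima is a hypothetical name, and real XPath processors may evaluate the expression differently.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of all values for which no other value is greater,
// comparing every element against every other element.
std::vector<std::size_t> SelectMaxima(const std::vector<double>& values)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        bool hasGreater = false;
        for (std::size_t j = 0; j < values.size(); ++j) {
            if (values[j] > values[i]) {
                hasGreater = true;
                break;
            }
        }
        if (!hasGreater)
            result.push_back(i);
    }
    return result;
}
```

Note that ties are all selected, which matches the XPath predicate: every element equal to the maximum satisfies it.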

Unfortunately the above expression doesn’t work for dates because XPath 1.0’s relational operators (<, <=, >, >=) compare numbers, not strings: a value such as 2006-09-01 converts to NaN, so every comparison is false. I tried writing the equivalent expression using Microsoft’s ms:string-compare XPath extension function but it didn’t work, I believe because it only compares two strings whereas the expression requires a function that compares a node-set to a string and returns a node-set.

As far as I can tell, the only way to perform this selection in pure XPath 1.0 is to change the original XML by converting the Date values to numbers (by removing the dashes). Hopefully XPath 2.0 will have a more palatable solution.
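
The conversion itself is simple if it is done while generating the XML. Here is a minimal C++ sketch, assuming well-formed yyyy-mm-dd input; DateToNumber is a hypothetical helper name, not part of any tool mentioned here.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>

// Strips the dashes from an ISO date and parses the remaining digits, so
// that "2006-09-01" becomes 20060901. Numeric order then matches date order.
long DateToNumber(std::string date)
{
    date.erase(std::remove(date.begin(), date.end(), '-'), date.end());
    return std::strtol(date.c_str(), nullptr, 10);
}
```

Because the year, month and day fields are fixed-width and zero-padded, comparing the resulting numbers gives the same ordering as comparing the dates.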

Disabling Default XSLT Templates

XSLT

As XSLT developers quickly learn, the W3C XSLT Recommendation requires all XSLT processors to implement a number of built-in rules. Per the spec, these are the built-in rules:

<xsl:template match="* | /">
  <xsl:apply-templates />
</xsl:template>

All XML elements apply child templates recursively

<xsl:template match="* | /" mode="m">
  <xsl:apply-templates mode="m" />
</xsl:template>

All XML elements apply child templates recursively for every processing mode m

<xsl:template match="text() | @*">
  <xsl:value-of select="." />
</xsl:template>

All text nodes and attributes return the value of their contents

<xsl:template match="processing-instruction() | comment()" />

All comments and processing instructions are ignored

The net effect of these implicit rules is that an XSLT stylesheet without any templates defined will simply return the string values of all text nodes in the XML document concatenated together. Attribute values are ignored because <xsl:apply-templates /> selects only child nodes, and attributes are not children of their element.

There are many instances where these default templates are useful, but I often find they mask bugs in my stylesheet (e.g. when I mistype a template match expression). Instead, I usually prefer that the stylesheet fail, hopefully loudly, if it comes across an unanticipated XML element. I use the following XSLT fragment to achieve this behavior:

<xsl:template match="*">
  <xsl:message terminate="yes">
    <xsl:text>ERROR: Unhandled XML element: </xsl:text>
    <xsl:value-of select="name(.)" />
  </xsl:message>
</xsl:template>

When I desire the default apply-templates behavior, I add an explicit handler:

<!-- Enable default apply-templates behavior for these elements -->
<xsl:template match="/a/b/c | /a/d/e | ...">
  <xsl:apply-templates />
</xsl:template>

How to Return XML From ASPX in ASP.NET 1.1

C#, XML

I’m not sure if this is the “canonical” way to do it, but here’s a description of how to write an ASP.NET 1.1 ASPX page which returns an XML document (e.g. when writing a home-brewed web service).

First, create a new Web Form (I will call it WebService.aspx). As we will be programmatically generating the XML in the HTTP response rather than sending the (processed) content of the ASPX file, delete everything from the ASPX file but the @Page directive, so that it looks something like:

<%@ Page language="c#" Codebehind="WebService.aspx.cs" AutoEventWireup="false"
    Inherits="WebService.WebService" %>

Next, open up the code-behind file WebService.aspx.cs. Within the Page_Load event handler, add the following code block:

// Requires: using System.IO; using System.Text; using System.Xml;
private void Page_Load(object sender, System.EventArgs e)
{
    Response.ContentType = "text/xml";
    Response.ContentEncoding = Encoding.UTF8;

    using (TextWriter textWriter = new StreamWriter(Response.OutputStream,
                                                    Encoding.UTF8))
    {
        XmlTextWriter xmlWriter = new XmlTextWriter(textWriter);
        // Write XML using xmlWriter

        // Flush any buffered XML before the StreamWriter is disposed
        xmlWriter.Flush();
    }
}

Notice the use of the HttpResponse.OutputStream property, which allows us to write directly to the HTTP response body. Also notice that I explicitly set the response’s content type and character encoding (the ContentType and ContentEncoding properties; the latter sets the charset of the response, not the Content-Encoding header), and that the encoding for both the response and the StreamWriter must match.

Once you have this block in place, you can use whatever technique you like to write XML to the xmlWriter object. For example, you can call XmlWriter methods by hand, pass xmlWriter as a parameter to XslTransform.Transform(), or use the XmlSerializer.

Update 2006-02-26 1:41 PM: In this comment, dbt suggests writing the XML in the page’s Render() method to avoid problems with chunking encountered when using Page_Load().
