Escaping Strings in XPath 1.0


XPath is a language for selecting nodes from an XML document. XPath is used extensively in XSLT and other XML technologies. I also vastly prefer using XPath (e.g. with XPathNavigator) over the XML DOM when manipulating XML in a non-streaming fashion.

In XPath, strings must be delimited by either single or double quotes. XPath 1.0 has no escape mechanism, so whichever quote character delimits a string cannot appear inside that string. This means that if you use single quotes to delimit your XPath string, you can’t represent the string O'Reilly; use double quotes, and you can’t represent "Hello".

However, given a quote delimiter, you can represent the other quote character. We can use this observation along with the concat XPath function to devise a general quoting rule for XPath strings. It’s easiest to show this via a series of examples:

Original String          Quoted XPath String
a                        'a' (or "a")
O'Reilly                 "O'Reilly"
"Hello"                  '"Hello"'
"Hello, Mr. O'Reilly"    concat('"Hello, Mr. O', "'Reilly", '"')

Below is a piece of C++ code which implements these quotation rules:

#include <sstream>
#include <string>

std::string
QuoteXPathString(const std::string& xpath)
{
    // If we don't have any single or double-quote characters, quote the
    // expression in single quotes.
    std::string::size_type pos = xpath.find_first_of("'\"");
    if (pos == std::string::npos)
        return "'" + xpath + "'";

    // If we cannot find the alternate quotation character, quote the
    // expression in the alternate quotation character.
    char chOther = (xpath[pos] == '"' ? '\'' : '"');
    pos = xpath.find(chOther, pos + 1);
    if (pos == std::string::npos)
        return chOther + xpath + chOther;

    // The string has both quotation characters.  We need to use concat()
    // to form the string.  The first piece runs up to (but not including)
    // the first occurrence of the alternate quotation character.
    std::stringstream ss;
    ss << "concat("
       << chOther
       << xpath.substr(0, pos)
       << chOther;
    do {
        // Each later piece starts at a quote character and is delimited
        // by the other one.  When pos2 is npos, substr() takes the rest
        // of the string.
        chOther = (xpath[pos] == '"' ? '\'' : '"');
        std::string::size_type pos2 = xpath.find(chOther, pos + 1);
        ss << ','
           << chOther
           << xpath.substr(pos, pos2 - pos)
           << chOther;
        pos = pos2;
    } while (pos != std::string::npos);
    ss << ")";

    return ss.str();
}

Usage looks like:

std::string lastName = …; // May come from user input
std::string xpath = "//Customer[LastName = " +
    QuoteXPathString(lastName) + "]";
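
As a cross-check on the quoting rules, here is a minimal sketch of a variant that splits only on double quotes rather than alternating delimiters. The names quoteXPathLiteral and checkTableRows are hypothetical and not part of the code above. For strings containing both quote characters it produces a different concat() call from the one in the table, though the resulting XPath string should be the same.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption, not the function from the post):
// quotes `value` as an XPath 1.0 string literal.
std::string quoteXPathLiteral(const std::string& value)
{
    // No single quote: single-quote delimiters are safe.
    if (value.find('\'') == std::string::npos)
        return "'" + value + "'";
    // No double quote: double-quote delimiters are safe.
    if (value.find('"') == std::string::npos)
        return "\"" + value + "\"";

    // Both occur: put each run of double quotes in single quotes and
    // everything else in double quotes, then join with concat().
    std::string result = "concat(";
    std::string::size_type pos = 0;
    while (pos < value.size()) {
        const bool inQuoteRun = (value[pos] == '"');
        std::string::size_type end = inQuoteRun
            ? value.find_first_not_of('"', pos)
            : value.find('"', pos);
        if (end == std::string::npos)
            end = value.size();

        const char delimiter = inQuoteRun ? '\'' : '"';
        if (pos != 0)
            result += ',';
        result += delimiter;
        result += value.substr(pos, end - pos);
        result += delimiter;
        pos = end;
    }
    result += ')';
    return result;
}

// Runs the rows of the table through the sketch.
void checkTableRows()
{
    assert(quoteXPathLiteral("a") == "'a'");
    assert(quoteXPathLiteral("O'Reilly") == "\"O'Reilly\"");
    assert(quoteXPathLiteral("\"Hello\"") == "'\"Hello\"'");
    assert(quoteXPathLiteral("\"Hello, Mr. O'Reilly\"") ==
           "concat('\"',\"Hello, Mr. O'Reilly\",'\"')");
}
```

Because the input is known to contain both quote characters by the time the loop runs, the loop always emits at least two pieces, which matters because concat() requires at least two arguments.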

More XPath Tricks


Consider, once again, the XML file from yesterday’s post Selecting A Maximum Value Using XPath:

<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>

If you cannot assume the data are sorted, the types of XPath 1.0 expressions you can write are severely limited; as we saw last time, we couldn’t even write an expression which selects the Price element with the latest Date. However, if you can assume the data are sorted in latest-first order, you can write some fairly useful expressions:

Select the Price element with the latest Date:

This is the first Price element, so the expression is:

/Prices/Price[1]

Select the Price element with the earliest Date:

This is (obviously) the last Price element, so the expression is:

/Prices/Price[last()]

Select all Price elements from the last trading day of the month:

Normally this is fairly complicated because stock markets aren’t usually open on weekends or holidays. However, because the Price elements are assumed to be in latest-first order, we can simply look for the (physically) first Price element for a month. While this may not work for all XPath parsers, the following expression works on my machine:

/Prices/Price[substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]

This reads “Select all Price elements whose Date element has a different year and month from the previous Price element’s Date.”

This expression relies on the first Price element in the preceding-sibling axis being the immediate predecessor; in other words, on the preceding-sibling axis iterating backwards. The XPath 1.0 specification defines preceding-sibling as a reverse axis, so a positional predicate such as [1] counts from the nearest preceding sibling, and a conforming processor should evaluate the expression as intended.

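Furthermore, while this expression correctly selects the Price element containing <Date>2006-09-01</Date> as part of the result, it does so for a subtle reason: the first Price element has no preceding sibling, so the first substring() call receives an empty node-set, which converts to the empty string and therefore differs from 2006-09. It might be clearer to explicitly test for the number of elements in the preceding-sibling axis using count().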
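
One way to write the explicit test, which I have not tried on any particular parser, would be:

/Prices/Price[count(preceding-sibling::Price) = 0 or substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]

The same selection logic can be sketched in C++ over a plain list of date strings, which makes the special case for the first element easier to see. The function names here are hypothetical, and the sketch assumes dates in YYYY-MM-DD form sorted latest-first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the indices of the entries that are the
// last trading day of their month, given YYYY-MM-DD dates sorted in
// latest-first order (both assumptions, as in the post).
std::vector<std::size_t>
lastTradingDayIndices(const std::vector<std::string>& dates)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < dates.size(); ++i) {
        // Mirrors count(preceding-sibling::Price) = 0: the first entry
        // has no predecessor, so it is always selected.
        const bool isFirst = (i == 0);
        // Mirrors the substring(..., 1, 7) comparison of year and month.
        if (isFirst || dates[i - 1].substr(0, 7) != dates[i].substr(0, 7))
            result.push_back(i);
    }
    return result;
}

// Small check using made-up sample dates.
void checkLastTradingDays()
{
    const std::vector<std::string> dates = {
        "2006-09-01", "2006-08-31", "2006-08-30", "2006-07-31"
    };
    const std::vector<std::size_t> expected = {0, 1, 3};
    assert(lastTradingDayIndices(dates) == expected);
}
```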

Selecting A Maximum Value Using XPath


Let’s say you have an XML file which contains daily stock prices, such as the following:

<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>

Excerpt from MSFT.xml generated on 2006-09-05 from Yahoo Finance’s MSFT Historical Prices and YahooCsvToXml.py

Now let’s write an XSLT fragment which displays the Price element with the latest Date:

<xsl:for-each select="/Prices/Price">
  <xsl:sort select="Date" order="descending" />

  <xsl:if test="position() = 1">
    <xsl:copy-of select="." />
  </xsl:if>
</xsl:for-each>

What if you wanted to do this in pure XPath 1.0? Well, normally one would use something akin to Jeni Tennison’s XPath maximum ‘trick’ and write the following XPath expression:

/Prices/Price[not(preceding-sibling::Price/Date > Date or
                  following-sibling::Price/Date > Date)]

This expression reads “Select the Price element that doesn’t have a sibling Price element with a Date greater than this one.” (By the way, you should be careful with this XPath expression and large node sets: it very likely runs in O(n^2) time.)

Unfortunately the above expression doesn’t work for dates because XPath 1.0’s relational operators (<, <=, > and >=) convert both operands to numbers, and a string such as 2006-09-01 converts to NaN, so every comparison is false. I tried writing the equivalent expression using Microsoft’s ms:string-compare XPath extension function but it didn’t work. I believe this is because it only compares two strings, whereas the expression requires a function that compares a node-set to a string and returns a node-set.

As far as I can tell, the only way to perform this selection in pure XPath 1.0 is to change the original XML by converting the Date values to numbers (by removing the dashes). Hopefully XPath 2.0 will have a more palatable solution.
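
To illustrate why removing the dashes helps, here is a minimal C++ sketch of the conversion and the resulting numeric comparison. The names dateToNumber and indexOfLatest are hypothetical, and the sketch assumes well-formed YYYY-MM-DD dates and a non-empty list. Once the dashes are gone, 2006-09-01 becomes 20060901, and ordinary numeric ordering matches chronological ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns "2006-09-01" into 20060901.
// Assumes a well-formed YYYY-MM-DD date; std::stol throws otherwise.
long dateToNumber(std::string date)
{
    date.erase(std::remove(date.begin(), date.end(), '-'), date.end());
    return std::stol(date);
}

// Hypothetical helper: index of the latest date in a non-empty list,
// found with a single linear scan over the numeric values.
std::size_t indexOfLatest(const std::vector<std::string>& dates)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < dates.size(); ++i) {
        if (dateToNumber(dates[i]) > dateToNumber(dates[best]))
            best = i;
    }
    return best;
}

// Small check using made-up, unsorted sample dates.
void checkLatestDate()
{
    assert(dateToNumber("2006-09-01") == 20060901L);
    assert(indexOfLatest({"2006-08-31", "2006-09-01", "2006-08-30"}) == 1);
}
```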
