More XPath Tricks

Consider, once again, the XML file from yesterday’s post Selecting A Maximum Value Using XPath:

<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>

If you cannot assume the data are sorted, the XPath 1.0 expressions you can write are severely limited; as we saw last time, we couldn’t even write an expression that selects the Price element with the latest Date. However, if you can assume the data are sorted in latest-first order, you can write some fairly useful expressions:

Select the Price element with the latest Date:

This is the first Price element, so the expression is:

/Prices/Price[1]

Select the Price element with the earliest Date:

This is (obviously) the last Price element, so the expression is:

/Prices/Price[last()]
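
As a rough illustration, the two positional expressions can be tried with Python's standard xml.etree.ElementTree module, which supports the [1] and [last()] predicates. This is only a sketch: the XML string is cut down to the two Price elements shown above, the variable names are made up for the example, and the paths are written relative to the root Prices element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# The two Price elements shown above, in latest-first order
# (only Date and Close are kept to stay short).
XML = """
<Prices>
  <Price><Date>2006-09-01</Date><Close>25.84</Close></Price>
  <Price><Date>2006-08-31</Date><Close>25.70</Close></Price>
</Prices>
"""

root = ET.fromstring(XML)  # root is the <Prices> element

# Equivalent of /Prices/Price[1], written relative to <Prices>.
latest = root.find("Price[1]")

# Equivalent of /Prices/Price[last()], written relative to <Prices>.
earliest = root.find("Price[last()]")

print(latest.findtext("Date"))    # 2006-09-01
print(earliest.findtext("Date"))  # 2006-08-31
```

Both results are only meaningful under the latest-first assumption; on unsorted data they would simply return the first and last elements in document order.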

Select all Price elements from the last trading day of the month:

Normally this is fairly complicated because stock markets are usually closed on weekends and holidays, so the last trading day of a month is not necessarily its last calendar day. However, because the Price elements are assumed to be in latest-first order, we can simply look for the first Price element (in document order) for each month. While this may not work for all XPath parsers, the following expression works on my machine:

/Prices/Price[substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]

This reads “Select all Price elements whose Date element has a different year and month from the preceding Price element’s Date.”
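
To make the logic of that expression concrete, here is a small Python sketch that walks the Price elements in document order and applies the same comparison by hand. It does not run the XPath expression itself, since the standard xml.etree.ElementTree module supports neither the preceding-sibling axis nor substring(). The function name month_boundaries and the Date values after the first two are illustrative, not real price data.

```python
import xml.etree.ElementTree as ET

# Latest-first sample. The first two dates come from the file above;
# the remaining three are illustrative only.
XML = """
<Prices>
  <Price><Date>2006-09-01</Date></Price>
  <Price><Date>2006-08-31</Date></Price>
  <Price><Date>2006-08-30</Date></Price>
  <Price><Date>2006-07-31</Date></Price>
  <Price><Date>2006-07-28</Date></Price>
</Prices>
"""

def month_boundaries(prices_root):
    """Return Price elements whose year-month differs from the preceding Price."""
    selected = []
    # For the first element there is no preceding sibling; in XPath,
    # substring() of an empty node-set yields the empty string.
    previous_month = ""
    for price in prices_root.findall("Price"):
        month = price.findtext("Date", default="")[:7]  # substring(Date, 1, 7)
        if previous_month != month:
            selected.append(price)
        previous_month = month
    return selected

root = ET.fromstring(XML)
dates = [p.findtext("Date") for p in month_boundaries(root)]
print(dates)  # ['2006-09-01', '2006-08-31', '2006-07-31']
```

Note that the first element (2006-09-01) is selected even though September may not be over yet in the data; it is simply the latest entry for its month.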

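This expression relies on the first Price element in the preceding-sibling axis being the immediate predecessor; in other words, on the preceding-sibling axis being traversed backwards. The XPath 1.0 specification does guarantee this: preceding-sibling is a reverse axis, and a position predicate on a reverse axis counts in reverse document order, so preceding-sibling::Price[1] is the nearest preceding Price. A parser that behaves differently is non-conforming, but it is still worth testing the expression on the parser you actually use.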

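Furthermore, while this expression correctly selects the Price element containing <Date>2006-09-01</Date> as part of the result, it does so for subtle reasons: the first Price element has no preceding sibling, so preceding-sibling::Price[1]/Date is an empty node-set, substring() converts it to the empty string, and the empty string differs from “2006-09”. It might be clearer to explicitly test for the number of elements in the preceding-sibling axis using count().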
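
One possible explicit form, offered as a sketch rather than a tested recommendation, adds count(preceding-sibling::Price) = 0 as a separate alternative in the predicate. The Python below keeps that expression as a reference string only (ElementTree cannot evaluate count() or preceding-sibling) and mirrors its two branches by hand. The names EXPLICIT_XPATH and month_boundaries_explicit, and the third Date value, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical explicit variant of the expression. It is kept as a string for
# reference and is NOT evaluated here.
EXPLICIT_XPATH = (
    "/Prices/Price[count(preceding-sibling::Price) = 0 or "
    "substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]"
)

def month_boundaries_explicit(prices_root):
    """Mirror the two branches of EXPLICIT_XPATH in plain Python."""
    prices = prices_root.findall("Price")
    selected = []
    for index, price in enumerate(prices):
        month = price.findtext("Date", default="")[:7]
        if index == 0:
            # count(preceding-sibling::Price) = 0: nothing precedes this element.
            selected.append(price)
        elif prices[index - 1].findtext("Date", default="")[:7] != month:
            # The nearest preceding Price falls in a different year-month.
            selected.append(price)
    return selected

# First two dates are from the file above; the third is illustrative only.
sample = ET.fromstring(
    "<Prices>"
    "<Price><Date>2006-09-01</Date></Price>"
    "<Price><Date>2006-08-31</Date></Price>"
    "<Price><Date>2006-08-30</Date></Price>"
    "</Prices>"
)
explicit_dates = [p.findtext("Date") for p in month_boundaries_explicit(sample)]
print(explicit_dates)  # ['2006-09-01', '2006-08-31']
```

The result is the same as with the shorter expression; the only gain is that the first-element case no longer depends on how an empty node-set converts to a string.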
