Consider, once again, the XML file from yesterday’s post, “Selecting A Maximum Value Using XPath”:
<Prices>
  <Price>
    <Date>2006-09-01</Date>
    <Open>25.89</Open>
    <High>25.97</High>
    <Low>25.64</Low>
    <Close>25.84</Close>
    <Volume>31594600</Volume>
    <AdjClose>25.84</AdjClose>
  </Price>
  <Price>
    <Date>2006-08-31</Date>
    <Open>25.87</Open>
    <High>25.98</High>
    <Low>25.68</Low>
    <Close>25.70</Close>
    <Volume>26380500</Volume>
    <AdjClose>25.70</AdjClose>
  </Price>
  ...
</Prices>
If you cannot assume the data are sorted, the kinds of XPath 1.0 expressions you can write are severely limited; as we saw last time, we couldn’t even write an expression that selects the Price element with the latest Date. However, if you can assume the data are sorted in latest-first order, you can write some fairly useful expressions:
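Since everything below depends on the latest-first assumption, it may be worth checking it before relying on positional expressions. The following is a minimal Python sketch of such a check using only the standard library's xml.etree.ElementTree; the function name is_latest_first is hypothetical, and it assumes every Price has a Date in ISO 8601 (YYYY-MM-DD) form, which sorts correctly as a plain string.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample data above (only Date and Close kept).
SAMPLE = """<Prices>
  <Price><Date>2006-09-01</Date><Close>25.84</Close></Price>
  <Price><Date>2006-08-31</Date><Close>25.70</Close></Price>
</Prices>"""


def is_latest_first(prices):
    """Return True if the Price children of `prices` have strictly
    decreasing Dates (hypothetical helper; assumes every Price has a Date).

    ISO 8601 dates compare correctly as strings, so no parsing is needed.
    """
    dates = [p.findtext("Date") for p in prices.findall("Price")]
    return all(newer > older for newer, older in zip(dates, dates[1:]))


root = ET.fromstring(SAMPLE)
print(is_latest_first(root))  # True
```

If the check fails, the positional expressions that follow silently return the wrong elements rather than raising an error, which is why a guard like this can be worthwhile.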
- Select the Price element with the latest Date:
  - This is the first Price element, so the expression is: /Prices/Price[1]
- Select the Price element with the earliest Date:
  - This is (obviously) the last Price element, so the expression is: /Prices/Price[last()]
- Select all Price elements from the last trading day of each month:
  - Normally this is fairly complicated, because stock markets are usually closed on weekends and holidays, so the last trading day is not always the last calendar day of the month. However, because the Price elements are assumed to be in latest-first order, we can simply look for the (physically) first Price element for each month. The following expression works on my machine:
    /Prices/Price[substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]
    This reads “Select all Price elements whose Date element has a different year and month from the previous Price element’s Date.”
    The expression relies on the first Price element in the preceding-sibling axis being the immediate predecessor; in other words, on the preceding-sibling axis iterating backwards. The XPath 1.0 specification does guarantee this: preceding-sibling is a reverse axis, and a predicate in a location step over a reverse axis numbers nodes in reverse document order. (By contrast, (preceding-sibling::Price)[1] would select the earliest sibling in document order, which is not what we want.)
    Furthermore, while this expression correctly selects the Price element containing <Date>2006-09-01</Date>, it does so for subtle reasons: that element has no preceding sibling, so preceding-sibling::Price[1]/Date is an empty node-set, whose string-value is the empty string, and "" differs from "2006-09". It might be clearer to test explicitly for the number of elements in the preceding-sibling axis using count(), for example:
    /Prices/Price[count(preceding-sibling::Price) = 0 or substring(preceding-sibling::Price[1]/Date, 1, 7) != substring(Date, 1, 7)]