XPath and Escaping


Let’s say you have the following XML structure:

<surnames>
  <surname>Casey</surname>
  <surname>D'Oronzo</surname>
  <surname>Engelhardt</surname>
</surnames>

Now let’s say you are writing a function in JavaScript to retrieve a surname node given its text. You might write something like:

function GetNodeWithSurname(docSurnames, strSurname)
{
    return docSurnames.selectSingleNode("/surnames/surname[text() = '" + strSurname + "']");
}

What will happen if you pass the parameter D'Oronzo to this function? The XPath expression within selectSingleNode() will become /surnames/surname[text() = 'D'Oronzo']. The single quote within the parameter closes the string literal early, right after D, which leaves Oronzo' as stray text and invalidates the XPath expression. This is a classic escaping problem. Improper escaping is why so many web forms break when they encounter single quotation marks, as any Irishman can attest.
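A quick way to see where the expression breaks is to build the string without touching an XML document at all. This sketch reuses the concatenation from GetNodeWithSurname(); BuildSurnameXPath is a hypothetical name, and the code only inspects the resulting string rather than evaluating it.

```javascript
// Hypothetical helper: builds the same XPath string that
// GetNodeWithSurname() passes to selectSingleNode().
function BuildSurnameXPath(strSurname) {
    return "/surnames/surname[text() = '" + strSurname + "']";
}

const expr = BuildSurnameXPath("D'Oronzo");
console.log(expr); // /surnames/surname[text() = 'D'Oronzo']

// With only '...' literals in the expression, single quotes must pair up.
// Here there are three, so the literal 'D' ends early and Oronzo' is left over.
const quoteCount = (expr.match(/'/g) || []).length;
console.log(quoteCount); // 3
```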

According to the MSDN Magazine Web Q&A article Web Q&A: XPath, XML Notepad, Data Islands, Case Sensitivity, XSL, and More, XPath expressions allow the use of \ as an escape character. Therefore, to fix GetNodeWithSurname(), all one has to do is replace ' with \', right? Well, not exactly. You also need to escape backslashes within strSurname; otherwise a backslash already in the value (a trailing one, for example) would combine with the character after it and could escape the closing quote.
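The sketch below illustrates the backslash problem, assuming (as the MSDN article describes) a parser that treats \ as an escape character inside string literals. QuoteOnlyEscape is a hypothetical, deliberately incomplete helper that escapes ' but not \.

```javascript
// Hypothetical, deliberately incomplete escaper: handles ' but not \.
function QuoteOnlyEscape(str) {
    return str.replace(/'/g, "\\'");
}

// A surname that ends in a backslash (written "Smith\\" in JavaScript source).
const tricky = "Smith\\";
const exprNaive = "/surnames/surname[text() = '" + QuoteOnlyEscape(tricky) + "']";
console.log(exprNaive); // /surnames/surname[text() = 'Smith\']
// To a parser that honors \ escapes, the final \' is an escaped quote,
// so the literal never closes and the expression is invalid.

// Escaping backslashes first doubles the trailing \, so the closing ' survives.
const fixed = tricky.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
console.log("'" + fixed + "'"); // 'Smith\\'
```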

To simplify this, I wrote the following JavaScript function to perform the escaping:

function EscapeXPathString
    (
    /* string */ str
    )
{
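    // Regular expressions for the three characters that need escaping.
    var reBackslash = /\\/g;   // matches a single \
    var reSingleQuote = /'/g;
    var reDoubleQuote = /"/g;

    var strResult = str;
    // Escape backslashes first so the ones added below are not doubled again.
    strResult = strResult.replace(reBackslash, "\\\\");     // \ becomes \\
    strResult = strResult.replace(reSingleQuote, "\\'");    // ' becomes \'
    strResult = strResult.replace(reDoubleQuote, "\\\"");   // " becomes \"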
    return strResult;
}

The large number of backslashes is needed because JavaScript also uses \ as its escape character, both in string literals and in regular expressions. I escaped " in addition to ' and \ so that the function also works when the caller's XPath expression uses " instead of ' to enclose the parameter. Now GetNodeWithSurname() becomes:

function GetNodeWithSurname(docSurnames, strSurname)
{
    return docSurnames.selectSingleNode("/surnames/surname[text() = '" + EscapeXPathString(strSurname) + "']");
}
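Because every \ destined for XPath must itself be written as \\ inside a JavaScript string or regular expression literal, the replacement strings in EscapeXPathString() look longer than they are. This small check prints what each literal actually contains; the replacements object is just an illustration, not part of the original function.

```javascript
// The three replacement strings used in EscapeXPathString(), copied verbatim.
const replacements = {
    backslash: "\\\\",   // JavaScript reads this as \\
    singleQuote: "\\'",  // JavaScript reads this as \'
    doubleQuote: "\\\""  // JavaScript reads this as \"
};

for (const [name, value] of Object.entries(replacements)) {
    console.log(name, value, value.length);
}
// backslash \\ 2
// singleQuote \' 2
// doubleQuote \" 2

// Likewise, the regular expression /\\/g matches one backslash, not two.
console.log("a\\b".match(/\\/g).length); // 1
```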

I do not know whether escaping with \ is a standard part of XPath or a feature of Microsoft’s XML parser.
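For comparison, the XPath 1.0 grammar defines a string literal as either "..." containing no double quotes or '...' containing no single quotes, with no escape character at all. A portable approach under that grammar is to pick whichever delimiter the value does not contain, and fall back to concat() when it contains both. The following is a sketch of that idea, not the method used above; XPathLiteral is a hypothetical helper name, and it returns a complete XPath string expression (quotes included) rather than an escaped fragment.

```javascript
// Hypothetical helper: turn any JavaScript string into an XPath 1.0 string
// expression. XPath 1.0 literals have no escape character, so choose a
// delimiter the value doesn't contain, or fall back to concat().
function XPathLiteral(str) {
    if (str.indexOf("'") === -1) {
        return "'" + str + "'";   // no apostrophes: '...' is safe
    }
    if (str.indexOf('"') === -1) {
        return '"' + str + '"';   // apostrophes but no double quotes: "..." is safe
    }
    // Both kinds present: split on ' and rejoin the pieces with "'" literals.
    const parts = str.split("'").map(function (part) {
        return "'" + part + "'";
    });
    return "concat(" + parts.join(", \"'\", ") + ")";
}

console.log(XPathLiteral("Casey"));      // 'Casey'
console.log(XPathLiteral("D'Oronzo"));   // "D'Oronzo"
console.log(XPathLiteral("It's \"x\"")); // concat('It', "'", 's "x"')
```

With this helper, the predicate would be built as "/surnames/surname[text() = " + XPathLiteral(strSurname) + "]", without adding quotes around the call.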

Proper character escaping is something a programmer must always worry about. For example, when writing the code samples above, I had to write &lt; to insert a <. In fact, just now I had to write &amp;lt; to represent &lt;. Wait, now I’m writing &amp;amp;lt; to represent &amp;lt;. And on and on it goes…
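The recursion in that last paragraph is easy to reproduce. This sketch uses a hypothetical, minimal EscapeHtml() that handles only & and < (enough for HTML text content) and applies it repeatedly to a single <.

```javascript
// Minimal HTML text escaper: & must be replaced first, or the & inside
// the freshly produced &lt; would be escaped again in the same call.
function EscapeHtml(str) {
    return str.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

let text = "<";
for (let i = 0; i < 4; i++) {
    console.log(text);
    text = EscapeHtml(text);
}
// <
// &lt;
// &amp;lt;
// &amp;amp;lt;
```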
