Escaping Strings in XPath 1.0

C++, XPath

XPath is a language for selecting nodes from an XML document. XPath is used extensively in XSLT and other XML technologies. I also vastly prefer using XPath (e.g. with XPathNavigator) over the XML DOM when manipulating XML in a non-streaming fashion.

In XPath 1.0, string literals must be delimited by either single or double quotes, and there is no escape mechanism: given a quote character used to delimit a string, one can’t represent that same quote character within the string. This means that if you decide to use single quotes to delimit your XPath string, you can’t represent the string O'Reilly; use double quotes, and you can’t represent "Hello".

However, given a quote delimiter, you can represent the other quote character. We can use this observation along with the concat XPath function to devise a general quoting rule for XPath strings. It’s easiest to show this via a series of examples:

Original String           Quoted XPath String
a                         'a' (or "a")
O'Reilly                  "O'Reilly"
"Hello"                   '"Hello"'
"Hello, Mr. O'Reilly"     concat('"Hello, Mr. O', "'Reilly", '"')

Below is a piece of C++ code which implements these quotation rules:

    #include <sstream>
    #include <string>

    std::string
    QuoteXPathString(const std::string& xpath)
    {
        // If we don't have any single or double-quote characters, quote the
        // expression in single quotes.
        std::string::size_type pos = xpath.find_first_of("'\"");
        if (pos == std::string::npos)
            return "'" + xpath + "'";

        // If we cannot find the alternate quotation character, quote the
        // expression in the alternate quotation character.
        char chOther = (xpath[pos] == '"' ? '\'' : '"');
        pos = xpath.find(chOther, pos + 1);
        if (pos == std::string::npos)
            return chOther + xpath + chOther;

        // The string has both quotation characters.  We need to use concat()
        // to form the string.  Each piece is delimited by the quote character
        // it does not contain.
        std::stringstream ss;
        ss << "concat("
           << chOther
           << xpath.substr(0, pos)
           << chOther;
        do {
            chOther = (xpath[pos] == '"' ? '\'' : '"');
            std::string::size_type pos2 = xpath.find(chOther, pos + 1);
            // When pos2 is npos, substr() simply runs to the end of the string.
            ss << ','
               << chOther
               << xpath.substr(pos, pos2 - pos)
               << chOther;
            pos = pos2;
        } while (pos != std::string::npos);
        ss << ")";

        return ss.str();
    }

Usage looks like:

    std::string lastName = …; // May come from user input
    std::string xpath = "//Customer[LastName = " +
        QuoteXPathString(lastName) + "]";
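
For comparison, here is a minimal sketch of a simpler variant of the same idea. It is not the function above: the hypothetical helper QuoteXPathSimple always splits at runs of single quotes, so its concat() output is divided differently from the table, but each piece is still delimited by the quote character it does not contain.

```cpp
#include <string>

// Hypothetical helper (the name is not from the post): quotes s as an
// XPath 1.0 string expression, using concat() only when s contains both
// kinds of quote characters.
std::string QuoteXPathSimple(const std::string& s)
{
    if (s.find('\'') == std::string::npos)
        return "'" + s + "'";
    if (s.find('"') == std::string::npos)
        return "\"" + s + "\"";

    // Both quote characters are present.  Emit alternating runs: runs of
    // single quotes are wrapped in double quotes, everything else is
    // wrapped in single quotes.
    std::string out = "concat(";
    std::string::size_type i = 0;
    while (i < s.size()) {
        const bool isApos = (s[i] == '\'');
        std::string::size_type j =
            isApos ? s.find_first_not_of('\'', i) : s.find('\'', i);
        if (j == std::string::npos)
            j = s.size();

        if (i != 0)
            out += ',';
        const char quote = isApos ? '"' : '\'';
        out += quote;
        out.append(s, i, j - i);
        out += quote;
        i = j;
    }
    out += ')';
    return out;
}
```

For the last row of the table this yields concat('"Hello, Mr. O',"'",'Reilly"'), which evaluates to the same string. The trade-off is a slightly longer expression in exchange for a loop with a single splitting rule.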

STL objects and module boundaries

STL, Win32

Let’s say you have the following function:

    void AppendChar(std::string& s, char ch)
    {
        s += ch;
    }

What happens if this function is exported as an ordinal function from a DLL (not an inlined piece of code inside a header) and you call it from an EXE?

Converting C++ Member Functions into Function Objects

C++

Let’s say you have a C++ function that takes a function object as a parameter and calls it:

    // Fn is any type callable with no arguments.
    template <typename Fn>
    void call_functor(Fn fn)
    {
        fn();
    }

Now let’s say you want to pass a class’s member function to call_functor() above, as in:

    class C
    {
    public: // foo() must be accessible for &C::foo to be usable outside C
        void foo() { std::cout << "foo()\n"; }
    };

    C c;
    call_functor(/* What do I put here? c.foo and &C::foo don't work */);

The STL has a pointer-to-member function adapter called std::mem_fun() which almost gets us there. Unfortunately, it doesn’t quite meet our needs because it requires us to pass a pointer to an instance of C, as in:

    C c;
    std::mem_fun(&C::foo)(&c); // The &c is the problem

However, we can use std::mem_fun() if we could figure out a way to create a new functor around std::mem_fun() with &c bound as its first parameter. Unfortunately, we cannot use the STL binders (std::bind1st and std::bind2nd) because they only work on binary functions, not unary functions.

In the general case, you should use Boost’s very powerful binding library, Boost.Bind. However, let’s write our own simple binder for expository purposes.

First, we need a function, bind(), that returns a function object which binds a parameter, p1, to a unary function object, func. We’ll call the returned function object binder.

    // Note: binder (shown next) must be declared before bind() in a real
    // translation unit.
    template <typename Func, typename P1>
    inline binder<Func, P1> bind(Func func, P1 p1)
    {
        return binder<Func, P1>(func, p1);
    }

The class binder should store func and p1 and have an operator() which calls func with p1 as its parameter. For simplicity we’ll assume func returns void:

    template <typename Func, typename P1>
    class binder
    {
    public:
        binder(Func func, P1 p1) :
            func_(func), p1_(p1) {}
        void operator()() const { return func_(p1_); }

    private:
        Func func_; // The functor to apply
        P1 p1_; // The first parameter
    };

We can now solve the initial problem by combining our bind() with std::mem_fun():

    call_functor(bind(std::mem_fun(&C::foo), &c));

We can make usage a little more convenient by introducing a macro:

    // The macro argument is parenthesized so that expressions such as
    // *pC are bound correctly.
    #define mem_fun_functor(c, memFn) bind(std::mem_fun(memFn), &(c))

    call_functor(mem_fun_functor(c, &C::foo));

There’s plenty of room for improvements, but it’s amazing what you can do with a little template trickery.
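
With a C++11 or later compiler (an assumption; the code above targets the older STL adapters, and std::mem_fun was removed in C++17), the standard library can do this binding directly. The sketch below shows two equivalent ways to hand a member function to call_functor(). Counter is a hypothetical stand-in for C that counts its calls so the effect is observable, and run_demo is likewise just an illustrative name.

```cpp
#include <functional>
#include <iostream>

// Same shape as call_functor() above: invoke any nullary callable.
template <typename Fn>
void call_functor(Fn fn)
{
    fn();
}

// Hypothetical stand-in for class C; it records how often foo() ran.
class Counter
{
public:
    void foo() { ++calls; std::cout << "foo()\n"; }
    int calls = 0;
};

// Calls Counter::foo twice through call_functor() and returns the number
// of recorded calls.
int run_demo()
{
    Counter c;

    // std::bind stores the member function pointer together with &c and
    // produces a callable that takes no arguments.
    call_functor(std::bind(&Counter::foo, &c));

    // A lambda capturing c by reference does the same job.
    call_functor([&c] { c.foo(); });

    return c.calls;
}
```

Both forms capture the address of c, so c must outlive the callable, exactly as with the hand-written binder.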

Generating and Parsing Localized Numbers In Windows

C++, Win32

While Windows supports dozens or even hundreds of languages, its localization APIs require quite a bit of getting used to. Below is how I solved some common problems related to formatting and parsing a number for a specific locale.

Formatting a Number for a Locale

The function GetNumberFormat() formats a number for a particular locale. Its simplest usage looks something like:

    // Recent Windows SDK headers already define ARRAYSIZE
    #ifndef ARRAYSIZE
    #define ARRAYSIZE(x) ( sizeof(x) / sizeof(x[0]) )
    #endif

    TCHAR buf[80];
    int ret = GetNumberFormat
        (
        LOCALE_USER_DEFAULT, // locale
        0,                   // dwFlags
        TEXT("1234567.89"),  // lpValue
        NULL,                // lpFormat
        buf,                 // lpNumberStr
        ARRAYSIZE(buf)       // cchNumber
        );
    ASSERT(ret != 0);

buf now contains the number 1234567.89 formatted for the user’s default locale. For example, for the English-United States locale, buf will contain "1,234,567.89"; for German-Germany, "1.234.567,89"; for Hindi, "12,34,567.89".

The format of the lpValue parameter is important. From GetNumberFormat()’s MSDN documentation:

lpValue

[in] Pointer to a null-terminated string containing the number string to format. This string can only contain the following characters. All other characters are invalid. The function returns an error if the string indicated by lpValue deviates from these rules.

  • Characters '0' through '9'.
  • One decimal point (dot) if the number is a floating-point value.
  • A minus sign in the first character position if the number is a negative value.

Given these constraints, I’ve found the easiest way to convert, say, a double to a string for use as lpValue is to use StringCchPrintf() (or, similarly, _sntprintf(); wnsprintf() is not an option because, like wsprintf(), it does not support floating-point format specifiers), as in:

    int GetNumberFormatDbl(LCID locale, DWORD dwFlags, double value,
                           const NUMBERFMT* lpFormat, LPTSTR lpNumberStr,
                           int cchNumber)
    {
        // With "%lf", DBL_MAX (1.7976931348623158e+308) prints as 309 integer
        // digits, a decimal point, and 6 fractional digits: 316 characters.
        // Add one for a possible minus sign and one for the null terminator.
        TCHAR szBuf[318];
        HRESULT hr = StringCchPrintf(szBuf, ARRAYSIZE(szBuf),
                                     TEXT("%lf"), value);
        if (FAILED(hr))
        {
            SetLastError(ERROR_INVALID_PARAMETER);
            return 0;
        }
        return GetNumberFormat(locale, dwFlags, szBuf, lpFormat,
                               lpNumberStr, cchNumber);
    }

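One caveat: because "%lf" always prints exactly six fractional digits, GetNumberFormatDbl() does not deal well with very small numbers (below 1e-5 or so), which lose most or all of their significant digits before GetNumberFormat() ever sees them.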
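
The effect of the "%lf" conversion is easy to see in isolation. The following is a portable sketch that uses std::snprintf in place of StringCchPrintf (an assumption made so it runs outside Windows); FixedSix and FixedLength are hypothetical helper names.

```cpp
#include <cfloat>
#include <cstdio>
#include <string>

// Formats value the way a "%f"/"%lf" conversion does: fixed notation with
// six fractional digits.
std::string FixedSix(double value)
{
    char buf[400];
    std::snprintf(buf, sizeof(buf), "%f", value);
    return buf;
}

// Number of characters "%f" produces for value, excluding the terminator.
int FixedLength(double value)
{
    return std::snprintf(nullptr, 0, "%f", value);
}
```

FixedSix(1234567.89) gives "1234567.890000", while FixedSix(1e-7) gives "0.000000": the value is gone before any locale formatting happens. The same helpers also show the worst-case buffer need: FixedLength(DBL_MAX) is 316 and FixedLength(-DBL_MAX) is 317, not counting the null terminator.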

Parsing a Localized Number String

I spent a lot of time trying to figure out the best way to parse a localized number string until Michael Kaplan mentioned VariantChangeTypeEx(). Once I had that, the rest was (relatively) easy:

    // Convert szStr to a BSTR.  Returns NULL on failure.  Result must be
    // freed with SysFreeString.
    BSTR TstrToBstr(LPCTSTR szStr)
    {
    #if defined(UNICODE)
        return SysAllocString(szStr);
    #else
        BSTR bstrRet = NULL;
        int cch = MultiByteToWideChar(CP_ACP, 0, szStr, -1, NULL, 0);
        if (cch != 0)
        {
            WCHAR* pswz = new WCHAR[cch];
            cch = MultiByteToWideChar(CP_ACP, 0, szStr, -1, pswz, cch);
            if (cch != 0)
            {
                bstrRet = SysAllocString(pswz);
            }
            delete[] pswz;
        }

        return bstrRet;
    #endif
    }

    // Converts the localized number string szNumber to a double using the
    // given locale.  Returns TRUE and sets *pVal on success, FALSE
    // otherwise.
    BOOL LocalizedStrToDbl(LCID lcid, LPCTSTR szNumber, double* pVal)
    {
        BOOL bSuccess = FALSE;

        // Set out parameter regardless
        *pVal = 0;

        BSTR bstr = TstrToBstr(szNumber);
        if (bstr != NULL)
        {
            VARIANT var;
            VariantInit(&var);
            // bstr will be freed on VariantClear
            var.bstrVal = bstr;
            var.vt = VT_BSTR;

            HRESULT hr = VariantChangeTypeEx(&var, &var, lcid, 0, VT_R8);
            if (hr == S_OK)
            {
                *pVal = var.dblVal;
                bSuccess = TRUE;
            }

            VariantClear(&var);
        }

        return bSuccess;
    }

Using VarR8FromStr() instead of VariantChangeTypeEx() is also an option.

Customizing the Output of GetNumberFormat()

If you pass NULL as the lpFormat parameter to GetNumberFormat(), you use the locale’s default number formatting information. I often find this to be unacceptable — for example, many times I want to control the number of fractional digits I display. To do this, you need to provide a filled-in NUMBERFMT structure to GetNumberFormat().

I suggest starting with the locale’s default NUMBERFMT and then changing only the members you require. Because Windows does not seem to provide a way to retrieve a locale’s default NUMBERFMT, we’ll have to roll our own.

To populate the members of NUMBERFMT we are going to use the function GetLocaleInfo(). The map between NUMBERFMT members and LCTYPEs to pass to GetLocaleInfo() is as follows:

NUMBERFMT Member    LCTYPE Constant
NumDigits           LOCALE_IDIGITS
LeadingZero         LOCALE_ILZERO
Grouping            LOCALE_SGROUPING
lpDecimalSep        LOCALE_SDECIMAL
lpThousandSep       LOCALE_STHOUSAND
NegativeOrder       LOCALE_INEGNUMBER

GetLocaleInfo() always returns strings, but many of these strings need to be converted to UINTs. Furthermore, the conversion between the LOCALE_SGROUPING string and the Grouping member is quite tricky; read How to fill in that number grouping member of NUMBERFMT for more information.
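
To make the grouping conversion concrete, here is a portable sketch of the mapping as I understand it from the linked article: the semicolons are dropped, a trailing ";0" (meaning "repeat the last group") is removed, and a string without that terminator gets a zero appended instead. GroupingToUint is a hypothetical name, the helper uses std::string rather than TCHAR, and the handling of strings without a trailing ";0" should be treated as an assumption.

```cpp
#include <string>

// Hypothetical portable counterpart of the conversion (not a Win32 API).
// Assumes every group size in the string is a single digit.
unsigned GroupingToUint(const std::string& grouping)
{
    // Keep only the digits: "3;2;0" becomes "320".
    std::string digits;
    for (std::string::size_type i = 0; i < grouping.size(); ++i)
        if (grouping[i] != ';')
            digits += grouping[i];

    if (digits.empty())
        return 0;

    // A trailing ";0" is expressed in NUMBERFMT by omitting the zero.
    // Without that terminator, a zero is appended instead.
    if (digits.size() > 1 && digits[digits.size() - 1] == '0')
        digits.erase(digits.size() - 1);
    else
        digits += '0';

    return static_cast<unsigned>(std::stoul(digits));
}
```

Under these assumptions "3;0" maps to 3, "3;2;0" maps to 32, and "3" maps to 30.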

We now have enough information to write the function to retrieve a locale-default NUMBERFMT:

    // Converts a grouping string returned by
    // GetLocaleInfo(LOCALE_SGROUPING) into a UINT understood by NUMBERFMT.
    UINT GroupingStrToUint(LPCTSTR szGrouping)
    {
        LPCTSTR szCurr = szGrouping;
        UINT ret = 0;

        while (true)
        {
            ret *= 10;
            if (*szCurr == TEXT('\0'))
                break;

            TCHAR* pch;
            ret += _tcstol(szCurr, &pch, 10);

            if (_tcscmp(pch, TEXT(";0")) == 0)
                break;

            // Skip the ';' separator, but never step past the terminator
            // (e.g. for "3", which has no trailing ";0").
            szCurr = (*pch != TEXT('\0')) ? pch + 1 : pch;
        }

        return ret;
    }

    // Fills the default NUMBERFMT structure for a given locale.
    // pFmt->lpDecimalSep and pFmt->lpThousandSep must point to valid
    // buffers of size cchDecimalSep and cchThousandSep respectively.
    BOOL GetDefaultNumberFmt(LCID lcid, NUMBERFMT* pFmt, int cchDecimalSep,
                             int cchThousandSep)
    {
        TCHAR szBuf[80];

        int ret = ::GetLocaleInfo(lcid, LOCALE_IDIGITS, szBuf,
                                  ARRAYSIZE(szBuf));
        if (ret == 0)
            return FALSE;
        pFmt->NumDigits = _tcstol(szBuf, NULL, 10);

        ret = ::GetLocaleInfo(lcid, LOCALE_ILZERO, szBuf, ARRAYSIZE(szBuf));
        if (ret == 0)
            return FALSE;
        pFmt->LeadingZero = _tcstol(szBuf, NULL, 10);

        ret = ::GetLocaleInfo(lcid, LOCALE_SGROUPING, szBuf,
                              ARRAYSIZE(szBuf));
        if (ret == 0)
            return FALSE;
        pFmt->Grouping = GroupingStrToUint(szBuf);

        ret = ::GetLocaleInfo(lcid, LOCALE_SDECIMAL, pFmt->lpDecimalSep,
                              cchDecimalSep);
        if (ret == 0)
            return FALSE;

        ret = ::GetLocaleInfo(lcid, LOCALE_STHOUSAND, pFmt->lpThousandSep,
                              cchThousandSep);
        if (ret == 0)
            return FALSE;

        ret = ::GetLocaleInfo(lcid, LOCALE_INEGNUMBER, szBuf,
                              ARRAYSIZE(szBuf));
        if (ret == 0)
            return FALSE;
        pFmt->NegativeOrder = _tcstol(szBuf, NULL, 10);

        return TRUE;
    }

Now that we have these functions, we can use them to better control the output from GetNumberFormat(), as in:

    // Converts the double value to a localized string for the specified
    // locale with the given number of fractional digits.
    BOOL DblToLocalizedStr(LCID lcid, double value, int nDigits,
                           LPTSTR szStr, int cchStr)
    {
        // Get locale-default NUMBERFMT
        TCHAR szDecimalSep[5];
        TCHAR szThousandSep[5];

        NUMBERFMT fmt;
        fmt.lpDecimalSep = szDecimalSep;
        fmt.lpThousandSep = szThousandSep;
        if (!GetDefaultNumberFmt(lcid, &fmt, ARRAYSIZE(szDecimalSep),
                                 ARRAYSIZE(szThousandSep)))
            return FALSE;

        // Override the NumDigits member of NUMBERFMT
        fmt.NumDigits = nDigits;

        // Format the number with the custom NUMBERFMT
        int ret = GetNumberFormatDbl(lcid, 0, value, &fmt, szStr, cchStr);
        return (ret != 0);
    }

STL Map Use

STL

What’s wrong with the following code?

    template<typename T1, typename T2>
    struct my_pair
    {
        typedef T1 first_type;
        typedef T2 second_type;

        my_pair() : first(T1()), second(T2()) {}
        my_pair(const T1& v1, const T2& v2) : first(v1), second(v2) {}

        T1 first;
        T2 second;
    };

    template<typename T1, typename T2>
    inline bool operator<
        (
        const my_pair<T1, T2>& x,
        const my_pair<T1, T2>& y
        )
    {
        return (x.first < y.first || x.second < y.second);
    }

    void f()
    {
        typedef my_pair<…, …> key_type;
        typedef … value_type;
        typedef std::map<key_type, value_type> map_type;

        map_type map;
        // Use map
    }
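
A likely culprit is the comparison: std::map requires a strict weak ordering, and the operator< above can report both a < b and b < a for the same two keys. The sketch below reproduces that contradiction and shows the usual lexicographic fix. The simplified my_pair, broken_less, and the two check functions are illustrative names, not code from the post.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in for the my_pair template above.
template <typename T1, typename T2>
struct my_pair
{
    T1 first;
    T2 second;
};

// The comparison from the listing above, renamed so both versions can
// coexist in one file.
template <typename T1, typename T2>
bool broken_less(const my_pair<T1, T2>& x, const my_pair<T1, T2>& y)
{
    return (x.first < y.first || x.second < y.second);
}

// Lexicographic comparison: second is consulted only when first ties.
template <typename T1, typename T2>
bool operator<(const my_pair<T1, T2>& x, const my_pair<T1, T2>& y)
{
    if (x.first < y.first) return true;
    if (y.first < x.first) return false;
    return x.second < y.second;
}

// True if broken_less claims both a < b and b < a for (1,2) and (2,1).
bool broken_is_contradictory()
{
    my_pair<int, int> a = {1, 2};
    my_pair<int, int> b = {2, 1};
    return broken_less(a, b) && broken_less(b, a);
}

// Inserts two distinct keys (one of them twice) and returns the map size.
std::size_t map_size_with_fixed_less()
{
    my_pair<int, int> a = {1, 2};
    my_pair<int, int> b = {2, 1};
    std::map<my_pair<int, int>, std::string> m;
    m[a] = "a";
    m[b] = "b";
    m[a] = "a again";
    return m.size();
}
```

With the lexicographic operator the map holds exactly two entries. With an inconsistent ordering the behavior of std::map is undefined, which in practice tends to show up as lost or duplicated keys.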


STL Vector Use

STL

I recently wrote a piece of code that looked something like the following:

    static const int NUM_TOTAL_VALUES = …;
    typedef … T;

    // Create vec and reserve NUM_TOTAL_VALUES spaces for later insertion
    std::vector<T> vec(NUM_TOTAL_VALUES);

    // Insert values into vec
    for (int i = 0; i != NUM_TOTAL_VALUES; ++i)
        vec.push_back(…);

    // vec should now have NUM_TOTAL_VALUES values in it (but doesn't!)

What’s wrong with this code?
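
The likely answer is that the size constructor does not reserve capacity; it creates that many value-initialized elements, and push_back() then appends after them. A small sketch of the difference, using int as the element type and two hypothetical helper functions:

```cpp
#include <cstddef>
#include <vector>

// Mirrors the code above: the constructor creates n elements up front,
// so the n push_back() calls bring the total to 2 * n.
std::size_t size_after_count_constructor(std::size_t n)
{
    std::vector<int> vec(n);
    for (std::size_t i = 0; i != n; ++i)
        vec.push_back(static_cast<int>(i));
    return vec.size();
}

// reserve() only allocates capacity; the vector stays empty until the
// push_back() calls run, so the total is n.
std::size_t size_after_reserve(std::size_t n)
{
    std::vector<int> vec;
    vec.reserve(n);
    for (std::size_t i = 0; i != n; ++i)
        vec.push_back(static_cast<int>(i));
    return vec.size();
}
```

For n = 5 the first function returns 10 and the second returns 5.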


Managed Wrappers and Hidden Interdependencies

C#, C++

Let’s say you have the following unmanaged code:

    #pragma unmanaged

    class Stream {}; // Conceptual stream class

    class StreamWriter
    {
    public:
        StreamWriter(Stream* pStream) : m_pStream(pStream) {}
        ~StreamWriter() { /* Use m_pStream in some way */ }

        …
    private:
        Stream* m_pStream;
    };

    void f()
    {
        Stream stream;
        StreamWriter streamWriter(&stream);

        // Use streamWriter

        // streamWriter is destroyed
        // stream is destroyed
    }

Note that StreamWriter’s destructor uses m_pStream (perhaps by flushing the stream). This means that the order of destruction is important: StreamWriter must be destroyed before its underlying Stream is.

Now let’s try to write and use some simple managed C++ wrappers for these classes:

    #pragma managed

    public __gc class ManagedStream
    {
    public:
        ManagedStream() : m_pStream(new Stream) {}

        // NOTE: This is a finalizer, not a deterministic destructor
        ~ManagedStream() { delete m_pStream; }

    public private: // Make accessible by ManagedStreamWriter
        Stream __nogc* m_pStream;
    };

    public __gc class ManagedStreamWriter
    {
    public:
        ManagedStreamWriter(ManagedStream* pStream) :
            m_pStreamWriter(new StreamWriter(pStream->m_pStream)) {}

        // NOTE: This is a finalizer, not a deterministic destructor
        ~ManagedStreamWriter() { delete m_pStreamWriter; }

    private:
        StreamWriter __nogc* m_pStreamWriter;
    };

    void f()
    {
        // __gc objects are always accessed through pointers
        ManagedStream* stream = __gc new ManagedStream();
        ManagedStreamWriter* streamWriter =
            __gc new ManagedStreamWriter(stream);

        // Use streamWriter

        // GC will clean up stream and streamWriter
    }

See the problem?


Smart Pointers Aren’t Quite Smart Enough…

C++, COM

What’s wrong with this code? (It is adapted from something I wrote yesterday)

    void f()
    {
        HRESULT hr;

        hr = ::CoInitialize(NULL);
        if (SUCCEEDED(hr)) {
            MSXML2::IMXWriterPtr spMXWriter;
            hr = spMXWriter.CreateInstance(__uuidof(MSXML2::MXXMLWriter30));
            if (SUCCEEDED(hr)) {
                // Use spMXWriter
            }

            ::CoUninitialize();
        }
    }


Writing Streaming XML Using MSXML

C++, COM, Win32, XML

In my Implementing IXmlWriter series of posts, I wrote a streaming XML writing class whose interface is based on .NET’s XmlWriter. I recently discovered that MSXML provides its own way to write streaming XML through the class MXXMLWriter.

MXXMLWriter supports a large set of functionality including encoding, indentation, disabling output escaping, and writing XML fragments. The generated XML can be written to an IStream, a BSTR, or a DOMDocument object. However, its interface leaves much to be desired. Usage looks like this:

    #import <msxml3.dll>

    HRESULT hr;

    // I'm using the #import-generated _com_ptr_t-based smart pointers
    MSXML2::IMXWriterPtr spMXWriter;
    hr = spMXWriter.CreateInstance(__uuidof(MSXML2::MXXMLWriter30));
    _ASSERT(SUCCEEDED(hr)); // TODO

    // Configure the IMXWriter as appropriate.  We will be using the default of
    // writing to a BSTR which can be retrieved using spMXWriter->get_output().

    MSXML2::ISAXContentHandlerPtr spSAXContentHandler(spMXWriter);
    _ASSERT(spSAXContentHandler != NULL); // TODO

    // Be sure to check the hrs below

    hr = spSAXContentHandler->startDocument();
    hr = spSAXContentHandler->startElement(L"", 0, L"root", 4, L"root", 4, NULL);
    hr = spSAXContentHandler->characters(L"text", 4);
    // endElement also takes the element name.  This means we may need to
    // maintain our own open element stack.
    hr = spSAXContentHandler->endElement(L"", 0, L"root", 4, L"root", 4);
    hr = spSAXContentHandler->endDocument();

The rough IXmlWriter equivalent is:

    #include "StringXmlWriter.h"

    StringXmlWriter xw;
    xw.WriteStartDocument();
      xw.WriteStartElement("root");
        xw.WriteString("text");
      xw.WriteEndElement(); // /root
    xw.WriteEndDocument();

However, there might be a case to change IXmlWriter to use MXXMLWriter internally.

Exception Handling: A False Sense Of Security

C++, Error Handling

Long-time readers know that I have a bit of a penchant for error handling, especially with respect to exceptions. I just noticed that I have never, to my knowledge, posted about the classic article “Exception Handling: A False Sense of Security” by Tom Cargill.

Read it and weep.
