Integrating HTTP Basic Authentication with MSXML ISAXXMLReader

At Morningstar, one of my main responsibilities is maintaining and updating the internal Principia production process, which involves parsing XML retrieved over HTTP. The HTTP server is changing to require HTTP Basic Authentication, so I had to change the production process accordingly.

With .NET, the change would be trivial: all I would have to do is set the WebRequest.Credentials property. The code to load the XML would look something like:

public static XPathNavigator LoadXPath(Uri url, int timeoutMs, ICredentials credentials)
{
    WebRequest request = WebRequest.Create(url);
    request.Timeout = timeoutMs;
    if (credentials != null)
        request.Credentials = credentials;

    WebResponse response = request.GetResponse();
    using (Stream responseStream = response.GetResponseStream())
    {
        XPathDocument doc = new XPathDocument(responseStream);
        XPathNavigator nav = doc.CreateNavigator();
        return nav;
    }
}

However, the production process is written in C++ and uses MSXML’s SAX support through the ISAXXMLReader interface. Previously, retrieving the data from the web server was simple: all I had to do was call the ISAXXMLReader::parseURL() method. Unfortunately, this function doesn’t support HTTP Basic Authentication.[1] I therefore decided to use the ISAXXMLReader::parse() method and pass it the most suitable parameter available: an IStream or ISequentialStream interface that reads the response to the appropriate, authorized HTTP request.

First, I tried using an IXMLHTTPRequest object, as IXMLHTTPRequest supports authentication using the IXMLHTTPRequest::open() method. This worked, but I eventually decided to move away from it because it didn’t give me full control over the HTTP request, didn’t easily allow me to force circumvention of the HTTP cache, and gave me fewer optimization opportunities (such as HTTP KeepAlive support).

Next, I investigated the Windows HTTP Services (WinHTTP) library, but I decided against it because it isn’t ubiquitous across all operating systems and wasn’t supported by the Platform SDK which ships with Visual Studio 6.0.

This led me to settle on the Windows Internet (WinInet) library. This was my first serious use of WinInet, and it seems complete and powerful, but also very verbose and difficult to work with. The remainder of this post describes how to get an ISequentialStream from a URL using HTTP Basic Authentication over WinInet. I have mostly elided error handling and resource cleanup for brevity.

The first step in using WinInet is to call the InternetOpen() function. There is nothing particularly tricky here, besides remembering that some web servers behave differently depending on the HTTP user agent string you pass in.

The first function I tried for sending the HTTP request was InternetOpenUrl(). However, as I found out from the MSDN WinInet article Handling Authentication, it seems you cannot use this function with HTTP Basic Authentication. Instead, you have to use the following functions in order:

  1. InternetCrackUrl() to break up a URL into its components, as the following functions require them to be separate.
  2. InternetConnect() to create a connected session to the remote host.
  3. HttpOpenRequest() to create an HTTP request on that session.
  4. InternetSetOption() using the option flags INTERNET_OPTION_USERNAME and INTERNET_OPTION_PASSWORD to set the credentials on the request.
  5. HttpSendRequest() to send the request to the remote web server.
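
For context, Basic Authentication itself is simple: the client sends an Authorization header whose value is the Base64 encoding of username:password. Once the credentials are set in step 4, WinInet should take care of producing this header, so none of the following is needed in the production code. It is only a portable sketch of what ends up on the wire; Base64Encode and MakeBasicAuthHeader are hypothetical helper names of my own, not WinInet functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of WinInet): Base64-encodes raw bytes
// using the standard alphabet with '=' padding.
std::string Base64Encode(const std::string& input)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string output;
    output.reserve(((input.size() + 2) / 3) * 4);

    std::string::size_type i = 0;
    while (i + 2 < input.size())
    {
        // Pack three bytes into 24 bits, then emit four 6-bit symbols.
        unsigned long n =
            (static_cast<unsigned char>(input[i]) << 16) |
            (static_cast<unsigned char>(input[i + 1]) << 8) |
            static_cast<unsigned char>(input[i + 2]);
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += kAlphabet[(n >> 6) & 0x3F];
        output += kAlphabet[n & 0x3F];
        i += 3;
    }

    // Handle the final one or two bytes, padding the output with '='.
    std::string::size_type remaining = input.size() - i;
    if (remaining == 1)
    {
        unsigned long n = static_cast<unsigned char>(input[i]) << 16;
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += "==";
    }
    else if (remaining == 2)
    {
        unsigned long n =
            (static_cast<unsigned char>(input[i]) << 16) |
            (static_cast<unsigned char>(input[i + 1]) << 8);
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += kAlphabet[(n >> 6) & 0x3F];
        output += '=';
    }
    return output;
}

// Builds the request header line a client sends for Basic Authentication.
std::string MakeBasicAuthHeader(const std::string& userName,
                                const std::string& password)
{
    return "Authorization: Basic " + Base64Encode(userName + ":" + password);
}
```

Note that Base64 is an encoding, not encryption, so the password is effectively sent in the clear unless the connection itself is encrypted.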

Finally, you need to write a COM object that implements ISequentialStream and forwards the calls to InternetReadFile() (and InternetWriteFile() for completeness’s sake). Fortunately, this is very easy using the Active Template Library (ATL) — see the CInternetFileStream object below.

Another function that you may find useful is HttpQueryInfo(), which retrieves information about the request and its response. Of particular interest is the flag combination HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE, which returns the HTTP status code of the response as a number.

Integrating these steps together, the code (sans real error handling and cleanup) might look something like:

typedef std::basic_string<TCHAR> tstring;

// Implements the COM ISequentialStream interface on a HINTERNET request or
// file handle, such as one returned by ::HttpOpenRequest() or
// ::InternetOpenUrl().  Be sure that the corresponding session and
// connection HINTERNET handles (where applicable) *also* stay open
// until this object is finished.
class ATL_NO_VTABLE CInternetFileStream :
    public CComObjectRootEx<CComSingleThreadModel>,
    public CComCoClass<CInternetFileStream>,
    public ISequentialStream
{
private:
    HINTERNET m_hFile;

public:
    DECLARE_NOT_AGGREGATABLE(CInternetFileStream)

    BEGIN_COM_MAP(CInternetFileStream)
        COM_INTERFACE_ENTRY(ISequentialStream)
    END_COM_MAP()

public:
    // For the hFile to be valid, the associated session and internet HINTERNET
    // handles must also stay open (not seen here).
    HRESULT Init(HINTERNET hFile)
    {
        m_hFile = hFile;
        return S_OK;
    }

// ISequentialStream
public:
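    STDMETHOD(Read)(void* pv, ULONG cb, ULONG* pcbRead)
    {
        // Callers may pass NULL for pcbRead, but InternetReadFile()
        // expects a valid pointer, so read into a local first.
        DWORD dwRead = 0;
        BOOL bSuccess = ::InternetReadFile(m_hFile, pv, cb, &dwRead);
        if (pcbRead != NULL)
            *pcbRead = dwRead;
        return bSuccess ? S_OK : E_FAIL;
    }

    STDMETHOD(Write)(void const* pv, ULONG cb, ULONG* pcbWritten)
    {
        // Same NULL handling as Read() above.
        DWORD dwWritten = 0;
        BOOL bSuccess = ::InternetWriteFile(m_hFile, pv, cb, &dwWritten);
        if (pcbWritten != NULL)
            *pcbWritten = dwWritten;
        return bSuccess ? S_OK : E_FAIL;
    }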
};

void LoadUrlUsingAuth
    (
    LPCTSTR strURL,
    LPCTSTR strUserName,
    LPCTSTR strPassword,
    ISequentialStream** pData
    )
{
    BOOL bSuccess;
    HRESULT hr;

    *pData = NULL;

    HINTERNET hInternet = ::InternetOpen
        (
        _T("This is my HTTP Agent"),
        INTERNET_OPEN_TYPE_PRECONFIG,
        NULL,
        NULL,
        0
        );
    ASSERT(hInternet != NULL);

    // Break up the URL into its components.  To return specific
    // URL_COMPONENTS members, set their length to 1 (a 0 length
    // means the pointer will not be returned).  InternetCrackUrl
    // populates the URL_COMPONENTS structure with pointers to
    // within the provided string and their associated lengths, but
    // does not modify the string itself: this means that you don't
    // need to worry about freeing memory but the strings are
    // not NUL-terminated.
    URL_COMPONENTS urlComponents;
    ::memset(&urlComponents, 0, sizeof(urlComponents));
    urlComponents.dwStructSize = sizeof(urlComponents);
    urlComponents.dwHostNameLength = 1;
    urlComponents.dwUrlPathLength = 1;
    bSuccess = ::InternetCrackUrl(strURL, 0, 0, &urlComponents);
    ASSERT(bSuccess);

    // As InternetCrackUrl returns pointers to non-0-terminated strings
    // within the URL, copy the values into 0-terminated strings.
    tstring strHostName(urlComponents.lpszHostName, urlComponents.dwHostNameLength);
    tstring strUrlPath(urlComponents.lpszUrlPath, urlComponents.dwUrlPathLength);

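    // Connect to remote host.  The service parameter takes an
    // INTERNET_SERVICE_* value rather than the INTERNET_SCHEME_* value
    // that InternetCrackUrl stores in nScheme.
    HINTERNET hConnection = ::InternetConnect
        (
        hInternet,
        strHostName.c_str(),
        urlComponents.nPort,
        _T(""),
        _T(""),
        INTERNET_SERVICE_HTTP,
        0,
        NULL
        );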
    ASSERT(hConnection != NULL);

    // The MIME types we will accept from the web server when sending
    // the request.
    static LPCTSTR lpszAcceptTypes[] =
    {
        _T("text/xml"),
        NULL
    };

    // Create a request that will be sent to the remote host, completely
    // bypassing the cache
    HINTERNET hRequest = ::HttpOpenRequest
        (
        hConnection,
        _T("GET"),
        strUrlPath.c_str(),
        NULL,
        NULL,
        lpszAcceptTypes,
        INTERNET_FLAG_NO_CACHE_WRITE | INTERNET_FLAG_NO_COOKIES |
            INTERNET_FLAG_RELOAD,
        NULL
        );
    ASSERT(hRequest != NULL);

    // Set the username and password for HTTP Basic Authentication
    bSuccess = ::InternetSetOption
        (
        hRequest,
        INTERNET_OPTION_USERNAME,
        const_cast<LPTSTR>(strUserName),
        _tcslen(strUserName)
        );
    ASSERT(bSuccess);

    bSuccess = ::InternetSetOption
        (
        hRequest,
        INTERNET_OPTION_PASSWORD,
        const_cast<LPTSTR>(strPassword),
        _tcslen(strPassword)
        );
    ASSERT(bSuccess);

    // Send the request.  hRequest is now the handle we should pass
    // to InternetReadFile() as it has the response data.
    bSuccess = ::HttpSendRequest(hRequest, NULL, 0, NULL, 0);
    ASSERT(bSuccess);

#if defined(_DEBUG)
    // Retrieve the request's status code
    DWORD dwStatus;
    DWORD dwStatusSize = sizeof(dwStatus);
    bSuccess = ::HttpQueryInfo
        (
        hRequest,
        HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE,
        &dwStatus,
        &dwStatusSize,
        NULL
        );
    ASSERT(bSuccess);
#endif

    CComObject<CInternetFileStream>* pFileStream;
    hr = CComObject<CInternetFileStream>::CreateInstance(&pFileStream);
    ASSERT(SUCCEEDED(hr));

    pFileStream->AddRef();

    hr = pFileStream->Init(hRequest);
    ASSERT(SUCCEEDED(hr));

    *pData = pFileStream;
}
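
The comment above about InternetCrackUrl returning pointers into the original string, rather than NUL-terminated copies, is easy to get wrong. Here is a small portable sketch of the same pointer-plus-length pattern. CrackUrlSketch, Slice and CrackedUrl are hypothetical stand-ins I made up for illustration; the function is far simpler than the real InternetCrackUrl and only understands scheme://host/path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A pointer into someone else's buffer plus a length.  Like the members
// InternetCrackUrl fills in, the text is NOT NUL-terminated at ptr + len.
struct Slice
{
    const char* ptr;
    std::size_t len;
};

struct CrackedUrl
{
    Slice host;
    Slice path;
};

// Hypothetical, simplified stand-in for InternetCrackUrl: handles only
// "scheme://host/path" and ignores ports, credentials and query strings.
// Returns false if the URL has no "://" separator.
bool CrackUrlSketch(const char* url, CrackedUrl* out)
{
    const char* sep = std::strstr(url, "://");
    if (sep == NULL)
        return false;

    const char* hostBegin = sep + 3;
    const char* end = url + std::strlen(url);
    const char* pathBegin = std::strchr(hostBegin, '/');
    if (pathBegin == NULL)
        pathBegin = end;    // no path: leave an empty slice at the end

    out->host.ptr = hostBegin;
    out->host.len = static_cast<std::size_t>(pathBegin - hostBegin);
    out->path.ptr = pathBegin;
    out->path.len = static_cast<std::size_t>(end - pathBegin);
    return true;
}

// Copies a slice into an owned std::string, whose c_str() is
// NUL-terminated and therefore safe to hand to C-style APIs.
std::string ToString(const Slice& slice)
{
    return std::string(slice.ptr, slice.len);
}
```

Passing parts.host.ptr directly to a function that expects a C string would read on into the path, which is why the real code copies each component into a tstring before calling InternetConnect().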

When adapting this code for production use, please take care to implement proper error handling and to call InternetCloseHandle() on all opened HINTERNET handles, but only after you are done with the ISequentialStream.
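
One way to make the cleanup harder to forget is a small scope guard. The sketch below is a generic, portable version that I have not run against WinInet itself: ScopedHandle is a hypothetical class of my own, and FakeClose stands in for InternetCloseHandle() so that the example builds anywhere. With WinInet, the handle type would be HINTERNET and the closer would be ::InternetCloseHandle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scope guard: calls the supplied closer exactly once on a
// non-empty handle, either in Close() or when the guard goes out of scope.
template <typename Handle, typename Closer>
class ScopedHandle
{
public:
    ScopedHandle(Handle handle, Closer closer)
        : m_handle(handle), m_closer(closer) {}

    ~ScopedHandle() { Close(); }

    Handle Get() const { return m_handle; }

    void Close()
    {
        if (m_handle != Handle())
        {
            m_closer(m_handle);
            m_handle = Handle();
        }
    }

private:
    // Not copyable: two guards must never close the same handle.
    ScopedHandle(const ScopedHandle&);
    ScopedHandle& operator=(const ScopedHandle&);

    Handle m_handle;
    Closer m_closer;
};

// Stand-in for ::InternetCloseHandle that records the order of closing.
std::vector<int> g_closeLog;
void FakeClose(int handle) { g_closeLog.push_back(handle); }

typedef ScopedHandle<int, void (*)(int)> FakeInternetHandle;

void DemoNestedHandles()
{
    FakeInternetHandle session(1, FakeClose);     // like InternetOpen()
    FakeInternetHandle connection(2, FakeClose);  // like InternetConnect()
    FakeInternetHandle request(3, FakeClose);     // like HttpOpenRequest()

    // ... read from request.Get() here ...
}   // Destructors run in reverse order: request, connection, session.
```

The guards must live at least as long as the ISequentialStream that reads from the request handle, so in practice they would belong to whatever object owns the stream rather than to a function like LoadUrlUsingAuth that returns it.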

[1] Now that I think about it, there’s a slight chance that I could have continued to use that function with the http://username:password@server/resource URL form, but Microsoft has since removed support for it.

Using the Excel Object Model and Performance

Recently I’ve had to write a bit of code which communicates with Microsoft Excel using its object model. Here are a few things I have learned from this experience.

  • Interaction with the Excel object model seems to use some kind of inter-process communication with an Excel process that is started behind the scenes. If things are not shut down properly, this Excel process will continue to run indefinitely in the background. Be sure to periodically check the Task Manager for any runaway Excel processes — these typically indicate a bug in your code or incomplete shutdown (perhaps because you chose “Stop Debugging” from the debugger).
  • If you are using the Excel object model from MFC, follow the example in Microsoft KB Article 178781: HOWTO: Automate Excel Using MFC and Worksheet Functions. Be sure to call COleDispatchDriver::ReleaseDispatch() or use the COleDispatchDriver::m_bAutoRelease member on all relevant objects, or the Excel process may never stop.
  • The Excel object model documentation (which Microsoft KB Article 222101: How To Find and Use Office Object Model Documentation helps you find) is quite horrible, at least as of Office 2000. It is also written exclusively with the VB developer in mind.
  • To give control of the running, hidden Excel process with which you are interacting to the user, use the following code:

    Excel::_Application app;
    // Create and work with app...
    app.SetVisible(true);
    app.SetUserControl(true);
    
  • Even if Excel is not visible to the user, Application::Quit() may pop up a dialog box asking whether the user wants to save the changes that were made. Because Excel is hidden, the dialog is never seen or dismissed, so Excel never shuts down. To prevent this dialog, either set Application.DisplayAlerts to false or set Workbook.Saved to true for all modified workbooks. The former is preferred.

  • Each call using the Excel object model is very, very slow, probably as a result of the use of IPC. This means that the typical way one would think of interacting with cell values in Excel — iterating cell-by-cell within a set of nested for loops — is often too slow to be practical. Instead, I work in selections of nRows rows by nCols columns and use a two-dimensional SAFEARRAY. For example:

    COleSafeArray rawData;
    DWORD rawDataDimensions[2];
    rawDataDimensions[0] = nRows;
    rawDataDimensions[1] = nCols;
    rawData.Create(VT_VARIANT, 2, rawDataDimensions);
    
    // Populate the values of rawData...
    
    // Select a range of size nRows x nCols
    Excel::Range range = wksheet.GetRange(varUpperLeftCell,
                                          varLowerRightCell);
    
    // Set the cells' values in one call to .SetValue()
    // instead of setting individual cell values
    range.SetValue(rawData);
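
To make the difference in call counts concrete, here is a toy, portable model of the two approaches. FakeSheet is a hypothetical stand-in in which every SetCell() or SetRange() call represents one round trip to the Excel process. It is not the Excel object model, and it uses a flat row-major std::vector instead of a SAFEARRAY.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an out-of-process worksheet.  Each SetCell()
// or SetRange() call counts as one (slow) round trip.
class FakeSheet
{
public:
    FakeSheet(std::size_t rows, std::size_t cols)
        : m_cols(cols), m_cells(rows * cols, 0.0), m_calls(0) {}

    // One round trip per cell.
    void SetCell(std::size_t row, std::size_t col, double value)
    {
        ++m_calls;
        m_cells[row * m_cols + col] = value;
    }

    // One round trip for the whole block (row-major, rows * cols values).
    void SetRange(const std::vector<double>& block)
    {
        ++m_calls;
        assert(block.size() == m_cells.size());
        m_cells = block;
    }

    std::size_t Calls() const { return m_calls; }

    // Inspection helper for the example; not counted as a round trip.
    double Peek(std::size_t row, std::size_t col) const
    {
        return m_cells[row * m_cols + col];
    }

private:
    std::size_t m_cols;
    std::vector<double> m_cells;
    std::size_t m_calls;
};

// The nested-loop approach: rows * cols round trips.
void FillCellByCell(FakeSheet& sheet, std::size_t rows, std::size_t cols)
{
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sheet.SetCell(r, c, static_cast<double>(r * cols + c));
}

// The batched approach: build the block locally, then one round trip.
void FillInOneCall(FakeSheet& sheet, std::size_t rows, std::size_t cols)
{
    std::vector<double> block(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            block[r * cols + c] = static_cast<double>(r * cols + c);
    sheet.SetRange(block);
}
```

For a 100 x 20 selection, the first function makes 2000 calls and the second makes one; the loops still run in both cases, but in the batched version they touch only local memory.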
    

Roadmap to high performance XML

Roadmap to high performance XML: a set of useful tips on writing high-performance XML handling in .NET.

Writing Exception-Safe Code

Today I ran across the article catch considered harmful on Michael Grier’s weblog, which discusses some of the perils of exceptions. This, combined with articles in the C++ Users Journal about the difficulties of writing exception-safe containers, has given me serious pause. Considering Herb Sutter’s excellent reputation, I want to read the following article, as it may help clean up the mess:

Sutter, Herb. “When and How to Use Exceptions.” C++ Users Journal, August 2004.

It seems that most of the problems of exceptions come from interactions with persistent state, which functional languages (the original source of exceptions) do not have.

Programming is much more difficult than most programmers think it is, especially considering how even seemingly simple bugs such as integer overflow can lead to viruses and losses of billions of dollars in productivity.
