At Morningstar, one of my main responsibilities is maintaining and updating the internal Principia production process, which involves parsing XML retrieved over HTTP. The HTTP server is changing to require HTTP Basic Authentication, so I had to change the production process accordingly.
With .NET, the change would be trivial: all I would have to do is set the WebRequest.Credentials property. The code to load the XML would look something like:
public static XPathNavigator LoadXPath(Uri url, int timeoutMs, ICredentials credentials)
{
    WebRequest request = WebRequest.Create(url);
    request.Timeout = timeoutMs;
    if (credentials != null)
        request.Credentials = credentials;
    WebResponse response = request.GetResponse();
    using (Stream responseStream = response.GetResponseStream())
    {
        XPathDocument doc = new XPathDocument(responseStream);
        XPathNavigator nav = doc.CreateNavigator();
        return nav;
    }
}
However, the production process is written in C++ and uses MSXML’s SAX support through the ISAXXMLReader interface. Previously, retrieving the data from the web server was pretty simple: all I had to do was call the ISAXXMLReader::parseURL() method. However, this function doesn’t support HTTP Basic Authentication. Therefore, I decided to use the ISAXXMLReader::parse() method and pass it the best parameter available: an IStream or ISequentialStream interface pointing at the response from the appropriate, authorized HTTP request.
First, I tried using an IXMLHTTPRequest object, as IXMLHTTPRequest supports authentication through the IXMLHTTPRequest::open() method. This worked, but I eventually decided to move away from it because it didn’t give me full control over the HTTP request, didn’t easily allow me to force circumvention of the HTTP cache, and gave me fewer optimization opportunities (such as HTTP keep-alive support).
Next, I investigated the Windows HTTP Services (WinHTTP) library, but I decided against it because it isn’t ubiquitous across all operating systems and wasn’t supported by the Platform SDK which ships with Visual Studio 6.0.
This led me to finally settle on the Windows Internet (WinInet) library. This was my first serious use of WinInet, and it seems quite complete and powerful, but also very verbose and difficult to work with. The remainder of this post describes how to get an ISequentialStream from a URL using HTTP Basic Authentication over WinInet. I have mostly elided error handling and resource cleanup for brevity.
The first step to using WinInet is to call the InternetOpen() function. There is nothing particularly tricky here, besides remembering that some web servers act differently based on the HTTP user agent, which is the first parameter you pass.
The first function I tried for sending the HTTP request was InternetOpenUrl(). However, as I found out from the MSDN WinInet article Handling Authentication, it seems you cannot use this function with HTTP Basic Authentication. Instead, you have to use the following functions in order:
1. InternetCrackUrl() to break up a URL into its components, as the following functions require them to be separate.
2. InternetConnect() to create a connected session to the remote host.
3. HttpOpenRequest() to create an HTTP request on that session.
4. InternetSetOption() using the option flags INTERNET_OPTION_USERNAME and INTERNET_OPTION_PASSWORD to set the credentials on the request.
5. HttpSendRequest() to send the request to the remote web server.
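For context on what those credentials turn into: with HTTP Basic Authentication, the user name and password travel in an Authorization request header as the Base64 encoding of "user:password". WinInet builds this header itself, so none of the following is needed in the production code. It is only a portable sketch of the wire format, and the helper names Base64Encode and BuildBasicAuthHeader are made up for illustration.

```cpp
#include <string>

// Base64-encode a byte string using the standard alphabet and '=' padding.
// Hypothetical helper for illustration; not part of WinInet.
std::string Base64Encode(const std::string& input)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string output;
    output.reserve(((input.size() + 2) / 3) * 4);

    // Encode every complete group of three input bytes as four characters.
    std::string::size_type i = 0;
    while (i + 2 < input.size())
    {
        unsigned long n =
            (static_cast<unsigned long>(static_cast<unsigned char>(input[i])) << 16) |
            (static_cast<unsigned long>(static_cast<unsigned char>(input[i + 1])) << 8) |
            static_cast<unsigned long>(static_cast<unsigned char>(input[i + 2]));
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += kAlphabet[(n >> 6) & 0x3F];
        output += kAlphabet[n & 0x3F];
        i += 3;
    }

    // Encode the one or two leftover bytes, padding the group with '='.
    const std::string::size_type remaining = input.size() - i;
    if (remaining == 1)
    {
        unsigned long n =
            static_cast<unsigned long>(static_cast<unsigned char>(input[i])) << 16;
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += "==";
    }
    else if (remaining == 2)
    {
        unsigned long n =
            (static_cast<unsigned long>(static_cast<unsigned char>(input[i])) << 16) |
            (static_cast<unsigned long>(static_cast<unsigned char>(input[i + 1])) << 8);
        output += kAlphabet[(n >> 18) & 0x3F];
        output += kAlphabet[(n >> 12) & 0x3F];
        output += kAlphabet[(n >> 6) & 0x3F];
        output += '=';
    }
    return output;
}

// Build the request header line that carries Basic credentials.
std::string BuildBasicAuthHeader(const std::string& user, const std::string& password)
{
    return "Authorization: Basic " + Base64Encode(user + ":" + password);
}
```

Calling BuildBasicAuthHeader("Aladdin", "open sesame") yields "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". The encoding is trivially reversible, so the credentials are only as private as the connection carrying them.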
Finally, you need to write a COM object that implements ISequentialStream and forwards the calls to InternetReadFile() (and InternetWriteFile() for completeness’s sake). Fortunately, this is very easy using the Active Template Library (ATL) — see the CInternetFileStream object below.
Another function that you may find useful is HttpQueryInfo(), which retrieves information about the request and its response. Of particular interest is the flag combination HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE, which returns the response’s status code as a number rather than a string.
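The status code matters for authentication because HttpSendRequest() succeeding only means that a response arrived. A server that rejects the credentials still produces a response, with status 401. A minimal, portable sketch of how the retrieved number might be interpreted follows. The FetchOutcome enum and ClassifyHttpStatus function are hypothetical names, and a real process may want finer distinctions.

```cpp
// Hypothetical outcome categories for a fetched response.
enum FetchOutcome
{
    kFetchOk,            // 2xx: the body can be handed to the XML parser
    kFetchAuthRejected,  // 401: the server refused the supplied credentials
    kFetchOtherError     // anything else: redirects, missing pages, server errors
};

// Interpret the numeric status code obtained from HttpQueryInfo().
FetchOutcome ClassifyHttpStatus(unsigned long status)
{
    if (status >= 200 && status < 300)
        return kFetchOk;
    if (status == 401)
        return kFetchAuthRejected;
    return kFetchOtherError;
}
```

Checking this before parsing avoids feeding an HTML error page to the SAX reader and then reporting a confusing XML error instead of an authentication failure.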
Integrating these steps together, the code (sans real error handling and cleanup) might look something like:
typedef std::basic_string<TCHAR> tstring;
// Implements the COM ISequentialStream interface on a HINTERNET file handle
// opened with a function such as ::InternetOpenUrl(). Be sure that the
// corresponding session and internet HINTERNET handles (where applicable)
// *also* stay open until this object is finished.
class ATL_NO_VTABLE CInternetFileStream :
    public CComObjectRootEx<CComSingleThreadModel>,
    public CComCoClass<CInternetFileStream>,
    public ISequentialStream
{
private:
    HINTERNET m_hFile;
public:
    DECLARE_NOT_AGGREGATABLE(CInternetFileStream)
    BEGIN_COM_MAP(CInternetFileStream)
        COM_INTERFACE_ENTRY(ISequentialStream)
    END_COM_MAP()
public:
    CInternetFileStream() : m_hFile(NULL) {}
    // For the hFile to be valid, the associated session and internet HINTERNET
    // handles must also stay open (not seen here).
    HRESULT Init(HINTERNET hFile)
    {
        m_hFile = hFile;
        return S_OK;
    }
// ISequentialStream
public:
    STDMETHOD(Read)(void* pv, ULONG cb, ULONG* pcbRead)
    {
        // pcbRead is optional for callers, but InternetReadFile always
        // needs somewhere to store the byte count.
        DWORD dwRead = 0;
        BOOL bSuccess = ::InternetReadFile(m_hFile, pv, cb, &dwRead);
        if (pcbRead != NULL)
            *pcbRead = dwRead;
        return bSuccess ? S_OK : E_FAIL;
    }
    STDMETHOD(Write)(void const* pv, ULONG cb, ULONG* pcbWritten)
    {
        // pcbWritten is optional as well; see Read().
        DWORD dwWritten = 0;
        BOOL bSuccess = ::InternetWriteFile(m_hFile, pv, cb, &dwWritten);
        if (pcbWritten != NULL)
            *pcbWritten = dwWritten;
        return bSuccess ? S_OK : E_FAIL;
    }
};
void LoadUrlUsingAuth
(
    LPCTSTR strURL,
    LPCTSTR strUserName,
    LPCTSTR strPassword,
    ISequentialStream** pData
)
{
    BOOL bSuccess;
    HRESULT hr;
    *pData = NULL;
    HINTERNET hInternet = ::InternetOpen
    (
        _T("This is my HTTP Agent"),
        INTERNET_OPEN_TYPE_PRECONFIG,
        NULL,
        NULL,
        0
    );
    ASSERT(hInternet != NULL);
    // Break up the URL into its components. To return specific
    // URL_COMPONENTS members, set their length to 1 (a 0 length
    // means the pointer will not be returned). InternetCrackUrl
    // populates the URL_COMPONENTS structure with pointers to
    // within the provided string and their associated lengths, but
    // does not modify the string itself: this means that you don't
    // need to worry about freeing memory but the strings are
    // not NUL-terminated.
    URL_COMPONENTS urlComponents;
    ::memset(&urlComponents, 0, sizeof(urlComponents));
    urlComponents.dwStructSize = sizeof(urlComponents);
    urlComponents.dwHostNameLength = 1;
    urlComponents.dwUrlPathLength = 1;
    bSuccess = ::InternetCrackUrl(strURL, 0, 0, &urlComponents);
    ASSERT(bSuccess);
    // As InternetCrackUrl returns pointers to non-0-terminated strings
    // within the URL, copy the values into 0-terminated strings.
    tstring strHostName(urlComponents.lpszHostName, urlComponents.dwHostNameLength);
    tstring strUrlPath(urlComponents.lpszUrlPath, urlComponents.dwUrlPathLength);
    // Connect to remote host. The service parameter takes an
    // INTERNET_SERVICE_* constant rather than the INTERNET_SCHEME
    // value reported by InternetCrackUrl.
    HINTERNET hConnection = ::InternetConnect
    (
        hInternet,
        strHostName.c_str(),
        urlComponents.nPort,
        _T(""),
        _T(""),
        INTERNET_SERVICE_HTTP,
        0,
        NULL
    );
    ASSERT(hConnection != NULL);
    // The MIME types we will accept from the web server when sending
    // the request.
    static LPCTSTR lpszAcceptTypes[] =
    {
        _T("text/xml"),
        NULL
    };
    // Create a request that will be sent to the remote host, completely
    // bypassing the cache
    HINTERNET hRequest = ::HttpOpenRequest
    (
        hConnection,
        _T("GET"),
        strUrlPath.c_str(),
        NULL,
        NULL,
        lpszAcceptTypes,
        INTERNET_FLAG_NO_CACHE_WRITE | INTERNET_FLAG_NO_COOKIES |
            INTERNET_FLAG_RELOAD,
        NULL
    );
    ASSERT(hRequest != NULL);
    // Set the username and password for HTTP Basic Authentication
    bSuccess = ::InternetSetOption
    (
        hRequest,
        INTERNET_OPTION_USERNAME,
        const_cast<LPTSTR>(strUserName),
        static_cast<DWORD>(_tcslen(strUserName))
    );
    ASSERT(bSuccess);
    bSuccess = ::InternetSetOption
    (
        hRequest,
        INTERNET_OPTION_PASSWORD,
        const_cast<LPTSTR>(strPassword),
        static_cast<DWORD>(_tcslen(strPassword))
    );
    ASSERT(bSuccess);
    // Send the request. hRequest is now the handle we should pass
    // to InternetReadFile() as it has the response data.
    bSuccess = ::HttpSendRequest(hRequest, NULL, 0, NULL, 0);
    ASSERT(bSuccess);
#if defined(DEBUG)
    // Retrieve the request's status code
    DWORD dwStatus = 0;
    DWORD dwStatusSize = sizeof(dwStatus);
    bSuccess = ::HttpQueryInfo
    (
        hRequest,
        HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE,
        &dwStatus,
        &dwStatusSize,
        NULL
    );
    ASSERT(bSuccess);
#endif
    // Wrap the request handle in the stream object and hand the caller
    // a reference to it.
    CComObject<CInternetFileStream>* pFileStream;
    hr = CComObject<CInternetFileStream>::CreateInstance(&pFileStream);
    ASSERT(SUCCEEDED(hr));
    pFileStream->AddRef();
    hr = pFileStream->Init(hRequest);
    ASSERT(SUCCEEDED(hr));
    *pData = pFileStream;
}
When adapting this code for production use, please take care to properly implement error handling and to call InternetCloseHandle() on all opened HINTERNET objects, but only after you are done with the ISequentialStream.