A C/C++ Error Handling Discipline: Part 1

C, C++, Error Handling No Comments »

One of my passions (although I don’t know why) is proper error handling in programming. I find the fact that so few people do it properly incredibly frustrating, but I find solace in the fact that doing proper error handling is hard, and I have yet to find a solution which I deem fully satisfactory. Exceptions come close, but their terrible performance implications, out-of-bound processing, and sometimes expressly forbidden nature (such as when writing a COM function) leads to the conclusion that they should only be used for exceptional situations and cannot be used universally.

An interesting technical article entitled Thinking About Error Cases discusses a number of commong error-handling styles found in C and C++. Without talking too much about the advantages and disadvantages to these styles (I shall do so in a future post, in addition to discussing a new style), I shall simply state matter-of-factly that many Windows components simply return HRESULTs as a status code and encode return values in function parameters, and this style is usually sufficient.

One of the advantages of using the HRESULT-based error-handling style is that if your function also returns a HRESULT, you can typically return any failure codes as-is (somewhat reminiscent of exceptions). A naïve implementation of such code might look like:

HRESULT DoStuff()
{
    HRESULT hr;

    hr = Function1();
    if (FAILED(hr))
        return hr;

    hr = Function2();
    if (FAILED(hr))
        return hr;

    return S_OK;
}

The main downside to this code is that if there are any resources that were allocated by the function (such as file handles, dynamically allocated memory, etc.) they almost always will not be deallocated when a failure occurs. An exception to this case is if you wrap your resources in C++ objects using the RAII (resource acquisition is initialization) idiom where the object destructors deallocate the resources.

Given that you are not wrapping your resources in C++ objects, how might you properly perform deallocation of resources? Your first attempt might be to have each failure deallocate any resources that were allocated previously, such as:

HRESULT DoStuff()
{
    HRESULT hr;
    RESOURCE r1, r2, r3;

    hr = AllocResource(&r1);
    if (FAILED(hr))
        return hr;

    hr = AllocResource(&r2);
    if (FAILED(hr)) {
        FreeResource(&r1);
        return hr;
    }

    hr = AllocResource(&r3);
    if (FAILED(hr)) {
        // Free in the reverse order of allocation
        FreeResource(&r2);
        FreeResource(&r1);
        return hr;
    }

    ...
}

This style is obviously tedious and error-prone. Furthermore, in addition to the duplication of cleanup code between failure blocks, the function must also duplicate the cleanup code once again as part of freeing resources during normal operation. However, this was something I personally, briefly experimented with when trying to settle on a error-handling style of my own.

Your next step might be to observe that you only need to execute a subsequent operation if all previous operations were successful. By keeping track of what resources have been successfully allocated (an easy task when you can represent a not-yet-allocated resource by a unique value such as NULL), you can use this to your advantage by writing code in the following style:

HRESULT DoStuff()
{
    HRESULT hr;
    RESOURCE r1 = INVALID_RESOURCE_VALUE;
    RESOURCE r2 = INVALID_RESOURCE_VALUE;

    hr = AllocResource(&r1);
    if (SUCCEEDED(hr)) {
        hr = AllocResource(&r2);
        if (SUCCEEDED(hr)) {
            hr = FunctionWhichUsesResources(r1, r2);
            if (SUCCEEDED(hr)) {
                ...
            }
        }
    }

    // Perform cleanup
    if (r2 != INVALID_RESOURCE_VALUE)
        FreeResource(r2);
    if (r1 != INVALID_RESOURCE_VALUE)
        FreeResource(r1);

    return hr;
}

This style is a vast improvement upon the previous in that the cleanup code is performed exactly once. The nesting gets a little out of hand but it isn’t strictly necessary: the above code is equivalent to:

HRESULT DoStuff()
{
    HRESULT hr;
    RESOURCE r1 = INVALID_RESOURCE_VALUE;
    RESOURCE r2 = INVALID_RESOURCE_VALUE;

    hr = AllocResource(&r1);
    if (SUCCEEDED(hr))
        hr = AllocResource(&r2);
    if (SUCCEEDED(hr))
        hr = FunctionWhichUsesResources(r1, r2);
    if (SUCCEEDED(hr))
        ...

    // Perform cleanup
    if (r2 != INVALID_RESOURCE_VALUE)
        FreeResource(r2);
    if (r1 != INVALID_RESOURCE_VALUE)
        FreeResource(r1);

    return hr;
}

I believe this style to be somewhat common, but I still think it could use some improvement, what with all the SUCCEEDED(hr)s in there. Quick: What C/C++ feature can you think of which will allow you to easily repeat the same code over and over? That’s right, macros! One could easily make the following changes to make the above code a little cleaner (although ultimately I do not recommend it, see this post for why):

#define Chk(f)
    do {
        if (SUCCEEDED(hr)) {
            hr = (f);
        }
    } while (0)

HRESULT DoStuff()
{
    HRESULT hr;
    RESOURCE r1 = INVALID_RESOURCE_VALUE;
    RESOURCE r2 = INVALID_RESOURCE_VALUE;

    hr = AllocResource(&r1);
    Chk(AllocResource(&r2));
    Chk(FunctionWhichUsesResources(r1, r2));
    Chk(Function2WhichUsesResource(r1, r2));

    // Perform cleanup
    if (r2 != INVALID_RESOURCE_VALUE)
        FreeResource(r2);
    if (r1 != INVALID_RESOURCE_VALUE)
        FreeResource(r1);

    return hr;
}

Note that the macro uses the standard do/while trick to embed multiple statements into one macro. Also note it has the implicit dependency that there must be an HRESULT named hr in the function.

We can even make things a little cleaner. Changing the first line of the function to be HRESULT hr = S_OK allows you to use the Chk() macro consistently throughout the function:

#define Chk(f)
    do {
        if (SUCCEEDED(hr)) {
            hr = (f);
        }
    } while (0)

HRESULT DoStuff()
{
    HRESULT hr = S_OK;
    RESOURCE r1 = INVALID_RESOURCE_VALUE;
    RESOURCE r2 = INVALID_RESOURCE_VALUE;

    Chk(AllocResource(&r1));
    Chk(AllocResource(&r2));
    Chk(FunctionWhichUsesResources(r1, r2));
    Chk(Function2WhichUsesResource(r1, r2));

    // Perform cleanup
    if (r2 != INVALID_RESOURCE_VALUE)
        FreeResource(r2);
    if (r1 != INVALID_RESOURCE_VALUE)
        FreeResource(r1);

    return hr;
}

This is pretty good, but what if there is a condition where you consider the function successful, but you would like to terminate without further processing. For this case, the easiest thing is probably to tell Edsger Dijkstra to shove it and use a goto statement:

#define Chk(f)
    do {
        if (SUCCEEDED(hr)) {
            hr = (f);
        }
    } while (0)

HRESULT DoStuff()
{
    HRESULT hr = S_OK;
    RESOURCE r1 = INVALID_RESOURCE_VALUE;
    RESOURCE r2 = INVALID_RESOURCE_VALUE;

    Chk(AllocResource(&r1));
    Chk(AllocResource(&r2));
    Chk(FunctionWhichUsesResources(r1, r2));
    if (SomeEarlyTerminationConditionIsMet())
        goto Cleanup;
    Chk(Function2WhichUsesResource(r1, r2));

Cleanup:
    if (r2 != INVALID_RESOURCE_VALUE)
        FreeResource(r2);
    if (r1 != INVALID_RESOURCE_VALUE)
        FreeResource(r1);

    return hr;
}

Recommended: (see this post) Now that we have used goto once, why not use it everywhere? For my code, I standardize on the presence of a label such as Cleanup and have the Chk() macro goto upon failure. A Chk() which uses goto looks like:

#define Chk(f)
    do {
        hr = (f);
        if (FAILED(hr)) {
            goto Cleanup;
        }
    } while (0)

Now that all this infrastructure is there, you can do some pretty neat things. Want to print to the screen every time we encounter a failed HRESULT in a debug build? Simply modify Chk() such as:

#ifdef DEBUG
# define IfDebug(x) (x)
#else
# define IfDebug(x)
#endif

#define Chk(f)
    do {
        hr = (f);
        if (FAILED(hr)) {
            IfDebug(printf("Encountered hr = 0x%x at %s:%d (in function %s)", hr,
                            __FILE__, __LINE__, __FUNCTION__));
            goto Cleanup;
        }
    } while (0)

Want to simplify the if (...) goto Cleanup; for early termination? Write a ChkTrue() macro:

// If f is false, unconditionally set hr to hrNew and jump to Cleanup
#define ChkTrue(f, hrNew)
    do {
        if (!(f)) {
            hr = (hrNew);
            goto Cleanup;
        }
    } while (0)

Want to call Windows functions which return a boolean and provide extended error information in GetLastError() (such as CreateDirectory?):

#define ChkTrueGLE(f) ChkTrue(f, HRESULT_FROM_WIN32(::GetLastError()))

Want to call Windows functions which return a HANDLE and provide extended error information in GetLastError() (such as CreateFile?):

#define ChkHandleGLE(f) ChkTrueGLE((f) != INVALID_HANDLE_VALUE)

Once this infrastructure is in place, writing code which properly handles errors becomes a lot easier!

Update: Be sure to read this post with further observations about this style of error handling.

XPath and Escaping

XML No Comments »

Let’s say you have the following XML structure:

<surnames>
  <surname>Casey</surname>
  <surname>D'Oronzo</surname>
  <surname>Engelhardt</surname>
</surnames>

Now let’s say you are writing a function in JavaScript to retrieve a surname node given its text. You might write something like:

function GetNodeWithSurname(docSurnames, strSurname)
{
    return docSurnames.selectSingleNode("/surnames/surname[text() = '" + strSurname + "']");
}

What will happen if you pass the parameter D'Oronzo to this function? The XPath expression within selectSingleNode() will become /surnames/surname[text() = 'D'Oronzo']. Note how the single quote within the parameter has invalidated the XPath expression. This is a classic escaping problem. Improper escaping is the reason why so many web forms break when they encounter single quotation marks, as any Irishman may attest to.

According to the MSDN Magazine Web Q&A article Web Q&A: XPath, XML Notepad, Data Islands, Case Sensitivity, XSL, and More, XPath expressions allow the use of as an escape character. Therefore, to fix GetNodeWithSurname(), all one has to do is replace ' with ', right? Well, not exactly. You also need to escape backslashes within strSurname as well.

To simplify this, I wrote the following simple JavaScript function to perform escaping for me:

function EscapeXPathString
    (
    /* string */ str
    )
{
    var reBackslash = //g;
    var reSingleQuote = /'/g;
    var reDoubleQuote = /"/g;

    var strResult = str;
    strResult = strResult.replace(reBackslash, "");
    strResult = strResult.replace(reSingleQuote, "'");
    strResult = strResult.replace(reDoubleQuote, """);
    return strResult;
}

The large number of s are to deal with the fact that JavaScript also uses to escape characters. I escaped " in addition to ' and to properly handle the case when the user writes an XPath expression that uses " instead of ' to encase the parameter. Now GetNodeWithSurname() becomes:

function GetNodeWithSurname(docSurnames, strSurname)
{
    return docSurnames.selectSingleNode("/surnames/surname[text() = '" + EscapeXPathString(strSurname) + "']");
}

I am unaware if this method of escaping with is a standard part of XPath or a feature of Microsoft’s XML parser.

Proper character escaping is something a programmer must always worry about. For example, when writing the code samples above, I had to write &lt; to insert a <. In fact, just now I had to write &amp;lt; to represent &lt;. Wait, now I’m writing &amp;amp;lt; to represent &amp;lt;. And on and on it goes…

C# and overflow

C# No Comments »

In my previous post Implemented integer overflow class, I off-handedly mentioned that C# appeared to retain C’s overflow for semantics for addition. It appears that C# supports two contexts, checked and unchecked, in which an overflow in integer arithmetic generates or does not generate an exception, respectively [documentation]. Furthermore, there is a compiler flag, /checked, which may be used to specify the default context for integer arithmetic statements [documentation].

Kudos to the developers of C# for the forethought and attention to detail.

Perhaps in the future they should consider making /checked the default for security reasons.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in