Fascinating C++ minutiae

C++

A few days ago I read this post by Stan Lippman which talks about C++’s behavior when calling virtual functions from within a base class constructor. Today he has another interesting post on the same topic in which he describes the actual mechanism the compiler uses to enforce the semantics. While I was able to guess the method the compiler uses to enable the correct behavior, I didn’t realize the following consequence until Stan said it:

The vptr always has to be set within the constructor of the actual class of the object being initialized, so that in itself is a necessary overhead of the virtual mechanism. This is why bitwise copy semantics or the use of memset/memcpy is not permissible for classes in which a vptr is present.

Previously, I had thought that the presence of a copy constructor was the only reason one couldn’t use memcpy to copy an instance of a class. Naturally, I was a good boy and never tried to use memcpy even when I knew the class lacked a copy constructor. Good thing, I suppose: that would have been a hell of a bug to track down.
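
One way to let the compiler catch this mistake is to gate memcpy behind std::is_trivially_copyable. The following is only a sketch, assuming a C++17 compiler; the types Plain and Shape and the helper raw_copy are hypothetical names made up for illustration, not anything from Stan's posts.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical plain struct: no virtual functions, so a bitwise copy is fine.
struct Plain {
    int x;
    int y;
};

// Hypothetical polymorphic class: the virtual functions mean that, on typical
// implementations, every instance carries a vptr, so memcpy is not permitted.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
    int id = 0;
};

// Copies src into dst with memcpy, but only compiles for types that the
// standard allows to be copied bitwise.
template <typename T>
void raw_copy(T& dst, const T& src)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy is only valid for trivially copyable types");
    std::memcpy(&dst, &src, sizeof(T));
}
```

With this helper, raw_copy on two Plain objects compiles, while raw_copy on two Shape objects is rejected at compile time instead of silently copying the vptr.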

Implemented integer overflow class

C++

Last night, I had some extra time so I implemented my idea for using x86 assembly to catch integer overflow errors. I believe the experiment was quite successful. Here’s a quick post-mortem:

  • After running into some bizarre, hard-to-track-down compiler errors related to templates (and the Microsoft compiler makes its error messages more readable than most!), I decided against using partial template specialization. I justified my decision after the fact by reasoning that it doesn’t make sense to use partial template specialization unless you are also going to use the generic code (which is an option, I suppose, but I personally wouldn’t choose it).
  • I implemented the binary operator+ in terms of the operator+= member operator (and likewise for operator-). This is a common C++ technique I had forgotten about, as I don’t use C++ often. The code looks something like:

    SafeInt& operator+=(const SafeInt& rhs)
    {
        // Do work...
        return *this;
    }
    
    SafeInt operator+(const SafeInt& lhs, const SafeInt& rhs)
    {
        SafeInt sum = lhs;
        sum += rhs;
        return sum;
    }
    
  • I implemented the postfix operator++ in terms of the prefix operator++ (and likewise for operator--) as I saw in a piece of example code somewhere. The code looks something like:

    SafeInt& SafeInt::operator++() // Prefix
    {
        // Do work...
        return *this;
    }
    
    SafeInt SafeInt::operator++(int) // Postfix
    {
        SafeInt temp = *this;
        ++*this;
        return temp;
    }
    
  • For signed integers, I needed to use the imul opcode (instead of mul), which is probably obvious to anyone who regularly works in x86 assembly.
  • The actual code for performing the operation was closer to:

    SafeInt& SafeInt::operator+=(const SafeInt& rhs)
    {
        int iLhs = m_val;
        int iRhs = rhs.m_val;
        int sum;
    
        __asm {
            mov eax, iLhs
            mov ecx, iRhs
            add eax, ecx
            jo overflow
            mov sum, eax
        }
    
        m_val = sum;
        return *this;
    
    overflow:
        throw OverflowException();
    }
    

    I copied the values into local variables because I was not otherwise able to access the member variables from within the inline assembly. I used ecx instead of ebx because ecx is a scratch register that the compiler does not need to save, while using ebx requires the compiler to emit an extra push and pop to preserve it.

    I stepped through the generated assembly in both the debug and release builds to see what it looked like. Unfortunately, it doesn’t seem as if the function was inlined, nor did the compiler optimize away some of the extra integer copies as I had hoped it would. Perhaps a few extra nanoseconds of performance can be squeezed out of this function by working on these problems, making it effectively costless, as the jo branch would presumably be correctly predicted. If only all methods of making code secure were as costless!

    Update 2006-10-04 12:33 PM: This assembly code is slow and needs to be optimized. See the update at the end of this post.
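
For readers without the Microsoft inline assembler, the pieces above can be put together in a rough, compilable sketch. It assumes GCC or Clang, whose __builtin_add_overflow intrinsic reports overflow much as the jo branch does; it is not the code I actually wrote. The OverflowException definition and the exact class layout here are stand-ins for illustration.

```cpp
#include <stdexcept>

// Stand-in for the exception type used in the post (definition assumed).
struct OverflowException : std::runtime_error {
    OverflowException() : std::runtime_error("integer overflow") {}
};

class SafeInt {
public:
    explicit SafeInt(int value) : m_val(value) {}
    int Value() const { return m_val; }

    SafeInt& operator+=(const SafeInt& rhs)
    {
        int sum;
        // GCC/Clang builtin (an assumption of this sketch): returns true
        // if the addition overflowed, leaving m_val untouched in that case.
        if (__builtin_add_overflow(m_val, rhs.m_val, &sum)) {
            throw OverflowException();
        }
        m_val = sum;
        return *this;
    }

    // Prefix increment reuses the checked operator+=.
    SafeInt& operator++() { return *this += SafeInt(1); }

    // Postfix increment is written in terms of prefix increment.
    SafeInt operator++(int)
    {
        SafeInt temp = *this;
        ++*this;
        return temp;
    }

private:
    int m_val;
};

// Binary operator+ is written in terms of operator+=.
inline SafeInt operator+(const SafeInt& lhs, const SafeInt& rhs)
{
    SafeInt sum = lhs;
    sum += rhs;
    return sum;
}
```

Because every operator funnels through operator+=, the overflow check lives in exactly one place.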

I did a quick test, and it seems that C# has retained C’s silent-overflow behavior for addition by default; the sum wraps around unless the code runs in a checked context, where it throws an OverflowException. Personally, if I were designing a new language, I think I would seriously consider having addition throw an exception upon overflow. This has led me to the following thoughts:

  • How would one handle a language in which an operation as simple and basic as addition can throw an exception? Would you be try/catching addition operations frequently? Are there any situations nowadays where addition can throw an exception?
  • What if there were no operations which could be guaranteed to never throw an exception? Imagine a network-transparent programming language where, underneath, the operations were being sent out to multiple computers across the network. Any operation could easily fail due to network failures. How would this affect your style of programming? Can you create a completely ‘safe’ program if no operations are guaranteed to never throw an exception? (My hunch is no, for the same reason you can’t create a safe multithreaded program without atomic operations.)
  • There seems to be some kind of fundamental impedance mismatch between out-of-band exceptions and linear, imperative programming. For example, if a function fails but the caller is expecting and will immediately handle the failure, a return value denoting failure will be both cleaner code to write and much more efficient. Naturally, there are many compelling reasons to use exceptions, but the general, tautological rule of thumb has become “Only use exceptions in exceptional situations.” I will discuss error and exception handling in a future post.
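
To make the last point concrete, here is a small sketch of the return-value style for a failure the caller expects. It assumes C++17 (std::optional and std::from_chars); the function name try_parse_int is hypothetical.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses a decimal integer. Failure is an ordinary
// return value (an empty optional), not an exception.
std::optional<int> try_parse_int(std::string_view text)
{
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();

    // from_chars never throws; it reports problems through the error code.
    const auto result = std::from_chars(first, last, value);

    // Reject both conversion errors and trailing garbage such as "4x".
    if (result.ec != std::errc() || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

A caller can then write `if (auto n = try_parse_int(input)) { ... } else { ... }` and handle the failure right where it happens, with no try/catch block in sight.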

Update 2006-10-04 12:26 PM: I see that Sree Kotay has demonstrated that my assembly version of the integer overflow handler is slower than the “tricky” pure-C MSDN implementation, and has provided an even faster pure-C version (although I find it even more tricky!). That’s fine: I didn’t spend any time working on the performance of my assembly version, as I just wanted to illustrate that it was possible. Optimization would have come later. Presumably the assembly version can always be made at least as fast as the pure-C version. After all, you could use the compiled C code as a starting point.

Integer overflow

C++

Recently, I read the article Integer Handling with the C++ SafeInt Class on MSDN which describes a generic C++ class that modifies the common mathematical operations (multiplication, addition, etc.) to throw an exception upon overflow. The method the author chose was to bounds check the operations before execution, which results in some tricky and non-obvious code.
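
To give a flavor of what checking the bounds before execution looks like, here is a minimal sketch for addition only. It is not the MSDN SafeInt code; the function name checked_add and the use of std::overflow_error are assumptions made for illustration.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical helper: adds two ints, throwing instead of overflowing.
// The operands are tested against the limits before the addition runs,
// so the addition itself can never overflow.
int checked_add(int lhs, int rhs)
{
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();

    // Adding a positive rhs overflows when lhs is above max - rhs.
    if (rhs > 0 && lhs > max - rhs) {
        throw std::overflow_error("integer overflow in addition");
    }
    // Adding a negative rhs underflows when lhs is below min - rhs.
    if (rhs < 0 && lhs < min - rhs) {
        throw std::overflow_error("integer underflow in addition");
    }
    return lhs + rhs;
}
```

Even in this simple case, the reader has to stop and convince themselves that max - rhs and min - rhs cannot overflow on their own, which is the kind of non-obvious reasoning I mean.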

Personally, I would have implemented the code in terms of inline x86 assembler. The code would become much, much simpler and probably quite a bit faster. The major downsides would be portability and writing all the partial template specializations.

I’m a little rusty on x86 assembly and a lot rusty on inline assembly, but the code might look something like:

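template<>
SafeInt<int> operator +(int lhs, SafeInt<int> rhs)
{
    // Inline assembly cannot call rhs.Value(), so copy the value to a
    // local variable first.
    int rhsValue = rhs.Value();
    int ret;

    __asm {
        mov eax, lhs
        mov ecx, rhsValue   ; ecx is a scratch register, unlike ebx
        add eax, ecx
        jo overflow         ; the overflow flag is set on signed overflow
        mov ret, eax
    }

    return SafeInt<int>(ret);

overflow:
    throw OverflowException();
}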

The savings are probably especially dramatic for the multiplication operator.
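
For comparison, one portable way to check multiplication without assembly is to do the arithmetic in a wider type and then test the range. This is only a sketch under the assumption that a 64-bit integer type is available; the name checked_mul is hypothetical, and this is not how the MSDN class is written.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: multiplies two 32-bit ints, throwing on overflow.
// The product of two 32-bit values always fits in 64 bits, so the wide
// multiplication itself cannot overflow.
std::int32_t checked_mul(std::int32_t lhs, std::int32_t rhs)
{
    const std::int64_t wide = static_cast<std::int64_t>(lhs) * rhs;

    if (wide > std::numeric_limits<std::int32_t>::max() ||
        wide < std::numeric_limits<std::int32_t>::min()) {
        throw std::overflow_error("integer overflow in multiplication");
    }
    return static_cast<std::int32_t>(wide);
}
```

The assembly version would replace the widening and the two comparisons with a single imul followed by jo.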

A JavaScript parseInt() gotcha

JavaScript

Be careful of the JavaScript parseInt() function. It treats numbers that start with "0" as octal and "0x" as hexadecimal, which may not be what you want if the user entered a month of "09". Furthermore, parseInt() will not throw an exception if the parameter is bogus. It stops at the first character that is not a valid digit in the chosen radix, so "09" read as octal quietly yields 0, and it returns NaN only when no valid digits are found at all.

Therefore, consider passing the optional radix parameter to parseInt(), as in parseInt(month, 10), and always check the result for NaN with isNaN().

I have some thoughts about the use of exceptions vs. return values to indicate errors that I may get to in a future post.

Modal dialogs in IE

HTML

Modal dialogs in Internet Explorer are a somewhat curious creature. Here are some of the problems that I have encountered and their workarounds:


Problem: Postback of a form within a modal dialog causes a new window to open.

Solution / Workaround: Use one of the following:

  • Add a <base target="_self"> fragment in the <head> element of your page
  • Put all of the dialog’s content within an iframe

Problem: Navigation to a new page within a modal dialog causes a new window to open.

Solution / Workaround: Put all of the dialog’s content within an iframe.

Problem: The DISPID_NEWWINDOW2 event fired by the Internet Explorer COM component (often abstracted by the WebBrowser control) is not fired for modal dialogs, which means that applications that host Internet Explorer cannot capture the creation of modal dialogs to add their own window decorations or object model to window.external.

Solution / Workaround: No solution is known to actually capture the event.

As for accessing window.external within modal dialogs, it can be passed as an argument to the window.showModalDialog() method and access can be abstracted through a function such as this:

function getExternalOM()
{
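    // For IE windows, window.external will exist but will not have any
    // methods.  <starwars>This isn't the window.external we are looking
    // for.</starwars>  "foo" is a placeholder for a member that the host
    // application's object model is known to expose.
    if ("foo" in window.external) {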
        return window.external;
    }

    if (window.dialogArguments != null &&
        window.dialogArguments.externalOM != null) {
        return window.dialogArguments.externalOM;
    }

    return null;
}

Using XmlSerializer does not guarantee a valid document

C#, XML

The classes which are generated from an XML schema with xsd.exe do not implement every restriction of the original schema. No exceptions or warnings will be given if an XML document is serialized but the serialized document does not conform to the original schema.

This is especially common when using XML Schema restriction elements on simple types, as the generated C# classes will not enforce the same restrictions. For example, consider the schema fragment:

<xs:simpleType name="positiveLong">
    <xs:restriction base="xs:long">
        <xs:minExclusive value="0" />
    </xs:restriction>
</xs:simpleType>

Any schema elements which use this type will result, when compiled using xsd.exe, in an object which uses the standard C# long type, which will happily accept negative numbers. Furthermore, the Serialize() method will quietly write out the negative number.

Therefore, any document generated using XmlSerializer should be revalidated against the original XML schema.

Serializing a C# decimal with XmlSerializer bug

C#, XML

The .NET Framework 1.0 has a bug when serializing extremely small and extremely large decimals using XmlSerializer: the resulting values do not conform to the XML Schema decimal data type. Specifically, the serializer occasionally outputs the decimal in scientific notation. This also affects the decimal.ToString() method. This bug was fixed with the .NET Framework 1.1.

For example:

.NET Framework 1.0: Console.WriteLine(0.00000000000000000001M.ToString()) prints out 1E-20.
.NET Framework 1.1: Console.WriteLine(0.00000000000000000001M.ToString()) prints out 0.00000000000000000001.

To require the use of the 1.1 Framework, add the following stanza to your app.config:

<startup>
    <supportedRuntime version="v1.1.4322" />
    <requiredRuntime version="v1.1.4322" />
</startup>

This bug can be somewhat tricky to find, as the vast majority of the time decimals will be serialized correctly.
