Ridiculous Bug

C++ No Comments »

Today I was debugging some strange behavior of some code on my machine where a graph was randomly not displaying any data. I managed to track the bug down and it was quite amusing on many levels. Let me explain how it worked:

As a part of the data retrieval process, the code populated a SAFEARRAY of VARIANTs containg BSTRs. This SAFEARRAY had 6 members. There was then a section of code to make debugging easier which looked like:

#ifdef _DEBUG
    SAFEARRAY* psa = NULL;
    VARIANT* vArray = NULL;

    psa = varPlotData->parray;
    hr = SafeArrayAccessData(psa, (void**)&vArray);
    short nIndex(0);
    CComBSTR bstr1 = vArray[nIndex].bstrVal;
    CComBSTR bstr2 = vArray[++nIndex].bstrVal;
    CComBSTR bstr3 = vArray[++nIndex].bstrVal;
    CComBSTR bstr4 = vArray[++nIndex].bstrVal;
    CComBSTR bstr5 = vArray[++nIndex].bstrVal;
    CComBSTR bstr6 = vArray[++nIndex].bstrVal;
    CComBSTR bstr7 = vArray[++nIndex].bstrVal;
    CComBSTR bstr8 = vArray[++nIndex].bstrVal;
    SafeArrayUnaccessData(psa);
    VariantClear(vArray);
#endif

Even a non-Windows programmer should be able to see that since this piece of code accesses a non-existant 7th and 8th member of the array, there is a good chance that the access will dereference an invalid pointer which will generate an exception. Apparently someone had noticed that problem and decided to wrap the debug code block with a try { } catch (...) { return hr; } statement that would silently catch the exception and return what would inevitably be a success code (as hr was set to the value from SafeArrayAccessData() which is virtually guaranteed to succeed). Obviously, I strongly recommend against blindly catching all exceptions and returning what could possibly be a success code.

However, it turns out that the graph would only display if an exception was thrown, which happens randomly! The reason? VariantClear() was being called on an item which isn’t a VARIANT you should free — it is a pointer to an array of VARIANTs returned by SafeArrayAccessData()! The VariantClear() was destroying the first VARIANT in the array, and the function which later tried to read this value would fail, preventing the graph from displaying. If an exception was thrown, VariantClear() wasn’t called, and the graph drawing function was able to function normally!

The correct solution is to use SafeArrayGetLBound and SafeArrayGetUBound to determine the bounds of the SAFEARRAY, restructuring the debug code accordingly, and to remove the call to VariantClear(), as SafeArrayUnaccessData() is sufficient.

Proper integration of bogofilter and mutt

Email, Mutt, Unix No Comments »

Recently I reinstalled my Debian GNU/Linux1 machine and reestablished my mail setup, which uses Postfix as the MTA, Mutt as the email cilent, Procmail for mail sorting and preprocessing, and Bogofilter for spam identification. A key part of the anti-spam setup is enabling bogofilter’s post-identification message text ham/spam classification, so that bogofilter, which presumably guesses correctly most of the time, will teach itself. However, as bogofilter occasionally makes mistakes, I need a mechanism by which I can identify and inform bogofilter when it has misclassified a piece of mail. Per bogofilter’s man page, I created a set of mutt macros to enable reclassification which looked something like:

macro index S "<enter-command>unset wait_keyn
               <pipe-entry>bogofilter -Snn
               <enter-command>set wait_keyn
               <delete-message>"
      "mark message as spam when misclassified as ham"
macro index N "<enter-command>unset wait_keyn
               <pipe-entry>bogofilter -Nsn
               <enter-command>set wait_keyn
               <delete-message>"
      "mark message as ham when misclassified as spam"

The purpose of the macro is to send the selected message to bogofilter, telling it to unregister all the message tokens as ham and register them all as spam (or spam and ham respectively). However, I soon noticed that bogofilter was having horrible classification success, i.e., it was nearly always wrong! Debugging, primarily using bogoutil, eventually led me to the discovery that mutt’s <pipe-entry> does not send the entire message to the provided process — it only sends the set of headers which are displayed by default to the user (configurable using the ignore and unignore parameters in mutt). This meant that many headers, which bogofilter has incorrectly classified using its self-teaching mechanism, were not being corrected!

I did not see an easy mechanism by which I could fix this (absurdly stupid) pipe-entry behavior, so I decided to solve the problem in a different way. Now, instead of sending the message to bogofilter, I save the message to a folder named spam-false-positives or spam-false-negatives depending on the situation. Then I wrote a cron job which runs every hour and checks to see if any messages are in these folders — if so, I send the entire folder’s contents to bogofilter for reclassification. The reason why this works is that when you save a message to a separate folder, you keep the full headers intact, and bogofilter is able to read them directly from the mbox file.

For reference, my mutt macros now look like:

macro index N "<save-message>=spam-false-positivesny"
macro pager N "<save-message>=spam-false-positivesny"
macro index S "<save-message>=spam-false-negativesny"
macro pager S "<save-message>=spam-false-negativesny"

My cron job script looks like:

#!/bin/sh

FALSE_NEGATIVES=$HOME/.mail/mailboxes/spam-false-negatives
FALSE_POSITIVES=$HOME/.mail/mailboxes/spam-false-positives

if [ -f $FALSE_NEGATIVES ]; then
    bogofilter -MNs -I $FALSE_NEGATIVES
    rm $FALSE_NEGATIVES
fi

if [ -f $FALSE_POSITIVES ]; then
    bogofilter -MSn -I $FALSE_POSITIVES
    rm $FALSE_POSITIVES
fi

One advantage of this new setup is that reclassifying wrongly-identified mail is faster than before. While I don’t notice too much difference with bogofilter, as bogofilter is rather quick, I’m sure I would notice an enormous difference using this setup with SpamAssassin, as sa-learn is ungodly slow.

[1] I defer to the silly moniker “GNU/Linux” rather than simply “Linux” as that is how Debian refers to itself.
WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in