C# Utility Class: CompoundKey

C#

While working in C#, I’ve often found that I want to use more than one variable as a key to a Hashtable. I’ve abstracted this functionality into an object called CompoundKey. CompoundKey allows one to combine any number of variables into a single object which, through proper implementation of Equals(), GetHashCode(), and ToString(), can be used as a key to any IDictionary or even System.Web.Caching.Cache.

Usage is very simple. For example:

IDictionary urlUserAccessTimes = new Hashtable();
CompoundKey urlUserKey = new CompoundKey(new Uri("http://www.deez.info/sengelha/"), "Steven Engelhardt");
urlUserAccessTimes[urlUserKey] = DateTime.Now;

Here’s the code:

CompoundKey code
using System.Diagnostics;
using System.Text;

/// <summary>
/// Creates a key for an IDictionary or a System.Web.Caching.Cache
/// out of a collection of values.
/// </summary>
/// <remarks>
/// Each value stored in CompoundKey must implement Equals()
/// correctly.
/// </remarks>
public struct CompoundKey
{
    private object[] m_keyParts;

    public CompoundKey(params object[] keyParts)
    {
        Debug.Assert(keyParts != null);

        m_keyParts = keyParts;
    }

    public override bool Equals(object obj)
    {
        if (!(obj is CompoundKey))
            return false;

        CompoundKey key = (CompoundKey) obj;
        return ArrayUtils.Equals(m_keyParts, key.m_keyParts);
    }

    public override int GetHashCode()
    {
        int hashCode = 0;
        foreach (object keyPart in m_keyParts)
        {
            if (keyPart != null)
            {
                hashCode ^= keyPart.GetHashCode();
            }
        }
        return hashCode;
    }

    /// <remarks>
    /// Unfortunately, System.Web.Caching.Cache uses strings as keys
    /// instead of objects.
    /// </remarks>
    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();

        foreach (object keyPart in m_keyParts)
        {
            if (sb.Length > 0)
                sb.Append(",");
            sb.Append(keyPart != null ? keyPart.ToString() : "(null)");
        }

        return sb.ToString();
    }
}
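
One property of the GetHashCode() above is worth knowing: XOR is commutative, so two keys holding the same parts in a different order produce the same hash code. Lookups stay correct because Equals() still compares the parts in order, but such keys collide in the table. The following is a minimal sketch of an order-sensitive alternative. HashSketch, the seed 17, and the multiplier 31 are illustrative choices and not part of the original class.

```csharp
using System;

// Illustrative helper, not part of CompoundKey itself.
public static class HashSketch
{
    // Same scheme as CompoundKey.GetHashCode(): XOR of the parts' hash codes.
    public static int XorHash(object[] parts)
    {
        int hashCode = 0;
        foreach (object part in parts)
        {
            if (part != null)
                hashCode ^= part.GetHashCode();
        }
        return hashCode;
    }

    // Order-sensitive variant: multiply-and-add, so the position of each
    // part influences the result.
    public static int OrderedHash(object[] parts)
    {
        unchecked // arithmetic overflow simply wraps around
        {
            int hashCode = 17;
            foreach (object part in parts)
                hashCode = hashCode * 31 + (part != null ? part.GetHashCode() : 0);
            return hashCode;
        }
    }
}
```

With these helpers, XorHash gives the same value for { 1, 2 } and { 2, 1 }, while OrderedHash gives different values for those two inputs.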

C# Utility Class: ArrayUtils

C#

At work, I’ve written a collection of C# utility classes which implement commonly used functionality. The classes are encapsulated in a class library project which is included by virtually every C# application or class library I write. One of the simplest utility classes I’ve written is called ArrayUtils, and it is (duh) a set of useful functions for dealing with arrays. First, the code:

ArrayUtils code
using System;
using System.Diagnostics;

/// <summary>
/// ArrayUtils is a collection of static helper functions which implement
/// common array tasks.
/// </summary>
public sealed class ArrayUtils
{
    /// <summary>
    /// Determines whether the provided array contains the specified
    /// member.
    /// </summary>
    /// <remarks>
    /// Execution time is O(n).
    /// </remarks>
    public static bool Contains(Array a1, object val)
    {
        Debug.Assert(a1 != null);
        Debug.Assert(val != null);

        foreach (object o in a1)
        {
            if (Object.Equals(o, val))
                return true;
        }

        return false;
    }

    /// <summary>
    /// Determines whether the provided array has any duplicated members.
    /// </summary>
    /// <remarks>
    /// Execution time is O(n^2).
    /// </remarks>
    public static bool ContainsDuplicates(Array a1)
    {
        Debug.Assert(a1 != null);

        for (int i = 0; i < a1.Length; i++)
        {
            for (int j = i + 1; j < a1.Length; j++)
            {
                if (Object.Equals(a1.GetValue(i), a1.GetValue(j)))
                {
                    return true;
                }
            }
        }

        return false;
    }

    /// <summary>
    /// Determine whether the two arrays are equal.  Equality is defined
    /// as having the same number of members and each member, in order,
    /// matches the corresponding member in the other array.
    /// </summary>
    /// <remarks>
    /// Execution time is O(n).
    /// </remarks>
    public static bool Equals(Array a1, Array a2)
    {
        Debug.Assert(a1 != null);
        Debug.Assert(a2 != null);

        if (a1.Length != a2.Length)
            return false;

        for (int i = 0; i < a1.Length; i++)
        {
            if (!Object.Equals(a1.GetValue(i), a2.GetValue(i)))
                return false;
        }

        return true;
    }

    private ArrayUtils() {}
}

Most of the code is very straightforward. Here are a few pieces which I believe deserve closer attention:

public sealed class ArrayUtils
{
    ...

    private ArrayUtils() {}
}

The Utils suffix on the class name is a convention I use to denote a class with only static methods. The combination of sealed and a private constructor is a common idiom when declaring such classes. The next version of C# (C# 2.0, part of the “Whidbey” release) introduces a static class modifier to denote such classes; a class declared static cannot be instantiated and may contain only static members.

public static bool Contains(Array a1, object val)
{
    Debug.Assert(a1 != null);
    Debug.Assert(val != null);

    ...
}

I tend to use debug-only parameter validation for my utility classes; the alternative is to do things such as throw ArgumentNullExceptions. My justification is that these utility methods are internal and all validation should have been performed before calling them. I’m not particularly beholden to this convention—I consider there to be a decent chance that in the future I will change my mind.
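
For comparison, the alternative mentioned above might look like the following sketch. CheckedArrayUtils is a hypothetical name. Unlike Debug.Assert(), the check also runs in release builds, and this version (an assumption of the sketch) lets val be null rather than asserting against it.

```csharp
using System;

// Hypothetical variant of ArrayUtils.Contains() with always-on validation.
public sealed class CheckedArrayUtils
{
    public static bool Contains(Array a1, object val)
    {
        // Thrown in both debug and release builds.
        if (a1 == null)
            throw new ArgumentNullException("a1");

        foreach (object o in a1)
        {
            // Object.Equals() copes with null elements and a null val.
            if (Object.Equals(o, val))
                return true;
        }

        return false;
    }

    private CheckedArrayUtils() {}
}
```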

if (Object.Equals(o, val))

I am very careful to use Object.Equals() throughout this class to allow for the possibility that either object may be null and to consider two null objects identical.
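
The difference between the static and the instance form of Equals() can be sketched as follows; NullEqualsSketch and its method names are made up for illustration.

```csharp
using System;

public static class NullEqualsSketch
{
    // Static form: safe when either argument is null, and treats two
    // nulls as equal.
    public static bool StaticEquals(object a, object b)
    {
        return Object.Equals(a, b);
    }

    // Instance form: throws NullReferenceException when a is null.
    public static bool InstanceEquals(object a, object b)
    {
        return a.Equals(b);
    }
}
```

StaticEquals(null, null) returns true and StaticEquals(null, "x") returns false, whereas InstanceEquals(null, "x") throws.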

The rest, I think, is self-explanatory. Future posts about these C# utility classes should be more interesting as I plan to discuss much less trivial code.

By the way, I mentioned earlier that this post was going to be part of a series. I’ve changed my mind; these utility classes are largely independent and will stand on their own.

Beware ThreadPools And HttpWebRequest

C#

At work I am responsible for a program which involves a very large number of authenticated HTTP requests to retrieve data. In an effort to make it as efficient as possible, I used asynchronous HTTP requests. Furthermore, I needed to retrieve and cache certain data from the webserver across threads.

However, I ran across a terrible problem. To illustrate, I’ve simplified the code down to the following (the TimedLock object is from my post Useful IDisposable Class 1: TimedLock (Post 3 of 5)):

class Class1
{
    private ValueType sharedData;
    private object sharedDataLock = new object();

    public void Run()
    {
        for (int i = 0; i < 100; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(Process));
        }
    }

    private void Process(object state)
    {
        using (TimedLock.TryLock(sharedDataLock, TimeSpan.FromSeconds(60)))
        {
            WebRequest request = HttpWebRequest.Create(...);
            using (WebResponse response = request.GetResponse())
            {
                // update sharedData
            }
        }

        // Do other work
    }
}

The problem manifests itself as a lock timeout (or a deadlock if you use the lock keyword): One thread acquires the lock in TimedLock.TryLock() but request.GetResponse() blocks forever, so it never releases the lock. The other threads remain stuck at the TimedLock.TryLock() line, waiting for the first thread to relinquish the lock.

Why does this happen? Well, it took a long time for me to figure out, but I finally determined it is due to the problem described in this .NET Matters article:

The first thing to be aware of is that in version 1.x of the Microsoft®.NET Framework, HttpWebRequest never makes synchronous requests. What do I mean by that? Take a look at the code for HttpWebRequest.GetResponse as coded in the Shared Source CLI (SSCLI), shown here omitting the code that checks to see if the response was previously retrieved and that accounts for timeouts:

public override WebResponse GetResponse() {
    •••
    IAsyncResult asyncResult = BeginGetResponse(null, null);
    •••
    return EndGetResponse(asyncResult);
}

As you can see, HttpWebRequest.GetResponse is simply a wrapper around the pairing of BeginGetResponse and EndGetResponse. These operate asynchronously, meaning that BeginGetResponse makes the actual HTTP request from a different thread than the one from which it was called, and EndGetResponse blocks until the request has completed. The net result of this is that HttpWebRequest queues a work item to the ThreadPool for every outbound request.

In the scenario I have provided, the ThreadPool spins up a number of threads to process the QueueUserWorkItem() requests, which then all block on TimedLock.TryLock() — except for one, which gets to request.GetResponse(). Then, when GetResponse() attempts to grab a ThreadPool thread of its own (per the article), it deadlocks waiting for a ThreadPool thread to become free. Incidentally, the HttpWebRequest class is supposed to throw an Exception if the number of threads in the ThreadPool is too low, but that didn’t seem to be happening for me.

What’s the solution? I first tried to keep a minimum number of ThreadPool threads free by checking ThreadPool.GetAvailableThreads() before each call to QueueUserWorkItem(). This did not work: ThreadPool threads aren’t started immediately (they are started up at a later time, spaced apart with a small delay), so GetAvailableThreads() indicated that there were plenty of threads available.

Outside of upgrading to the .NET Framework 2.0 — which isn’t even released yet — the article suggests writing a “throttling” ThreadPool which handles thread management itself and limits the number of active threads to a programmer-specified maximum number. Here’s the article’s sample implementation:

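using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;

public sealed class Semaphore : WaitHandle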
{
    public Semaphore() : this(1, 1) {}

    public Semaphore(int initialCount, int maximumCount)
    {
        if (initialCount < 0 || initialCount > maximumCount)
            throw new ArgumentOutOfRangeException("initialCount");
        if (maximumCount < 1)
            throw new ArgumentOutOfRangeException("maximumCount");
        IntPtr h = CreateSemaphore(
            IntPtr.Zero, initialCount, maximumCount, null);
        if (h == WaitHandle.InvalidHandle || h == IntPtr.Zero)
            throw new Win32Exception();
        Handle = h;
    }

    public void ReleaseOne()
    {
        int previousCount;
        if (!ReleaseSemaphore(Handle, 1, out previousCount))
            throw new Win32Exception();
    }

    [DllImport("kernel32.dll", SetLastError=true)]
    private static extern IntPtr CreateSemaphore(
        IntPtr lpSemaphoreAttributes, int lInitialCount,
        int lMaximumCount, string lpName);

    [DllImport("kernel32.dll", SetLastError=true)]
    private static extern bool ReleaseSemaphore(
        IntPtr hSemaphore, int lReleaseCount, out int lpPreviousCount);
}

public class ThreadPoolThrottle : IDisposable
{
    private Semaphore _throttle;

    public ThreadPoolThrottle(int maximumAllowed)
    {
        if (maximumAllowed < 1)
            throw new ArgumentOutOfRangeException("maximumAllowed");
        _throttle = new Semaphore(maximumAllowed,maximumAllowed);
    }

    public void QueueUserWorkItem(WaitCallback callback)
    {
        QueueUserWorkItem(callback, null);
    }

    public void QueueUserWorkItem(WaitCallback callback, object state)
    {
        if (_throttle == null)
            throw new ObjectDisposedException(this.GetType().FullName);
        if (callback == null)
            throw new ArgumentNullException("callback");

        _throttle.WaitOne();
        try
        {
            QueuedCallback qc = new QueuedCallback();
            qc.Callback = callback;
            qc.State = state;
            ThreadPool.QueueUserWorkItem(
                new WaitCallback(HandleWorkItem), qc);
        }
        catch
        {
            _throttle.ReleaseOne();
            throw;
        }
    }

    private void HandleWorkItem(object state)
    {
        QueuedCallback qc = (QueuedCallback)state;
        try { qc.Callback(qc.State); }
        finally { _throttle.ReleaseOne(); }
    }

    private class QueuedCallback
    {
        public WaitCallback Callback;
        public object State;
    }

    public void Dispose()
    {
        if (_throttle != null)
        {
            ((IDisposable)_throttle).Dispose();
            _throttle = null;
        }
    }
}

I was able to easily adapt this solution for my needs. Unfortunately, I do not have a solution if one desires to continue using asynchronous method calls — such as delegates’ BeginInvoke() method or Stream.BeginRead() — as they internally use ThreadPool threads.
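
On .NET 2.0 and later the hand-rolled Semaphore wrapper should not be needed, because the framework provides System.Threading.Semaphore. The sketch below applies the same throttling idea on top of that class. SimpleThrottle and ThrottleDemo are hypothetical names, and having Dispose() wait for in-flight work items is an assumption of this sketch rather than something the article's version does.

```csharp
using System;
using System.Threading;

// Sketch: limits the number of queued-or-running ThreadPool work items.
public sealed class SimpleThrottle : IDisposable
{
    private readonly int _maximumAllowed;
    private Semaphore _slots;

    public SimpleThrottle(int maximumAllowed)
    {
        if (maximumAllowed < 1)
            throw new ArgumentOutOfRangeException("maximumAllowed");
        _maximumAllowed = maximumAllowed;
        _slots = new Semaphore(maximumAllowed, maximumAllowed);
    }

    public void QueueUserWorkItem(WaitCallback callback, object state)
    {
        if (_slots == null)
            throw new ObjectDisposedException(GetType().FullName);
        if (callback == null)
            throw new ArgumentNullException("callback");

        _slots.WaitOne(); // blocks the caller while every slot is taken
        try
        {
            ThreadPool.QueueUserWorkItem(delegate(object s)
            {
                try { callback(s); }
                finally { _slots.Release(); } // free the slot when done
            }, state);
        }
        catch
        {
            _slots.Release(); // queuing failed, so give the slot back
            throw;
        }
    }

    // Assumption of this sketch: reclaim every slot first, which waits
    // for all in-flight work items, and only then close the semaphore.
    public void Dispose()
    {
        if (_slots == null)
            return;
        for (int i = 0; i < _maximumAllowed; i++)
            _slots.WaitOne();
        _slots.Close();
        _slots = null;
    }
}

public static class ThrottleDemo
{
    private static readonly object s_sync = new object();
    private static int s_running, s_peak, s_done;

    public static int Done { get { return s_done; } }

    // Queues 'items' short work items with at most 'limit' in flight and
    // returns the highest concurrency that was observed.
    public static int Run(int items, int limit)
    {
        s_running = 0; s_peak = 0; s_done = 0;
        using (SimpleThrottle throttle = new SimpleThrottle(limit))
        {
            for (int i = 0; i < items; i++)
                throttle.QueueUserWorkItem(new WaitCallback(Work), null);
        } // Dispose() returns once all items have finished
        return s_peak;
    }

    private static void Work(object state)
    {
        lock (s_sync) { s_running++; if (s_running > s_peak) s_peak = s_running; }
        Thread.Sleep(10); // stand-in for the real request
        lock (s_sync) { s_running--; s_done++; }
    }
}
```

Calling ThrottleDemo.Run(20, 3) queues twenty work items but never lets more than three run at once, which leaves the remaining ThreadPool threads free for work such as HttpWebRequest's internal callbacks.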

Matching Enumerated Types Using Regular Expressions

C#, Regular Expressions

Regular expressions are a very useful tool. Among the uses I’ve found for them are validating user input, performing simple HTML manipulation (although in general this is a bad idea — one should prefer a real HTML parser), and parsing textual data in custom formats from numerous sources.

Naturally, regular expressions have downsides as well. They are virtually a write-only language (although Perl’s x flag combined with copious comments largely alleviates this); some regular expressions have ghastly performance characteristics; learning their syntax takes quite a bit of time; far too many developers seem to be unaware of their existence; different regular expression implementations have different features; and one needs to get intimately familiar with the escaping rules for both regular expressions and the programming language (e.g., to create a regular expression which matches a single backslash character in C/C++, one needs to write “\\\\”).

One common task I often need to perform is to create a regular expression which matches any one of a number of values, e.g., matching an enumerated type[1]. Consider creating a regular expression which matches any two-letter U.S. state code. Most people will write something like (greatly simplified):

regex = "(AK|AL|AR|AZ|...|WA|WI|WV|WY)"

This will work fine, but as ( ) defines a capturing group I prefer to use the non-capturing (?: ) unless otherwise required:

regex = "(?:AK|AL|AR|AZ|...|WA|WI|WV|WY)"

Furthermore, since I don’t know the rules for operator precedence in regular expressions very well, I prefer to encase each allowed value in its own non-capturing group. This will also allow me to use any regular expression as an allowed value, even those which include | characters:

regex = "(?:(?:AK)|(?:AL)|(?:AR)|(?:AZ)|...|(?:WA)|(?:WI)|(?:WV)|(?:WY))"
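
To see why grouping matters once an alternation is combined with other pattern elements, consider anchoring a state-code pattern. Alternation binds more loosely than anchors and concatenation, so an ungrouped pattern does not mean what it appears to. A small sketch follows; AlternationSketch is a hypothetical name and the two-state pattern is shortened for illustration.

```csharp
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // "^AK|WA$" parses as (^AK)|(WA$): "starts with AK" or "ends with WA".
    public static bool Ungrouped(string input)
    {
        return Regex.IsMatch(input, "^AK|WA$");
    }

    // Grouping makes both anchors apply to the whole alternation.
    public static bool Grouped(string input)
    {
        return Regex.IsMatch(input, "^(?:AK|WA)$");
    }
}
```

Ungrouped("AKxyz") returns true even though "AKxyz" is not a state code, while Grouped("AKxyz") returns false.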

One can easily write a function to perform this enumerated type regular expression generation. Here’s one implementation in C#:

using System.Diagnostics;
using System.Text;

class RegexUtils
{
    public static string CreateEnumeration(string[] regexs)
    {
        Debug.Assert(regexs != null);
        Debug.Assert(regexs.Length >= 2);

        StringBuilder sb = new StringBuilder();
        sb.Append("(?:");

        foreach (string regex in regexs)
        {
            sb.Append("(?:");
            sb.Append(regex);
            sb.Append(")|");
        }

        sb.Remove(sb.Length - 1, 1);
        sb.Append(")");
        return sb.ToString();
    }
}

The function is used as follows:

string[] stateCodeRegexs = new string[] { "AK", "AL", "AR", "AZ", ..., "WA", "WI", "WV", "WY" };
string anyStateCodeRegex = RegexUtils.CreateEnumeration(stateCodeRegexs);

Please note that the contents of stateCodeRegexs (AK, AL, etc.) are themselves regular expressions and not simple character strings. This means that one can use the full set of regular expression features, but one must also beware of escaping issues.
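
When the allowed values are plain strings rather than regular expressions, one way to sidestep the escaping issue is to pass each value through Regex.Escape() before combining them. The following is a sketch of that variant; LiteralEnumerationSketch is a hypothetical helper and not part of RegexUtils.

```csharp
using System.Text.RegularExpressions;

public static class LiteralEnumerationSketch
{
    // Builds "(?:(?:v1)|(?:v2)|...)" from literal strings, escaping any
    // regular expression metacharacters in each value.
    public static string Create(string[] literals)
    {
        string[] escaped = new string[literals.Length];
        for (int i = 0; i < literals.Length; i++)
            escaped[i] = "(?:" + Regex.Escape(literals[i]) + ")";
        return "(?:" + string.Join("|", escaped) + ")";
    }
}
```

For the values "a.b" and "c+", the result matches the literal text "a.b" but not "axb", which an unescaped "a.b" would also have matched.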

In general, one must be very careful when combining regular expressions together. Typically, copious use of non-capturing groups is required in order to ensure correct behavior; blind string concatenation is just asking for bugs.

[1] For single characters one can use the [ ] construct, but that doesn’t work for more complicated enumerated types.

Useful IDisposable Class 3: AutoReleaseComObject (Post 5 of 5)

C#, Excel Interop

This is the fifth and final post in a series of posts. Here are the previous posts in this series: Deterministic Finalization in Garbage-Collected Languages, Rules For Implementing IDisposable, Useful IDisposable Class 1: TimedLock, and Useful IDisposable Class 2: AutoDeleteFile.

This is the final example in my series on deterministic finalization in garbage-collected languages and the true motive behind the series: AutoReleaseComObject. The idea behind AutoReleaseComObject is simple: it is nothing but a wrapper around a COM object which calls Marshal.ReleaseComObject() upon Dispose() until the COM object’s reference count is 0 and the object is freed. Here’s the implementation:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public class AutoReleaseComObject : IDisposable
{
    private object m_comObject;
    private bool m_armed = true;
    private bool m_disposed = false;

    public AutoReleaseComObject(object comObject)
    {
        Debug.Assert(comObject != null);

        m_comObject = comObject;
    }

#if DEBUG
    ~AutoReleaseComObject()
    {
        // We should have been disposed using Dispose().
        Debug.Assert(false);
    }
#endif

    public object ComObject
    {
        get
        {
            Debug.Assert(!m_disposed);
            return m_comObject;
        }
    }

    public void Disarm()
    {
        Debug.Assert(!m_disposed);
        m_armed = false;
    }

    #region IDisposable Members

    public void Dispose()
    {
        Dispose(true);
#if DEBUG
        GC.SuppressFinalize(this);
#endif
    }

    #endregion

    protected virtual void Dispose(bool disposing)
    {
        if (!m_disposed)
        {
            if (m_armed)
            {
                int refcnt;
                do
                {
                    refcnt = Marshal.ReleaseComObject(m_comObject);
                } while (refcnt > 0);

                m_comObject = null;
            }

            m_disposed = true;
        }
    }
}

Why is this class so useful? Well, it has to do with a topic I’ve discussed before: Excel interop. As I insinuate in that post, a problem that users of the Excel object model often encounter is either runaway Excel processes which never quit, or multiple Excel processes when one would suffice. Furthermore, the Excel processes tend to stay around much longer than they have to. For C++, my solution was to either be sure to explicitly call COleDispatchDriver::ReleaseDispatch() or to use the COleDispatchDriver::m_bAutoRelease flag on all Excel objects (this is more than just the application: it is any Excel object such as Range or Workbook).

In C#, you can run into the same problem — basically the Excel process will stay around as long as any Excel COM interop object has a non-zero reference count. While I suspect the .NET Excel interop objects include code in their finalizers to decrement their COM reference counts to zero, which should mean that in the worst case the Excel process will end at the same time your .NET process ends, I think we can and should do better. After all, consider the implications if your .NET process is very long-lived, or if you repeatedly, serially interact with Excel (the system will likely unnecessarily launch many Excel processes).

The solution to these problems is to call Marshal.ReleaseComObject() on all Excel objects as soon as possible. Once all objects’ COM reference counts reach zero, the Excel process will terminate. Therefore, I decided to wrap this functionality into the AutoReleaseComObject class.

Unfortunately, this makes using the Excel object model quite a bit more tedious. The casting becomes annoying, but this is easily solvable by writing a series of Excel object wrappers which inherit from AutoReleaseComObject and provide access to the wrapped object already cast to the appropriate type (I can’t wait for Whidbey’s generics). I called these objects ExcelApplicationWrapper, ExcelWorkbookWrapper, etc. and their implementation and use should be fairly obvious. However, consider what happens if you execute the following code:

using (ExcelApplicationWrapper excelAppWrapper =
           new ExcelApplicationWrapper(new Excel.Application()))
using (ExcelWorkbookWrapper workbookWrapper =
           new ExcelWorkbookWrapper(excelAppWrapper.Application.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet)))
{
    // ... Do work with workbook
}

Looks fine, doesn’t it? Wrong. excelAppWrapper.Application.Workbooks is itself an Excel object model object which also must be wrapped in AutoReleaseComObject in order for our desired behavior to happen. You need to be very careful to catch and wrap all Excel objects or you are back to square one in having near-immortal Excel processes. The above code should properly be written:

using (ExcelApplicationWrapper excelAppWrapper =
           new ExcelApplicationWrapper(new Excel.Application()))
using (ExcelWorkbooksWrapper workbooksWrapper =
           new ExcelWorkbooksWrapper(excelAppWrapper.Application.Workbooks))
using (ExcelWorkbookWrapper workbookWrapper =
           new ExcelWorkbookWrapper(workbooksWrapper.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet)))
{
    // ... Do work with workbook
}

Happy interop!

Useful IDisposable Class 2: AutoDeleteFile (Post 4 of 5)

C#

This is the fourth post in a series of five posts. Here are the previous posts in this series: Deterministic Finalization in Garbage-Collected Languages, Rules For Implementing IDisposable, and Useful IDisposable Class 1: TimedLock. I have adjusted the post date/time to make the series sequential.

I guess my definition of tomorrow is much longer than I thought, but here’s another useful IDisposable class which I shall present without comment: AutoDeleteFile.

using System;
using System.Diagnostics;
using System.IO;

/// <summary>
/// A file wrapper which automatically deletes the file unless Disarm()
/// is called.
/// </summary>
public sealed class AutoDeleteFile : IDisposable
{
    private FileInfo m_underlyingFile;
    private bool m_armed = true;
    private bool m_disposed = false;

    public AutoDeleteFile(FileInfo underlyingFile)
    {
        Debug.Assert(underlyingFile != null);

        m_underlyingFile = underlyingFile;
    }

    ~AutoDeleteFile()
    {
        Dispose(false);
    }

    public FileInfo File
    {
        get { return m_underlyingFile; }
    }

    public void Disarm()
    {
        m_armed = false;
    }

    #region IDisposable Members

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    #endregion

    private void Dispose(bool disposing)
    {
        if (!m_disposed)
        {
            if (m_armed)
            {
                try
                {
                    m_underlyingFile.Delete();
                }
                catch (Exception)
                {
                    // If we can't delete, oh well!
                }
            }

            m_disposed = true;
        }
    }
}
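
A typical use is a scratch file that should disappear when the work is done unless the result is worth keeping. The sketch below shows that pattern. So that the snippet compiles on its own, it declares ScratchFile, a stripped-down stand-in with the same File/Disarm()/Dispose() shape as AutoDeleteFile; ScratchFile and ScratchFileDemo are hypothetical names.

```csharp
using System;
using System.IO;

// Stripped-down stand-in for AutoDeleteFile (no finalizer).
public sealed class ScratchFile : IDisposable
{
    private readonly FileInfo m_file;
    private bool m_armed = true;

    public ScratchFile(FileInfo file) { m_file = file; }

    public FileInfo File { get { return m_file; } }

    public void Disarm() { m_armed = false; }

    public void Dispose()
    {
        if (!m_armed)
            return;
        try { m_file.Delete(); }
        catch (Exception) { /* best effort, as in AutoDeleteFile */ }
    }
}

public static class ScratchFileDemo
{
    // Writes to a temp file and returns its path. The file is deleted on
    // exit from the using block unless 'keep' is true.
    public static string WriteScratch(bool keep)
    {
        using (ScratchFile scratch =
                   new ScratchFile(new FileInfo(Path.GetTempFileName())))
        {
            using (StreamWriter writer = scratch.File.CreateText())
            {
                writer.WriteLine("intermediate data");
            }

            if (keep)
                scratch.Disarm(); // success path: leave the file in place

            return scratch.File.FullName;
        }
    }
}
```

Disarm() works like a commit: if an exception is thrown before it is reached, the file is still cleaned up.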

Useful IDisposable Class 1: TimedLock (Post 3 of 5)

C#

This is the third post in a series of five posts. Here are the previous posts in this series: Deterministic Finalization in Garbage-Collected Languages, and Rules For Implementing IDisposable. I have adjusted the entry date/time to make the series sequential.

For the first example of a useful custom class which implements IDisposable, I will simply link to and reproduce Ian Griffiths’ TimedLock, an enhancement of the C# lock statement which allows the specification of a timeout period instead of blocking forever while trying to obtain the lock.

The code for TimedLock is reproduced below:

using System;
using System.Threading;

// Thanks to Eric Gunnerson for recommending this be a struct rather
// than a class - avoids a heap allocation.
// Thanks to Chance Gillespie and Jocelyn Coulmance for pointing out
// the bugs that then crept in when I changed it to use struct...
// Thanks to John Sands for providing the necessary incentive to make
// me invent a way of using a struct in both release and debug builds
// without losing the debug leak tracking.
public struct TimedLock : IDisposable
{
    public static TimedLock Lock (object o)
    {
        return Lock (o, TimeSpan.FromSeconds (10));
    }

    public static TimedLock Lock (object o, TimeSpan timeout)
    {
        TimedLock tl = new TimedLock (o);
        if (!Monitor.TryEnter (o, timeout))
        {
#if DEBUG
            System.GC.SuppressFinalize(tl.leakDetector);
#endif
            throw new LockTimeoutException ();
        }

        return tl;
    }

    private TimedLock (object o)
    {
        target = o;
#if DEBUG
        leakDetector = new Sentinel();
#endif
    }

    private object target;

    public void Dispose ()
    {
        Monitor.Exit (target);

        // It's a bad error if someone forgets to call Dispose,
        // so in Debug builds, we put a finalizer in to detect
        // the error. If Dispose is called, we suppress the
        // finalizer.
#if DEBUG
        GC.SuppressFinalize(leakDetector);
#endif
    }

#if DEBUG
    // (In Debug mode, we make it a class so that we can add a finalizer
    // in order to detect when the object is not freed.)
    private class Sentinel
    {
        ~Sentinel()
        {
            // If this finalizer runs, someone somewhere failed to
            // call Dispose, which means we've failed to leave
            // a monitor!
            System.Diagnostics.Debug.Fail("Undisposed lock");
        }
    }
    private Sentinel leakDetector;
#endif
}

public class LockTimeoutException : ApplicationException
{
    public LockTimeoutException () : base("Timeout waiting for lock")
    {
    }
}

It is trivial to use TimedLock instead of lock in your applications. Simply change statements from:

lock (objectToLock)
{
    // ... Do work while holding lock
}

… to:

using (TimedLock.Lock(objectToLock))
{
    // ... Do work while holding lock
}
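
Under the hood this is the Monitor.TryEnter()/Monitor.Exit() pairing that TimedLock wraps. Written out by hand, the pattern looks roughly like the sketch below. TryEnterSketch is a hypothetical helper, and it reports a timeout by returning false where TimedLock throws LockTimeoutException.

```csharp
using System;
using System.Threading;

public static class TryEnterSketch
{
    // Runs 'work' while holding the lock on 'sync'. Returns false, without
    // running 'work', if the lock cannot be acquired within 'timeout'.
    public static bool RunWithTimeout(object sync, TimeSpan timeout, ThreadStart work)
    {
        if (!Monitor.TryEnter(sync, timeout))
            return false;

        try
        {
            work();
        }
        finally
        {
            // Always release, even if 'work' throws.
            Monitor.Exit(sync);
        }

        return true;
    }
}
```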

Others have enhanced TimedLock even further, such as by having it keep track of the stack trace of the thread which is holding the lock.

Rules For Implementing IDisposable (Post 2 of 5)

C#

This is the second part of a series of five posts. Here is the first post in this series: Deterministic Finalization in Garbage-Collected Languages. I have adjusted the entry date/time to make the series sequential.

In my last post, Deterministic Finalization in Garbage-Collected Languages (e.g. C#’s IDisposable/using), I discussed what IDisposable and using are and why they are useful. This post describes some rules when implementing IDisposable in a class:

  • Follow the .NET Framework Developer’s Guide Implementing a Dispose Method guidelines.
  • Try hard not to throw an Exception from Dispose() (this is analogous to the C++ basic exception safety guarantee’s demand that destructors not throw exceptions). One major reason why you should avoid throwing an Exception from Dispose() is that it could mask any Exception that is “currently” being thrown.
  • If you write a class which contains a member which implements IDisposable, your class must implement the IDisposable interface and dispose the member in its Dispose() method. (This rule obviously can become very annoying as IDisposable propagates up your implementation hierarchy.) For example:

    // FileStreamHolder must implement IDisposable because it contains
    // a member which implements IDisposable (FileStream).
    class FileStreamHolder : IDisposable
    {
        private FileStream m_fileStream;
    
        public void Dispose()
        {
            Dispose(true);
        }
    
        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
            {
                // We only dispose IDisposable members if we reached this
                // function through IDisposable.Dispose().
                m_fileStream.Dispose();
            }
        }
    }
    
  • It can be useful to add a debug-only finalizer which verifies that Dispose() was called for your object. For example:

    class DisposableObject : IDisposable
    {
    #if DEBUG
        ~DisposableObject()
        {
            // The finalizer should never be called -- Dispose() should have
            // been called instead.
            Debug.Assert(false, "This object was not disposed using Dispose().");
        }
    #endif
    
        public void Dispose()
        {
            Dispose(true);
    #if DEBUG
            // The finalizer only exists in debug builds
            GC.SuppressFinalize(this);
    #endif
        }
    
        protected virtual void Dispose(bool disposing)
        {
            ....
        }
    }
    

    Incidentally, I use #if DEBUG instead of the [Conditional("DEBUG")] attribute because Conditional is not allowed on finalizers.

  • Beware of classes which implement Dispose() only through the IDisposable interface and not through the general interface of the class, as IntelliSense may not show the Dispose() method for the class, and calling Dispose() directly won’t compile. (Example: TextWriter) Use ((IDisposable) obj).Dispose(); instead.
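
The masking problem described in the second rule above is easy to demonstrate. In the following sketch (ThrowingDisposable and MaskingSketch are made-up names), the body of the using block throws one exception and Dispose() throws another; the caller only ever sees the one from Dispose().

```csharp
using System;

public sealed class ThrowingDisposable : IDisposable
{
    public void Dispose()
    {
        // Deliberately breaks the rule to show the consequence.
        throw new InvalidOperationException("thrown from Dispose()");
    }
}

public static class MaskingSketch
{
    // Returns the type name of the exception the caller actually observes.
    public static string Observe()
    {
        string seen = "none";
        try
        {
            using (new ThrowingDisposable())
            {
                throw new ArgumentException("the original failure");
            }
        }
        catch (Exception e)
        {
            seen = e.GetType().Name;
        }
        return seen;
    }
}
```

MaskingSketch.Observe() returns "InvalidOperationException"; the ArgumentException that started the unwinding is lost.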

In my upcoming posts I will show some useful classes I’ve written which implement IDisposable.

Update 2006-03-28 12:38 PM: See More on IDisposable for some further discussion.

Deterministic Finalization in Garbage-Collected Languages (Post 1 of 5)

C#

This is the first post in a series of five posts. I have adjusted the entry date/time to make the series sequential.

This topic has been covered many times by many others (such as here and here), so if you are familiar with C#’s using statement and IDisposable interface, feel free to skip this post. I’m writing this introduction to provide the necessary background information to set up a series of subsequent posts.

Garbage collection, found in languages such as C# and Java (among many others), is a very useful feature: it largely alleviates the need for a programmer to manually handle resource management. The most commonly cited benefit is that garbage collection eliminates the need for the programmer to explicitly call heap memory management functions such as malloc and free; instead, the garbage collector automatically keeps track of whether objects are still in use and frees them when they are no longer needed.[1] However, in addition to handling memory management, garbage collection may also release other scarce resources upon cleanup, such as file locks or network connections.

An important point to note about most (all?) garbage collectors is that they are nondeterministic. This means that, in general, a programmer does not and should not know when the actual garbage collection phase happens.[2] In other words, a program could stop using an object but its underlying memory may not be freed for seconds, minutes, hours, days, or possibly ever. Usually this is a good thing; it can often be a large performance boost.

However, as I mentioned above, garbage collection manages more than just memory. Consider what happens when you call .NET’s File.Open() method, which returns a FileStream object with which you can read and write bytes to the file. Unless explicitly specified otherwise, the FileStream will create an exclusive lock on the underlying file; no other process (or thread) will be able to open the file for reading or writing while the FileStream is open. Usually this isn’t much of a problem, as once the process has ended the file will be closed and most processes are short-lived.

Consider, if you will, the case where the process isn’t short-lived. Perhaps the process opened up the file and wrote to it without explicitly closing it, expecting the garbage collector to eventually notice that the process was done with the file and to close it, releasing the lock. However, as the garbage collector is nondeterministic, we simply don’t know when, if ever, the garbage collector will close the file, and the process will keep a lock on the file for potentially a very long time.[4]

Another way to illustrate the above problem is to consider the following C# code which first writes to a file and then immediately reopens the file to read from it; the code as shown is virtually guaranteed to fail.

string filename = ...;

FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
writeStream.Write(...);

// The following line is virtually guaranteed to throw an IOException:
// it cannot open the file because writeStream has been neither closed
// nor garbage collected yet, so it still holds its exclusive lock.
FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read);
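For anyone who wants to watch this fail, here is a minimal runnable sketch of the same scenario. The class and method names (SharingDemo, ReopenFailsWhileOpen) and the use of a temporary file are my own assumptions for illustration. I would expect it to print True on Windows; other platforms emulate file sharing differently, so treat the result there as an assumption too. The try/finally is only there so the demo cleans up after itself.

```csharp
using System;
using System.IO;

public static class SharingDemo
{
    // Hypothetical helper: returns true if reopening the file fails with an
    // IOException while the first FileStream is still open.
    public static bool ReopenFailsWhileOpen(string filename)
    {
        // File.Open without a FileShare argument requests FileShare.None.
        FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
        try
        {
            writeStream.WriteByte(42);

            // writeStream has not been closed, so this open should be refused.
            FileStream readStream = File.Open(filename, FileMode.Open, FileAccess.Read);
            readStream.Close();
            return false;
        }
        catch (IOException)
        {
            return true;
        }
        finally
        {
            // Close explicitly so the demo itself does not leak the lock.
            writeStream.Close();
        }
    }

    public static void Main()
    {
        string filename = Path.GetTempFileName();
        Console.WriteLine(ReopenFailsWhileOpen(filename));
        File.Delete(filename);
    }
}
```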

Now, many developers will say “That’s easy to solve. Just call the FileStream.Close() method when you are done with the FileStream.” (A few may say to call GC.Collect(), but that’s a bad idea.[3]) OK, fine, let’s add the Close() to the above code:

string filename = ...;

FileStream writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
writeStream.Write(...);
writeStream.Close();

In the above code, what happens if writeStream.Write() throws an exception which is caught and handled at a higher level? That’s right: Close() is never called, and once again you are dependent on the whims of the garbage collector to clean up the file.[5]
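The skipped Close() is easy to demonstrate without touching the disk. The sketch below uses a hypothetical stand-in class, FlakyFile, whose Write() always throws; it is not a real .NET type, just an assumption to make the control flow visible.

```csharp
using System;

// Hypothetical stand-in for a FileStream whose Write() always fails.
public class FlakyFile
{
    public bool IsClosed { get; private set; }

    public void Write()
    {
        throw new InvalidOperationException("simulated write failure");
    }

    public void Close()
    {
        IsClosed = true;
    }
}

public static class SkippedCloseDemo
{
    // Mirrors the snippet above: Close() follows Write() with no try/finally.
    public static void WriteThenClose(FlakyFile file)
    {
        file.Write();
        file.Close(); // Never reached when Write() throws.
    }

    // Returns whether the file was closed once the exception had been handled.
    public static bool ClosedAfterHandledException()
    {
        FlakyFile file = new FlakyFile();
        try
        {
            WriteThenClose(file);
        }
        catch (InvalidOperationException)
        {
            // Handled "at a higher level"; the file is still open here.
        }
        return file.IsClosed;
    }

    public static void Main()
    {
        Console.WriteLine(ClosedAfterHandledException()); // False
    }
}
```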

One common solution to the above problem is to wrap the code using a try {} finally {} block. For example:

string filename = ...;

FileStream writeStream = null;
try
{
    writeStream = File.Open(filename, FileMode.Create, FileAccess.Write);
    writeStream.Write(...);
}
finally
{
    if (writeStream != null)
        writeStream.Close();
}

The C# developers, being pretty bright people, recognized that the above situation is actually fairly common — that in addition to garbage collection’s nondeterministic finalization, programs also often need a form of deterministic finalization to free scarce resources as soon as possible. To this end, they invented two concepts: the IDisposable interface and the using statement.

The IDisposable interface contains exactly one method: Dispose(). It is nothing but a cleanup method which uses a slightly more generic name than Close(). Many diverse objects implement IDisposable, from AsymmetricAlgorithm to Image to SqlConnection. A list of direct implementors of IDisposable in the .NET Class Library is here, but please note that it doesn’t include classes which indirectly implement IDisposable by having a parent (or grandparent, or great-grandparent…) class which is a direct implementor.
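To show how little the interface asks for, here is a minimal sketch of a class that implements it. TempFile is a hypothetical class of my own (not part of the .NET Class Library) that owns a temporary file and deletes it in Dispose(); the demo calls Dispose() by hand from a finally block.

```csharp
using System;
using System.IO;

// Hypothetical example class: owns a temporary file and deletes it on Dispose().
public sealed class TempFile : IDisposable
{
    public string FilePath { get; private set; }

    public TempFile()
    {
        FilePath = Path.GetTempFileName();
    }

    // The single method required by IDisposable.
    public void Dispose()
    {
        if (FilePath != null)
        {
            File.Delete(FilePath);
            FilePath = null; // Makes a second Dispose() call a harmless no-op.
        }
    }
}

public static class TempFileDemo
{
    public static void Main()
    {
        TempFile temp = new TempFile();
        string path = temp.FilePath;
        try
        {
            File.WriteAllText(path, "scratch data");
            Console.WriteLine(File.Exists(path)); // True
        }
        finally
        {
            temp.Dispose();
        }
        Console.WriteLine(File.Exists(path)); // False
    }
}
```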

The using statement is basically nothing but syntactic sugar, as

using (FileStream fs = File.Open(filename, FileMode.Create, FileAccess.Write))
{
    ... do work with fs
}

… is more-or-less short for

FileStream fs = null;
try
{
    fs = File.Open(filename, FileMode.Create, FileAccess.Write);
    ... do work with fs
}
finally
{
    if (fs != null)
    {
        ((IDisposable) fs).Dispose();
    }
}
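The practical consequence of that expansion is that Dispose() runs even when the body throws. Here is a small sketch that checks this, using a hypothetical TrackedResource class which does nothing but record that Dispose() was called.

```csharp
using System;

// Hypothetical resource that records whether Dispose() has run.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class UsingDemo
{
    // Returns whether the resource was disposed even though the body threw.
    public static bool DisposedDespiteException()
    {
        TrackedResource resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time this handler runs, the finally clause generated
            // for the using statement has already called Dispose().
        }
        return resource.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedDespiteException()); // True
    }
}
```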

The cast in the code fragment ((IDisposable) fs).Dispose(); is necessary because it is possible in C# to implement interface methods which are only exposed via that particular interface and not by the implementing class (see here). In other words, the following code won’t compile:

class A : IDisposable
{
    void IDisposable.Dispose() { ... }
}

A a = new A();
a.Dispose();

… whereas if you replace a.Dispose() with ((IDisposable) a).Dispose(); it will. This was likely added to allow a class to implement two separate interfaces which have a method with an identical name and signature.
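Here is a sketch of the two-interface case that explicit implementation makes possible. The interfaces and class (IEnglishGreeter, IFrenchGreeter, Greeter) are hypothetical names I made up for the example.

```csharp
using System;

// Hypothetical interfaces that happen to share a method name and signature.
public interface IEnglishGreeter
{
    string Greet();
}

public interface IFrenchGreeter
{
    string Greet();
}

public class Greeter : IEnglishGreeter, IFrenchGreeter
{
    // Explicit implementations: each is reachable only through its interface.
    string IEnglishGreeter.Greet()
    {
        return "Hello";
    }

    string IFrenchGreeter.Greet()
    {
        return "Bonjour";
    }
}

public static class ExplicitInterfaceDemo
{
    public static void Main()
    {
        Greeter greeter = new Greeter();

        // greeter.Greet();  // Would not compile: Greeter exposes no public Greet().
        Console.WriteLine(((IEnglishGreeter) greeter).Greet()); // Hello
        Console.WriteLine(((IFrenchGreeter) greeter).Greet());  // Bonjour
    }
}
```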

People familiar with C++ may note, as Herb Sutter did, that using and IDisposable are little more than a verbose (and perhaps uglier) form of a C++ destructor. Furthermore, since a C++ destructor is automatically executed (whether upon block exit for stack-based objects or upon delete for heap-based objects), whereas Dispose() must be explicitly invoked, one is much less likely to forget to call a C++ destructor (i.e. essentially never unless one leaks memory). This is important because it is usually bad to forget to call Dispose() for any objects which implement IDisposable once you are done with them. (By the way, Anders Hejlsberg, I wouldn’t mind a construct in C# which provides for automatically calling Dispose() at block-end; it would help eliminate using’s verbosity.)
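For what it’s worth, later versions of the language (C# 8.0 and up) added a using declaration that comes close to that wish: a local declared with using is disposed when its enclosing block ends. The sketch below assumes a C# 8.0 or later compiler; Logged is a hypothetical class that records when it is created and disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that appends to a shared log on creation and disposal.
public sealed class Logged : IDisposable
{
    private readonly string m_name;
    private readonly List<string> m_log;

    public Logged(string name, List<string> log)
    {
        m_name = name;
        m_log = log;
        m_log.Add("open " + m_name);
    }

    public void Dispose()
    {
        m_log.Add("dispose " + m_name);
    }
}

public static class UsingDeclarationDemo
{
    // Returns the recorded events in order, joined with commas.
    public static string Run()
    {
        List<string> log = new List<string>();
        {
            using Logged resource = new Logged("a", log); // Requires C# 8.0+.
            log.Add("work");
        } // resource.Dispose() runs here, at the end of the enclosing block.
        log.Add("after block");
        return string.Join(",", log);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // open a,work,dispose a,after block
    }
}
```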

In my upcoming posts, I will discuss some guidelines for writing classes which implement IDisposable and then describe and demonstrate some useful classes which I have written that implement IDisposable.

[1] If you are interested as to how the .NET garbage collector works, read the article Garbage Collector Basics and Performance Hints on MSDN.
[2] Savvy readers may be aware that many garbage collected languages provide a way for the programmer to force (more like strongly suggest) that a garbage collection happen at this instant, such as .NET’s GC.Collect() method.[3]
[3] Extremely savvy readers may be aware that in general calling the GC.Collect() method is a bad idea.
[4] File locking isn’t the only reason to worry about nondeterministic finalization of FileStream objects. Another concern is the fact that FileStream performs buffering, and the data won’t be flushed unless Flush(), Close(), or Dispose() is called. Therefore, if you were to open up a file for writing with the permissive FileShare.Read flag (which probably isn’t a good idea in most cases), there’s a high probability that readers will see incomplete data until the aforementioned functions are called (either explicitly or through a form of deterministic finalization).
[5] I used the example of file locking because it is close to my heart. At a previous job I had to deal, quite a few times, with a coworker’s daemon process inadvertently holding onto locks in perpetuity. I presume the problem related to not closing the file when exceptions were thrown (otherwise it would have happened more often). Unfortunately the code was apparently poorly designed or not understood and the program was not fixed; instead the solution was to reboot the machine. Yow.

Windows Forms FAQ

C# No Comments »

Today I ran across an invaluable resource for Windows Forms development: George Shepherd’s Windows Forms FAQ. The DataGrid section is especially relevant to my recent work.
