Revisiting Excel Interop

C#, Excel Interop No Comments »

I ran into problem today relating to Excel interop. A coworker made a change to a C# application I wrote and was trying to build it. The program relied on a project which had a reference to the Microsoft Excel 9.0 Object Library which ships with Office 2000. However, the coworker had Office 2003 installed which includes the Excel 11.0 Object Library and not the Excel 9.0 Object Library. Because of this, he could not build the application.

I first considered copying the Excel 2000 executable to his machine to try to make the Excel 9.0 Object Library available, but I decided this was nasty and possibly illegal. I next considered using the Microsoft Office XP Primary Interop Assemblies but ruled them out because I must support Office 2000. I then spent a good amount of time trying to use the Excel 5.0 Object Library (distributed by both Office 2000 and Office 2003) but I ran into some fairly challenging issues relating to the lack of documentation and difficulties with the autogenerated wrapper classes.

My eventual solution was to use the lowest common denominator: my own custom Excel wrapper classes which map directly to objects in the Excel object model but internally use OLE automation (a.k.a. run-time binding a.k.a. IDispatch). The wrapper classes inherit from AutoReleaseComObject to get nice IDisposable-based COM Release() semantics (which is important for Excel!). A sample wrapper class looks something like:

// Maps directly to the Workbooks object in the Excel
// object model.
public sealed class ExcelWorkbooks : AutoReleaseComObject
{
    // Needed for .InvokeMember()
    private Type m_excelWorkbooksType;

    public ExcelWorkbooks(object comObject) : base(comObject)
    {
        m_excelWorkbooksType = comObject.GetType();
    }

    // Maps directly to the Workbooks.Add() method in the
    // Excel object model.
    public ExcelWorkbook Add()
    {
        object o = m_excelWorkbooksType.InvokeMember
            (
            "Add",
            BindingFlags.InvokeMethod,
            null,
            this.ComObject,
            new object[] {}
            );
        return new ExcelWorkbook(o);
    }
}

These IDispatch-based wrapper classes give me maximum flexibility with minimum dependencies (and zero compile-time dependencies) but they are extraordinarily tedious to write. In the future I may experiment with using custom attributes and run-time code generation through System.CodeDom to simplify writing these wrappers. Alternatively, I may look into generating the code at compile-time but that would probably mean moving to a more sophisticated build tool such as NAnt.

Minimize your dependencies is an important rule to keep in mind when programming.

Useful IDisposable Class 3: AutoReleaseComObject (Post 5 of 5)

C#, Excel Interop No Comments »

This is the fifth and final post in a series of posts. Here are the previous posts in this series: Deterministic Finalization in Garbage-Collected Languages, Rules For Implementing IDisposable, Useful IDisposable Class 1: TimedLock, and Useful IDisposable Class 2: AutoDeleteFile.

This is the final example in my series on deterministic finalization in garbage-collected languges and the true motive behind the series: AutoReleaseComObject. The idea behind AutoReleaseComObject is simple: it is nothing but a wrapper around a COM object which calls Marshal.ReleaseComObject() upon Dispose() until the COM object’s reference count is 0 and the object is freed. Here’s the implementation:

public class AutoReleaseComObject : IDisposable
{
    private object m_comObject;
    private bool m_armed = true;
    private bool m_disposed = false;

    public AutoReleaseComObject(object comObject)
    {
        Debug.Assert(comObject != null);

        m_comObject = comObject;
    }

#if DEBUG
    ~AutoReleaseComObject()
    {
        // We should have been disposed using Dispose().
        Debug.Assert(false);
    }
#endif

    public object ComObject
    {
        get
        {
            Debug.Assert(!m_disposed);
            return m_comObject;
        }
    }

    public void Disarm()
    {
        Debug.Assert(!m_disposed);
        m_armed = false;
    }

    #region IDisposable Members

    public void Dispose()
    {
        Dispose(true);
#if DEBUG
        GC.SuppressFinalize(this);
#endif
    }

    #endregion

    protected virtual void Dispose(bool disposing)
    {
        if (!m_disposed)
        {
            if (m_armed)
            {
                int refcnt;
                do
                {
                    refcnt = Marshal.ReleaseComObject(m_comObject);
                } while (refcnt > 0);

                m_comObject = null;
            }

            m_disposed = true;
        }
    }
}

Why is this class so useful? Well, it has to do with a topic I’ve discussed before: Excel interop. As I insinuate in that post, a problem that users of the Excel object model often encounter is either runaway Excel processes which never quit, or multiple Excel processes when one would suffice. Furthermore, the Excel processes tend to stay around much longer than they have to. For C++, my solution was to either be sure to explicitly call COleDispatchDriver::ReleaseDispatch() or to use the COleDispatchDriver::m_bAutoRelease flag on all Excel objects (this is more than just the application: it is any Excel object such as Range or Workbook).

In C#, you can run into the same problem — basically the Excel process will stay around as long as any Excel COM interop object has a non-zero reference count. While I suspect the .NET Excel interop objects include code in their finalizers to decrement their COM reference counts to zero, which should mean that in the worst case the Excel process will end at the same time your .NET process ends, I think we can and should do better. After all, consider the implications if your .NET process is very long-lived, or if you repeatedly, serially interact with Excel (the system will likely unnecessarily launch many Excel processes).

The solution to these problems is to call Marshal.ReleaseComObject() on all Excel objects as soon as possible. Once all objects’ COM reference count reach zero, the Excel process will terminate. Therefore, I decided to wrap this functionality into the AutoReleaseComObject class.

Unfortunately, this makes using the Excel object model quite a bit more tedious. The casting becomes annoying, but this is easily solvable by writing a series of Excel object wrappers which inherit from AutoReleaseComObject and provide access to the wrapped object already casted to the appropriate type (I can’t wait for Whidbey’s generics). I called these objects ExcelApplicationWrapper, ExcelWorkbookWrapper, etc. and their implementation and use should be fairly obvious. However, consider what happens if you execute the following code:

using (ExcelApplicationWrapper excelAppWrapper =
           new ExcelApplicationWrapper(new Excel.Application()))
using (ExcelWorkbookWrapper workbookWrapper =
           new ExcelWorkbookWrapper(excelAppWrapper.Application.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet)))
{
    // ... Do work with workbook
}

Looks fine, doesn’t it? Wrong. excelAppWrapper.Application.Workbooks is itself an Excel object model object which also must be wrapped in AutoReleaseComObject in order for our desired behavior to happen. You need to be very careful to catch and wrap all Excel objects or you are back to square one in having near-immortal Excel processes. The above code should properly be written:

using (ExcelApplicationWrapper excelAppWrapper =
           new ExcelApplicationWrapper(new Excel.Application()))
using (ExcelWorkbooksWrapper workbooksWrapper =
           new ExcelWorkbooksWrapper(excelAppWrapper.Application.Workbooks))
using (ExcelWorkbookWrapper workbookWrapper =
           new ExcelWorkbookWrapper(workbooksWrapper.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet)))
{
    // ... Do work with workbook
}

Happy interop!

Using the Excel Object Model and Performance

C++, Excel Interop No Comments »

Recently I’ve had to write a bit of code which communicates with Microsoft Excel using its object model. Here are a few things I have learned from this experience.

  • Interaction with the Excel object model seems to use some kind of inter-process communication with an Excel process that is started behind the scenes. If things are not shut down properly, this Excel process will continue to run indefinitely in the background. Be sure to periodically check the Task Manager for any runaway Excel processes — these typically indicate a bug in your code or incomplete shutdown (perhaps because you chose “Stop Debugging” from the debugger).
  • If you are using the Excel object model using MFC, follow the example in Microsoft KB Article 178781: HOWTO: Automate Excel Using MFC and Worksheet Functions. Be sure to call COleDispatchDriver::ReleaseDispatch() or use the COleDispatchDriver::m_bAutoRelease member on all relevent objects or the Excel process may never stop.
  • The Excel object model documentation (which Microsoft KB Article 222101: How To Find and Use Office Object Model Documentation helps you find) is quite horrible, at least as of Office 2000. It is also written exclusively with the VB developer in mind.
  • To give control of the running, hidden Excel process with which you are interacting to the user, use the following code:

    Excel::_Application app;
    // Create and work with app...
    app.SetVisible(true);
    app.SetUserControl(true);
    
  • Even if Excel is not visible to the user, Application::Quit() may pop up a hidden dialog asking if the user wants to save the changes that were made through the dialog box. Since the dialog is not visible, Excel will never shut down. To prevent this dialog, either set Application.DisplayAlerts to false or set Workbook.Saved to true for all modified workbooks. The former is preferred.

  • Each call using the Excel object model is very, very slow, probably as a result of the use of IPC. This means that the typical way one would think of interacting with cell values in Excel — iterating cell-by-cell within a set of nested for loops — is often too slow to be practical. Instead, I work in selections of nRows rows by nCols columns and use a two-dimensional SAFEARRAY. For example:

    COleSafeArray rawData;
    DWORD rawDataDimensions[2];
    rawDataDimensions[0] = nRows;
    rawDataDimensions[1] = nCols;
    rawData.Create(VT_VARIANT, 2, rawDataDimensions);
    
    // Populate the values of rawData…
    
    // Select a range of size nRows x nCols
    Excel::Range range = wksheet.GetRange(varUpperLeftCell,
                                          varLowerRightCell);
    
    // Set the cells’ values in one call to .SetValue()
    // instead of setting individual cell values
    range.SetValue(rawData);
    
WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in