Internet Explorer Event Handler Leaks

If you’ve been developing for the open web for long, you’ll know that Internet Explorer is the bane of any web developer’s existence. The number of CSS bugs is enough to give any web developer a headache, but then add to that JScript’s deviations from the ECMAScript 3 spec and you have yourself a pain in the neck as well. Grab your favorite pain reliever, because there is another nuisance: memory leaks.

Leaks, Circular References, and Host Objects

There are two types of leaks in web development: single-page and cross-page. A single-page leak happens when an object is not collected by the garbage collector (GC) while the page is running, but is cleaned up when the page is unloaded. The cause of single-page leaks is usually an object being referenced unexpectedly. A cross-page leak happens when an object is not collected by the GC when the page is unloaded. We will be focusing on cross-page leaks in this blog post; from this point on, “leak” will refer to cross-page leaks.

The cause of cross-page leaks is usually circular references that trip up the GC. A circular reference is formed when object A references object B which, in turn, references object A. This seems to be pretty simple to avoid, but you can add more objects into the mix and still have a circular reference: A -> B -> C -> D -> A. In garbage collected languages, such as ECMAScript (which includes JavaScript and JScript), circular references are (or should be) handled and cleaned up properly. In fact, most implementations of ECMAScript do a good job of cleaning up circular references between native objects. However, ECMAScript 3 defined something called “host objects” that don’t have to follow the rules that native objects do (see What’s Wrong With Extending the DOM by Juriy Zaytsev for a run-down of host object wackiness). Circular references between host objects and native objects are what trip Internet Explorer up. In fact, all versions of Internet Explorer, except patched versions of 6 and version 7, leak when you leave a page that contains code that forms circular references between host objects and native objects (yes, that includes version 8).

Why do I mention host objects, and why can’t we avoid them altogether? Internet Explorer implements DOM nodes and ActiveXObjects as host objects that wrap COM+ objects (the exception to the rule is version 8, where only ActiveXObjects are host objects; you can run this test to confirm the ActiveXObject leak). This wouldn’t be so bad, except that the interaction between JScript’s garbage collection and the reference counting that COM+ objects use causes problems. When a native object (NO) is referenced, it is put on the “scavenger” list until it is no longer referenced; when a host object (HO) is referenced, its COM+ object’s reference count is increased, and the COM+ object is destroyed only when that count reaches 0. When a NO and an HO are in a circular reference, the HO’s COM+ object’s reference count is decreased only when the NO is garbage collected or stops referencing the HO, and the NO is taken off the “scavenger” list only when nothing references it, COM+ objects included. Each side is waiting for the other to let go first, so neither is ever freed. This means any DOM node or ActiveXObject that references native objects has the potential to leak when the page is unloaded. But all is not lost! The pattern is easy to identify and there is a fail-safe way to keep leaks from happening.
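To make the reference-counting half of that story concrete, here is a minimal sketch of a toy reference-counted object model. It is not how IE or COM+ is implemented; RefCounted and link are hypothetical names invented for illustration, and both objects are reference counted here for simplicity (in IE only the host side is). It only shows why a cycle keeps every count above zero:

```javascript
// Toy model of reference counting (hypothetical; not IE's implementation).
function RefCounted(name){
    this.name = name;
    this.refCount = 0;
    this.destroyed = false;
    this.refs = [];      // objects this one holds references to
}
RefCounted.prototype.addRef = function(){
    this.refCount++;
};
RefCounted.prototype.release = function(){
    if(--this.refCount === 0){
        this.destroyed = true;
        // a destroyed object lets go of everything it referenced
        while(this.refs.length){
            this.refs.pop().release();
        }
    }
};
// make `from` hold a counted reference to `to`
function link(from, to){
    to.addRef();
    from.refs.push(to);
}

// No cycle: dropping the page's reference frees both objects.
var nodeA = new RefCounted("nodeA"), objA = new RefCounted("objA");
nodeA.addRef();          // the page references the node
link(nodeA, objA);       // nodeA -> objA
nodeA.release();         // the page goes away

// Cycle: each object keeps the other's count above zero.
var nodeB = new RefCounted("nodeB"), objB = new RefCounted("objB");
nodeB.addRef();          // the page references the node
link(nodeB, objB);       // nodeB -> objB
link(objB, nodeB);       // objB -> nodeB
nodeB.release();         // the page goes away, but nodeB.refCount is still 1
```

After this runs, nodeA and objA are both marked destroyed, while nodeB and objB are not: nothing outside the cycle references them any more, yet neither count can reach zero.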

Identifying the Pattern

The basic pattern is this:

DOM_Node.objectRef ->
    Object ->
    Object.nodeRef ->
    DOM_Node

This translates to the following code:

var elem = document.getElementById("someElement"),
    obj = {};
elem.someObject = obj;
obj.someElement = elem;

This is pretty easy to spot (and is a reminder of why you should avoid storing non-primitive values in expando properties), but remember that you can add multiple objects into the mix and still have a circular reference:

var elem = document.getElementById("someElement"),
    obj1 = {}, obj2 = {};
elem.someObject = obj1;
obj1.someObject = obj2;
obj2.someElement = elem;
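Property-based cycles like these can be found mechanically. The following is a rough sketch of a checker that walks own, object-valued properties and reports whether the starting object is reachable from itself. referencesItself and contains are hypothetical helpers, and the sketch assumes plain objects standing in for DOM nodes; host objects in old versions of IE may not enumerate their expandos this way, and references held through a function's scope chain are invisible to it:

```javascript
// Linear search; avoids relying on Array.prototype.indexOf.
function contains(list, item){
    for(var i = 0; i < list.length; i++){
        if(list[i] === item){ return true; }
    }
    return false;
}

// Returns true if `start` can be reached again by following
// own object-valued (or function-valued) properties from `start`.
function referencesItself(start){
    var seen = [];
    function visit(obj){
        for(var key in obj){
            if(!Object.prototype.hasOwnProperty.call(obj, key)){ continue; }
            var value = obj[key];
            if(value === null ||
                (typeof value !== "object" && typeof value !== "function")){
                continue;   // primitives cannot be part of a cycle
            }
            if(value === start){ return true; }
            if(!contains(seen, value)){
                seen.push(value);
                if(visit(value)){ return true; }
            }
        }
        return false;
    }
    return visit(start);
}

// Plain objects standing in for the element and its companions.
var elem = {}, obj1 = {}, obj2 = {};
elem.someObject = obj1;
obj1.someObject = obj2;
obj2.someElement = elem;

var cyclic = referencesItself(elem);                        // true
var acyclic = referencesItself({ child: { value: 1 } });    // false
```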

That seems easy enough, but there’s a much more elusive version of this pattern (adapted from the comp.lang.javascript FAQ on closures):

DOM_Node.onevent ->
    inner_function_object ->
    inner_function_object.scope_chain ->
    outer_activation_object ->
    outer_activation_object.nodeRef ->
    DOM_Node

This translates into the following (note the surrounding closure):

(function(){
    var elem = document.createElement("div");
    elem.attachEvent("onclick", function(){
        elem.innerHTML = "foo";
    });
})();

I’m not going to explain closures (for a good run-down, see the aforementioned FAQ), but the gist is that a circular reference is formed through the activation object of the outer function, which the event handler reaches through its scope chain. In pseudo-code, the ES3 internals would look like this:

outer_activation_object.elem = document.createElement("div");
// The scope chain is formed by prepending inner_func's activation
// object to a copy of inner_func's [[Scope]], which is a copy of
// outer_func's scope chain.
inner_func.scope_chain = [
    inner_activation_object,
    outer_activation_object,
    global_object
];
elem.onclick = inner_func;

As you can see, the reference chain looks like this:

elem.onclick ->
    inner_func ->
    inner_func.scope_chain ->
    outer_activation_object ->
    outer_activation_object.elem ->
    elem

Visiting one or two pages that leak won’t be too bad (unless they’re using an inordinate number of event handlers). However, if you visit several pages that have circular references on them, you will notice IE consuming more and more memory. I’ve put together a page that will leak in IE6 and refresh itself every 2 seconds to demonstrate the leak. You can use the system monitor to monitor overall system memory usage or a tool like Process Explorer to monitor the memory usage of individual processes. This example will be our starting point and we’ll look at a couple of ways to keep a page like this from leaking. All of the examples in this article can also be downloaded in a tarball or a zip file.

If you look at the source of this leaky page, you will notice I include helpers.js. As the name suggests, this file has some helper functions:

As we look at the different ways to break the leak pattern, I will be modifying the listen and stopListening functions within the source of the page; I won’t have to modify anything in helpers.js. Now that we have all of that out of the way, let’s start exploring ways to prevent event handler leaks.

Fixing It, the Microsoft Way

Microsoft is well aware of this problem; in fact, they have an MSDN article on leak patterns in IE, an article from Scott Isaacs, and a Knowledge Base article as well. Let’s take a look at the ways they recommend to fix the problem.

Detach on Unload

The first way that Microsoft recommends is to call elem.detachEvent(eventName, handler) in an unload handler. The example in the MSDN article mentioned earlier recommends storing the handler in an expando property of the node; however, I won’t show that example here because storing things in an expando property of a node is a no-no in IE for a few reasons:

The third reason is the one that should make you wary of their solution; as we saw earlier, expandos are an easy way to make circular references and their solution is no exception. Suppose the node with the expando on it is destroyed via .innerHTML: unless we keep a reference to that node around, we’ve lost it and cannot detach the handler and we now have a leak.

A slightly better solution that runs in the same vein as Microsoft’s would be to register an unload handler, which forms a closure around the element, for each event registered:

function listen(obj, evt, handler){
    var e = normalizeEventName(evt);
    _listen(obj, e, handler);
    // unload every attached event
    window.attachEvent("onunload", function(){
        obj.detachEvent(e, handler);
    });
}

This breaks the circular reference manually by removing the reference to handler from the element. You can see for yourself in this example. Although this works, it could potentially be registering hundreds, if not thousands, of unload handlers (depending on its use). A better solution to this would be to use one unload handler with a cache of the arguments used to register each handler:

(function(global){
    var _evtData = {}, _emptyObject = {}, _nextId = 0,
        unloadAttached = false;

    function attachUnload(){
        if(isHostType(document, "attachEvent")){
            // one onunload handler to rule them all
            global.attachEvent("onunload", function(){
                for(var i in _evtData){
                    if(!(i in _emptyObject)){
                        stopListening(i);
                    }
                }
            });
        }
        unloadAttached = true;
    }

    function listen(obj, evt, handler){
        var id = _nextId++,
            e = normalizeEventName(evt);
        _listen(obj, e, handler);
        _evtData[id] = [obj, e, handler];

        if(!unloadAttached){
            attachUnload();
        }
        return id;
    }
    function stopListening(id){
        if(id in _evtData){
            var data = _evtData[id];
            _stopListening.apply(null, data);
            delete _evtData[id];
        }
    }

    global.listen = listen;
    global.stopListening = stopListening;
})(this);

The API changes slightly: listen now returns the ID of the cached arguments, and stopListening accepts that ID. It seems that this is a simple solution to a big problem!
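To see the ID-based API in action outside of IE, here is a minimal, self-contained sketch of the same idea. FakeNode is a hypothetical stand-in for a DOM node, registry is an illustrative name, and stopAll plays the role of the single unload handler; the helpers from the page (_listen, _stopListening, normalizeEventName) are replaced by direct attachEvent/detachEvent calls on the fake:

```javascript
// Hypothetical stand-in for an IE DOM node.
function FakeNode(){
    this.handlers = {};
}
FakeNode.prototype.attachEvent = function(name, fn){
    (this.handlers[name] = this.handlers[name] || []).push(fn);
};
FakeNode.prototype.detachEvent = function(name, fn){
    var list = this.handlers[name] || [];
    for(var i = list.length - 1; i >= 0; i--){
        if(list[i] === fn){ list.splice(i, 1); }
    }
};

var registry = (function(){
    var evtData = {}, nextId = 0;

    function listen(node, evt, handler){
        var id = nextId++;
        node.attachEvent(evt, handler);
        evtData[id] = [node, evt, handler];   // cache the arguments
        return id;
    }
    function stopListening(id){
        if(Object.prototype.hasOwnProperty.call(evtData, id)){
            var data = evtData[id];
            data[0].detachEvent(data[1], data[2]);
            delete evtData[id];
        }
    }
    // what the single unload handler would do
    function stopAll(){
        for(var id in evtData){
            if(Object.prototype.hasOwnProperty.call(evtData, id)){
                stopListening(id);
            }
        }
    }
    return { listen: listen, stopListening: stopListening, stopAll: stopAll };
})();

var node = new FakeNode();
function onClick(){}
function onKey(){}
var clickId = registry.listen(node, "onclick", onClick);
var keyId = registry.listen(node, "onkeyup", onKey);

registry.stopListening(clickId);   // detaches onClick only
registry.stopAll();                // detaches everything still registered
```

Callers only ever hold a number, so they never need to keep the node or the handler alive just to be able to de-register later.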

This simple solution, however, introduces a new problem: adding an unload handler breaks the “back/forward” cache (or bfcache) in modern browsers. Although we added a check for attachEvent in our attachUnload function to weed out Firefox, Safari, and Chrome, this would break bfcache in Opera (since Opera has attachEvent); it also could match IE10+ where bfcache might be implemented. Since there’s no reliable way to detect if a browser has bfcache, it’s best to avoid possibly breaking it.

Define Handlers in Another Scope

Another solution Microsoft suggests is to define all event handlers in a separate scope so the element isn’t in the handlers' scope chain; the code shown in their example uses the global scope, but having all of your event handlers in the global scope is impractical for larger applications. For this solution, we’ll revert to listen and stopListening from the original leaking example and modify the loop:

while (++i < l){
    var hookupOnClick = (function(){
        function handler(){
            this.innerHTML = "foo";
        }
        return function(elem){
            listen(elem, "onclick", handler);
            return handler;
        }
    })();
    (function(){
        var e = document.createElement("div");
        var handler = hookupOnClick(e);
        // use stopListening(e, "onclick", handler) to de-register
        container.appendChild(e);
    })();
}

But do we really want to do that every time we set up a handler and for each event type? It also doesn’t look very elegant (IMHO). This solution is starting down the right path, though: we need to create the function attached to the node in a different scope. Care needs to be taken because it’s easy to just add another reference into the circular reference chain:

function createHandler(func){
    function handler(){
        func.apply(this, arguments);
    }
    return handler;
}
(function(){
    var elem = document.getElementById("someElement");
    listen(elem, "onclick", createHandler(function(){}));
})();

At first glance, this looks like it should work, but a trained eye would see that this is just adding another reference into the chain:

elem.onevent ->
    handler ->
    handler.scope_chain ->
    createHandler_activation_object.func ->
    inner_func ->
    inner_func.scope_chain ->
    outer_activation_object ->
    outer_activation_object.elem ->
    elem

We need a way to break that chain.

Fixing It By Global Reference

As we’ve seen, accessing an element from a function through an activation object through the scope chain will cause a leak if it’s not properly cleaned up or worked around. However, there’s an interesting behavior I haven’t mentioned: a circular reference through the global object through the scope chain will not leak! This behavior was pointed out to me by John David Dalton. By combining this fact with what we’ve learned so far, we can create a generic event handling system that the browser will clean up after:

var _cache = {};
var createWrapper = (function(){
    var id = 0;
    function createLookupHandler(id){
        return function(){
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }
    function createWrapper(func){
        var cid = id++,
            wrapper = createLookupHandler(cid);
        _cache[cid] = func;
        return wrapper;
    }

    return createWrapper;
})();

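function listen(node, evt, handler){
    var wrapper = createWrapper(handler);

    // attach the wrapper, not the handler itself, to the node
    _listen(node, normalizeEventName(evt), wrapper);

    // the caller needs the wrapper in order to de-register later
    return wrapper;
}
function stopListening(node, evt, handler){
    // handler must be the wrapper returned by listen
    _stopListening(node, evt, handler);
}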

Let’s analyze the reference chain created here:

elem.onclick ->
    createLookupHandler_inner ->
    createLookupHandler_inner.scope_chain ->
    global_object ->
    global_object._cache ->
    _cache ->
    _cache[id] ->
    LeakFunc ->
    LeakFunc.scope_chain ->
    outer_activation_object ->
    outer_activation_object.elem ->
    elem

As you can see, we have a circular reference but we’re going through the global object through a scope chain. _cache MUST be a global variable or reached through the global scope. Accessing _cache through an activation object through the scope chain will cause a leak. For instance, the following will leak:

var createWrapper = (function(){
    var _cache = {};
    var id = 0;
    function createLookupHandler(id){
        return function(){
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }
    ...
})();

Why? _cache is now accessible via the outer function’s activation object:

elem.onevent ->
    createLookupHandler_inner ->
    createLookupHandler_inner.scope_chain ->
    createWrapper_inner_activation_object ->
    createWrapper_inner_activation_object._cache ->
    _cache ->
    _cache[id] ->
    LeakFunc ->
    LeakFunc.scope_chain ->
    outer_activation_object ->
    outer_activation_object.elem ->
    elem

The following will also leak:

var _myCache = {};
var createWrapper = (function(_cache){
    var id = 0;
    function createLookupHandler(id){
        return function(){
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }
    ...
})(_myCache);

In this example, _myCache is global, but it is passed in as the argument _cache, and arguments are added to the outer function’s activation object. The handler therefore still reaches the cache through an activation object:

elem.onevent ->
    createLookupHandler_inner ->
    createLookupHandler_inner.scope_chain ->
    createWrapper_inner_activation_object ->
    createWrapper_inner_activation_object._cache ->
    _myCache ->
    _myCache[id] ->
    LeakFunc ->
    LeakFunc.scope_chain ->
    outer_activation_object ->
    outer_activation_object.elem ->
    elem

The object used to store the function references MUST BE REFERENCED THROUGH THE GLOBAL SCOPE. My best guess as to why is that somehow IE6 will increase the reference count of a COM+ object (in this case, a DOMNode) when it is referenced through an activation object through the scope chain. However, it seems that looking up an object through the global scope (which means you’re ultimately going through the scope chain) doesn’t have this same effect. If someone can shed some more light on this, I would be much obliged. I have set up a gist to try to explain what is going on in terms of the ES3 spec and I would appreciate any corrections there as well.

The following examples will not leak because they all ultimately look up functions through the global scope:

var createWrapper = (function(global){
    global._cache = {};
    var id = 0;
    function createLookupHandler(id){
        return function(){
            if(id in _cache && _cache[id]){
                // note that we're referencing _cache
                // via the global scope
                _cache[id].apply(this, arguments);
            }
        };
    }
    ...
})(this);

var my = {
    really: {
        long: {
            namespace: {
                _cache: {}
            }
        }
    }
};
var createWrapper = (function(){
    var id = 0;
    function createLookupHandler(id){
        return function(){
            // note that we're not closing around
            // any part of the namespace object
            var _cache = my.really.long.namespace._cache;
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }
    ...
})();

Event Handlers in Libraries

I’ve taken the liberty of writing some leak tests for the latest versions (at the time of this article) of six of the most popular JavaScript libraries used today (full disclosure: I am a Dojo committer). I have included the results of the tests on IE6 version 6.0.3790.3959 running on Windows 2003 (this version doesn’t have the patch applied to it to fix the circular reference leak). Note that I have not included which method each library uses to prevent leaks, only whether it leaks or not:

Conclusion

From all that we’ve seen here, the following pattern (or one similar to it) should be used to prevent leaks in event handlers in Internet Explorer without having to use an unload handler:

var _cache = {};

var createHandler = (function(){
    var id = 0;

    function createLookupHandler(id){
        return function(){
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }

    function createHandler(func){
        var cid = id++,
            handler = createLookupHandler(cid);

        _cache[cid] = func;

        return handler;
    }

    return createHandler;
})();

elem.attachEvent("onclick", createHandler(function(){}));
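The dispatch behavior of this pattern can be sanity-checked outside of IE. The sketch below repeats the pattern and drives it with fakeElem, a hypothetical stand-in for a DOM node whose fire method is invented for illustration. Note that fire chooses to call handlers with this set to the fake element, which is a simplification (a real attachEvent handler is invoked with this set to window); the point is only that the wrapper forwards whatever this and arguments it receives. The sketch also assumes the first handler receives id 0, as the counter implies:

```javascript
var _cache = {};

var createHandler = (function(){
    var id = 0;

    function createLookupHandler(id){
        return function(){
            // look the real handler up at call time
            if(id in _cache && _cache[id]){
                _cache[id].apply(this, arguments);
            }
        };
    }

    function createHandler(func){
        var cid = id++,
            handler = createLookupHandler(cid);

        _cache[cid] = func;

        return handler;
    }

    return createHandler;
})();

// Hypothetical stand-in for a DOM node.
var fakeElem = {
    handlers: [],
    attachEvent: function(name, fn){
        this.handlers.push(fn);
    },
    fire: function(evt){
        for(var i = 0; i < this.handlers.length; i++){
            this.handlers[i].call(this, evt);
        }
    }
};

var calls = [];
fakeElem.attachEvent("onclick", createHandler(function(evt){
    calls.push([this === fakeElem, evt]);
}));

fakeElem.fire("first");    // the wrapper finds _cache[0] and forwards the call
delete _cache[0];          // drop the real handler from the cache
fakeElem.fire("second");   // the wrapper is still attached but does nothing
```

Because the wrapper checks the cache on every call, removing an entry is enough to silence a handler, and it also releases the real handler so the cache does not keep it alive for the rest of the page’s lifetime.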