Garbage Collection and the Finalizer

One aspect of modern web development that is sometimes taken for granted is memory management. You may no longer need to create a custom boot disk in order to run your application on a modern machine, but it is still important to understand how your memory allocations are cleaned up. Two of the main components of that cleanup are the garbage collector and the finalizer.

Garbage Collector Basics
The garbage collector (GC) does pretty much what its name suggests: it collects the garbage from your application and throws it away. However, just like the garbage trucks that only empty trash cans placed by the side of the road, the garbage collector only cleans up memory that is in a certain state. Before you can understand what the GC does, you need to know what a GC root is. If you think of your application as a tree and all of your application’s objects as the leaves of that tree, then a GC root is an object that is like one of the roots upon which everything else is built. The most common GC roots are static objects, objects global to your application, local variables and parameters that are currently in scope, and any objects in the finalizer queue.

At a high level, the GC builds a list of all objects that can be reached from at least one of the GC roots and then removes everything else. Since the GC will almost always pause the application’s threads while it runs, it only runs when triggered. The GC is triggered when the system is running low on physical memory, when the memory used by allocated objects on the managed heap passes a threshold (which the runtime adjusts as the application runs), or when the programmer triggers it manually by calling GC.Collect(). The GC first walks all of the reference trees starting from your roots and marks every object that it finds as rooted, meaning that it is reachable from one of the roots. It then walks the heap and cleans up (sweeps) any objects that are not rooted. Since the heap becomes fragmented over time as objects are removed, the GC also rearranges (compacts) the surviving objects as needed during collection in order to reduce fragmentation.

Generational Collection
If the garbage collector were required to consider every object and walk the entire heap every time that it ran, performance would suffer. To mitigate this, the garbage collector breaks the heap up into smaller groups of objects based on age, called generations, and collects each generation on its own schedule. The collection intervals are based on the idea that most objects are short lived, so the younger generations are collected much more frequently than the older generations. When an object survives a collection of its current generation, it is promoted to the next higher generation until it reaches the oldest generation, where it stays. .NET uses three generations, gen0, gen1, and gen2, with gen0 being the youngest and gen2 being the oldest. When a generation is collected, all of the younger generations are collected as well, which means that a gen2 collection is essentially a complete garbage collection since both gen0 and gen1 are also collected.
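Promotion between generations can be observed from code. The following is a minimal sketch, assuming a small console program; the class and method names (GenerationDemo, TrackGenerations) are made up for illustration. It records the generation that the runtime reports for a single object before and after two forced collections. The exact numbers depend on the runtime and its GC configuration, so no particular output is promised here.

```csharp
using System;

public static class GenerationDemo
{
    // Hypothetical helper: reports the generation of one surviving object
    // before any forced collection and after each of two forced collections.
    public static int[] TrackGenerations()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor); // freshly allocated
        GC.Collect();                         // forced full collection
        seen[1] = GC.GetGeneration(survivor); // survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor); // survived two collections

        // Keep the object rooted until all measurements have been taken.
        GC.KeepAlive(survivor);
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine("Highest generation: gen" + GC.MaxGeneration);
        Console.WriteLine("Generations seen: " + string.Join(" -> ", TrackGenerations()));

        // Younger generations are normally collected more often than older ones.
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            Console.WriteLine("gen" + gen + " collections so far: " + GC.CollectionCount(gen));
        }
    }
}
```

Forcing collections with GC.Collect() is only done here to make the promotion visible; it is not something this sketch recommends for production code.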

Object Heaps
.NET also utilizes multiple object heaps for allocating objects. The small object heap (SOH) is intended for objects smaller than 85,000 bytes (roughly 85KB), and any objects allocated on the SOH start in gen0. Objects of 85,000 bytes or more are allocated on the large object heap (LOH). There are two very important differences between the heaps that developers and application administrators need to know. The first is that objects on the LOH are only collected during gen2 garbage collections. This means that large objects will most likely be long lived, since gen2 collections happen infrequently. The second is that the LOH is not compacted by default when a gen2 collection happens. As a consequence, if your LOH becomes too fragmented then you may see an OutOfMemoryException thrown even though there might actually be enough total memory available. Starting in .NET 4.5.1, the GCSettings.LargeObjectHeapCompactionMode property lets the developer request a large object heap compaction, but you should profile your system before doing that in production to determine whether the cost is worth the gain.
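To make the two heaps a little more concrete, here is a rough sketch, assuming .NET 4.5.1 or later. The class and method names (LargeObjectHeapDemo, GenerationOfNewArray, CompactLargeObjectHeapOnce) are invented for this example. It asks the runtime which generation it reports for a small and a large array, and then requests a one-time LOH compaction.

```csharp
using System;
using System.Runtime;

public static class LargeObjectHeapDemo
{
    // Hypothetical helper: allocates a byte array of the given size and
    // returns the generation the runtime reports for it right away.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        var buffer = new byte[sizeInBytes];
        return GC.GetGeneration(buffer);
    }

    // Requests that the LOH be compacted during the next blocking gen2
    // collection, then forces that collection. This can be expensive,
    // so measure before using anything like it in production.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }

    public static void Main()
    {
        // A small array is allocated on the SOH and normally starts in gen0.
        Console.WriteLine("1,000 byte array:  gen" + GenerationOfNewArray(1000));

        // A 90,000 byte array is above the LOH threshold; LOH objects are
        // reported as belonging to the oldest generation.
        Console.WriteLine("90,000 byte array: gen" + GenerationOfNewArray(90000));

        CompactLargeObjectHeapOnce();
    }
}
```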

The Finalizer
And now we come to the finalizer. If an object makes use of any unmanaged resources, then you should add a finalizer method to that object to ensure that the resources are released prior to the object getting collected by the GC. Every CLR instance in your application’s process will have a dedicated finalizer thread that is responsible for executing the finalizer methods for any objects in the finalizer queue for that instance of the CLR.

The finalizer does have an effect on garbage collection. When the GC finds an unrooted object that has a finalizer method, instead of sweeping it, the GC moves it to the finalizer queue to be processed by the finalizer thread and then moves on to the next object. Since the finalizer queue is a GC root, this guarantees that any object with a finalizer method will survive at least one more collection than it would have without one. And since the finalizer runs independently of the garbage collector, there is no guarantee of when the finalizer will actually process objects out of the queue. After an object is finalized, it becomes eligible for garbage collection again and can finally be cleaned up.

If you implement the IDisposable pattern and a finalizer method, you can call GC.SuppressFinalize(this) from your Dispose() method. This lets the garbage collector know that all of the cleanup has already been performed and that the finalizer method does not need to be executed. It also allows the object to be garbage collected earlier, since it no longer depends on the finalizer processing it first.

Applications for the Developer
The first takeaway for you to consider involves the LOH. Objects referenced by another object are stored separately on the heap, so an object that contains six different 50KB byte arrays totaling 300KB will not end up on the LOH, since each array is allocated independently. However, if you read a 100KB text file into a single byte array, then that array will be allocated on the LOH and will only be garbage collected during gen2 collections. If you are going to be performing a lot of LOH allocations, then you should evaluate using an object that stores the data in smaller chunks so that each chunk can be allocated on the SOH instead of the LOH. Otherwise you might end up with a lot of long gen2 collections and a fragmented LOH. There are some tricks that a developer can use to keep data on the SOH. For instance, consider the following example:

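// requires: using System.Collections.Generic;
byte[] data = new byte[90000]; // one contiguous array, allocated on the LOH
List<byte[]> data2 = new List<byte[]>(); // the list and each small chunk added to it are allocated on the SOH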

A custom Stream implementation could be created that would stream the bytes stored in data2 as if they were a single byte array. The trade-off with this implementation is that data2 would have a larger memory footprint due to the List, but it would be stored on the SOH.
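The chunking idea can be sketched without writing a full Stream subclass. The ChunkedBuffer class below is hypothetical, as are its member names and the 64KB chunk size, which was picked only because it sits comfortably below the 85,000 byte LOH threshold. It reads any Stream into a list of small arrays and exposes the bytes through an indexer as if they were one array.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical type for illustration; not part of the .NET Framework.
public sealed class ChunkedBuffer
{
    // Assumption: 64KB chunks, which stay below the 85,000 byte LOH threshold.
    public const int ChunkSize = 64 * 1024;

    private readonly List<byte[]> chunks = new List<byte[]>();

    public long Length { get; private set; }

    public IReadOnlyList<byte[]> Chunks { get { return chunks; } }

    // Reads the whole stream into chunks of at most ChunkSize bytes.
    // Every chunk except possibly the last one is completely filled.
    public static ChunkedBuffer ReadFrom(Stream source)
    {
        var result = new ChunkedBuffer();
        while (true)
        {
            var chunk = new byte[ChunkSize];
            int filled = 0;
            int read;
            while (filled < ChunkSize &&
                   (read = source.Read(chunk, filled, ChunkSize - filled)) > 0)
            {
                filled += read;
            }

            if (filled == 0) break;                     // end of stream
            if (filled < ChunkSize) Array.Resize(ref chunk, filled);

            result.chunks.Add(chunk);
            result.Length += filled;
            if (filled < ChunkSize) break;              // last, partial chunk
        }
        return result;
    }

    // Lets callers address the data as if it were a single byte array.
    public byte this[long index]
    {
        get
        {
            if (index < 0 || index >= Length)
                throw new ArgumentOutOfRangeException("index");
            return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)];
        }
    }
}

public static class ChunkedBufferDemo
{
    public static void Main()
    {
        // The source array here is itself large; it only stands in for a file
        // or network stream so that the sketch is self-contained.
        using (var source = new MemoryStream(new byte[200000]))
        {
            ChunkedBuffer buffer = ChunkedBuffer.ReadFrom(source);
            Console.WriteLine(buffer.Length + " bytes in " + buffer.Chunks.Count + " chunks");
        }
    }
}
```

The cost is the extra bookkeeping in the list and the index arithmetic on every access, which is the same trade-off described above.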

The next takeaway involves the finalizer. Objects that need to be finalized are guaranteed to survive at least one garbage collection, since they are moved to the finalizer queue after the GC decides that they are not rooted. Depending on when the finalizer actually processes them, they might survive multiple collections. Because of this, you should always consider whether your finalizer method is actually necessary. If it is, then also consider implementing the IDisposable pattern and using the following code to perform the cleanup earlier, removing the need for the object to be placed into the finalizer queue.

public void Dispose()
{
    Dispose(true);
    // Cleanup is already done, so this keeps the object from moving to the finalizer queue
    GC.SuppressFinalize(this);
}

private void Dispose(bool disposing)
{
    if (disposing)
    {
        // Dispose all managed resources here
    }
    // Release all unmanaged resources here
}

~MyObject()
{
    // Only reached if Dispose() was never called; clean up unmanaged resources only
    Dispose(false);
}
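For callers, the usual way to reach Dispose() is a using block. The sketch below assumes a made-up NativeBuffer class that holds a block of unmanaged memory obtained from Marshal.AllocHGlobal; the class name and its members are illustrative only. It shows the cleanup running deterministically at the end of the block, with the finalizer kept purely as a safety net.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type for illustration: owns a block of unmanaged memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int sizeInBytes)
    {
        handle = Marshal.AllocHGlobal(sizeInBytes);
    }

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        // The unmanaged memory is already freed, so finalization is not needed.
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer()
    {
        // Safety net that only runs if Dispose() was never called.
        Release();
    }

    // Frees the unmanaged block; safe to call more than once.
    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
    }
}

public static class NativeBufferDemo
{
    public static void Main()
    {
        NativeBuffer buffer;
        using (buffer = new NativeBuffer(1024))
        {
            Console.WriteLine("Inside using, disposed: " + buffer.IsDisposed);
        }
        // Dispose() ran when the block exited, without waiting for the GC.
        Console.WriteLine("After using, disposed: " + buffer.IsDisposed);
    }
}
```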

Another very important thing to know about the finalizer thread is how it deals with exceptions. According to the MSDN documentation, http://msdn.microsoft.com/en-us/library/system.object.finalize(v=vs.110).aspx, any unhandled exception from a finalizer method will by default terminate the process. Somewhat related to this is that the order in which objects are finalized cannot be guaranteed, so a finalizer method should never assume that any other managed objects, even child objects of the current class, are still usable and have not already been finalized.

The final takeaway is one of the most important. Each instance of the CLR in your application only gets a single finalizer thread. The average web application that runs in a single application pool in IIS will only have one finalizer thread, regardless of how many CPU cores the server has. It is very important that the finalizer thread does not get deadlocked, since objects waiting in the finalizer queue can then never be released, which will eventually cause the server to run out of memory. Lock statements in finalizer methods should be used very carefully to prevent a deadlock.

Summary
In this article we have covered the basics of garbage collection and object finalization. These are still important items for developers to be aware of, and hopefully this article has helped to shed some light on things to watch out for in any application’s code base.
