This post is about a wonderfully embarrassing experience I had recently from which you can (hopefully) learn something. Or - at the very least - this can serve as a reminder of what you already know.
A few weeks ago, a couple of guys from the Infragistics IS department and I decided to get to the bottom of some of the performance issues we had been living with for a while on our main website. So, we locked ourselves in a room for a few days and scoured through the code for the entire website. The biggest thing we found ended up being a few stupid lines of code I had written...
Introducing the Bug
It all started with "hey, we've gotta add RSS feeds to our site." The idea was to be able to add a whole bunch of feeds to a page, and then have a summary control (shown on the left) to organize them. Since our site makes heavy use of master pages and user controls, I wanted the ability for any of the items in the page lifecycle to be able to add an RSS feed to the page and have it show up in this summary control. And, oh yeah - I wanted to cache them so future requests to the page didn't have to load them again. So, I created a static RssFeedManager class that pages and controls could register their feeds with and, on PreRender, my Summary control could get the list of feeds that had been registered and display them.
Finding the Bug
The weirdest/most difficult thing about this bug was the fact that it seemed to rarely ever occur. That is, on most requests throughout the site, everything went fine and nothing seemed out of the ordinary. And, on top of that, even though the culprit ended up involving a (supposedly) managed object, it never really showed up in our memory profiling. We made the final breakthrough when one of the guys from IS, Martin, noticed that we could reproduce the problem every time we hit the login page... Since this is one of the few pages on our site that is not cached via the Output Cache we immediately started hitting some of the other pages that are not cached and got the same result: the server memory grew and grew until it reached the maximum allocated memory and the application pool recycled. Cool - we were able to reliably reproduce it!
For the next step - actually figuring out the specific bug - we kicked up our trusty profiler. It didn't take long for Martin to locate the problem: the controls created and added to support our RSS feeds. Eventually, I ended up narrowing the problem down to a function in the RssFeedManager, listed below. Can you spot the problem?
private static Dictionary<Page, List<RssChannel>> RegisteredFeeds = new Dictionary<Page, List<RssChannel>>();
public static void RegisterPageFeed(Page page, RssChannel channel)
{
// If we don't have a page or a channel, bail out now
if (page == null channel == null)
return;
// Get the page feeds
List<RssChannel> pageFeeds = null;
if (RegisteredFeeds.ContainsKey(page))
{
// Get the existing feeds
pageFeeds = RegisteredFeeds[page];
// Add the feed to the existing feeds if it isn't there already
if (!pageFeeds.Contains(channel))
pageFeeds.Add(channel);
}
else
pageFeeds = new List<RssChannel>(new RssChannel[] { channel });
// Update the registered feeds list
RegisteredFeeds[page] = pageFeeds;
}
Yeah - that's right... I'm caching the list of registered feeds using the
Page object. At first glance, that seems like it might work - associating the
RssChannel with the current
Page that it belongs on. It might work, that is, until you realize that the
Page object is unique. That is to say, the
Page object created for the first request to
/default.aspx will
not be the same (or
Equals()) the one created for processing the second request to the same page,
/default.aspx. The result of the preceding code is that the current
Page object (referencing and referenced by literally thousands of objects) is placed into the static dictionary, never to be properly picked up by the garbage collector. The result of
that is pure, asinine memory leakage:
every request adds another
Page object to the dictionary until the Application Pool finally reaches its limits and explodes... er, recycles.
The Fix
As usual, the fix for this one was pretty simple. I updated the above function to store the registered feeds the HttpContext.Current.Items collection. That way, I could track things on a per-request basis and everything was nicely cleaned up for me after the HttpContext was out of scope (after the request had been fully processed). The final, fixed method is shown below:
1: public static void RegisterRssFeed(HttpContext context, RssChannel channel)
2: {
3: // If we don't have a page or a channel, bail out now
4: if (context == null channel == null)
5: return;
6:
7: // Get the existing feeds
8: List<RssChannel> currentFeeds =
9: (context.Items[RSS_FEEDS_KEY] as List<RssChannel>) ??
10: new List<RssChannel>();
11:
12: // Add the feed to the existing feeds if it isn't there already
13: if (!currentFeeds.Contains(channel))
14: currentFeeds.Add(channel);
15:
16: // Update the registered feeds list
17: context.Items[RSS_FEEDS_KEY] = currentFeeds;
18: }
Since it was introduced in ASP.NET 2.0, I've been using the HttpContext.Items collection more and more. It's readily available to practically anything (modules, handlers, user controls, pages, etc.) and it comes in real handy for situations like this.
Exactly How I Screwed Up
Here's a checklist of things that I knew to watch out for, but didn't:
- The most obvious - and main point of this post- is to be incredibly careful what you use as your Key in any collection. Also, be especially wary of the context in which you're doing it, which brings us to...
- Think hard about exactly what you're putting in static variables, since it keeps them away from the Trash Man (garbage collector), which could be a very bad thing.
- Reinforcing and expanding on #2: when you've got something like a static Dictionary<[foo],[bar]>, your goal may be to keep the Values alive, but keep in mind that you're also keeping the Key objects alive as well, which may not be exactly what you're shooting for. For example, instead of using an entire User object as a key, consider using the User's Username value instead (assuming it's unique, of course).
- At the risk of being redundant, I'll just go ahead and get real specific here, since it's the catalyst for this post: in the average page request, the Page object is, like, the worst thing to cache and/or keep in a static collection... Think about all of the objects that are latched on to it and are subsequently kept alive due to their association with this monster of an object. The thought of everything I was keeping alive with EVERY unique page request still haunts my dreams today...
- Overall, take a minute and think about why you're choosing to cache this particular item in the first place, and if you're actually duplicating data that's already been cached somewhere else, or even increasing performance at all.
As it turned out in my case, not only was I keeping large objects from being destroyed, I was also caching data that was already cached elsewhere (the individual RSS feed links themselves), and I wouldn't have been helping performance even it if was working like I'd planned! Unless the entire page was being cached in the Output Cache (in which case this entire discussion is moot), the page still had to go through its lifecycle, and the RSS feeds were still being registered with the RssFeedManager - the only thing that would've changed is that the RssFeedManager would check to see if they'd already been added and decide not to add them. Where's the performance gain in that?
ASIDE: Sorry if I lost you in that last paragraph... My point is that I was attempting to cache something that was already cached. Had I thought through the entire process, I (hopefully!) would've noticed that.
Remember how I said that the Page object was a "(supposedly) managed object"? Well, for the most part, it didn't show up in the memory profiler reports; at least not all of it. That is, if we started up the profiler and watched the server memory and the profiler report while we hit the page repeatedly, the profiler report showed approx. 400k difference for each page hit while the server memory would shoot up between 5 and 20 megs! It was very interesting and it certainly didn't help us get to the bottom of the problem any faster. I never did get to the bottom of this, but it's definitely good to know going forward.
So, there you have it - my great blunder. Hopefully my mistake can be to your benefit.