Today we were investigating several cases of Omea forms not getting released properly after closing. Usually this is barely noticeable, but in the case of the Manage Newsgroups window (which loads the complete list of newsgroups on the selected server into memory), the memory loss was measured in megabytes.
After some investigation with Reflector and .NET Memory Profiler, we found that the leaks were caused by two issues in the Windows Forms implementation. One is clearly a bug, and the other may be a bug or some kind of weird compatibility fix.
The first issue is simple. When you attach an ImageList to a ListView, the ListView hooks two events of the ImageList: RecreateHandle and Disposed. However, ListView.Dispose() unhooks only the Disposed event handler. In Omea, there is a global image list of all resource icons that lives for as long as Omea is running, so the ListView on the form stayed alive forever because of the RecreateHandle handler still attached to the global image list. The form, in turn, stayed alive because of an event handler it had attached to the ListView.
Fortunately, there is an easy workaround for this: we can just clear all ImageLists on a ListView when the form is disposed.
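To make the mechanism concrete, here is a small Python model of the leak and the workaround. It is only a sketch: the classes below are simplified stand-ins invented for illustration, not the Windows Forms types, and the assumption is that reassigning the image list unhooks both handlers, which is what makes the workaround effective.

```python
import gc
import weakref


class Event:
    """Minimal multicast event that holds strong references to handlers."""

    def __init__(self):
        self._handlers = []

    def add(self, handler):
        self._handlers.append(handler)

    def remove(self, handler):
        self._handlers.remove(handler)


class ImageList:
    """Stand-in for the long-lived global image list."""

    def __init__(self):
        self.recreate_handle = Event()
        self.disposed = Event()


class ListView:
    """Stand-in for a control that hooks both events of its image list."""

    def __init__(self):
        self._image_list = None

    @property
    def image_list(self):
        return self._image_list

    @image_list.setter
    def image_list(self, value):
        # Assumption: changing the image list unhooks both handlers.
        if self._image_list is not None:
            self._image_list.recreate_handle.remove(self._on_recreate_handle)
            self._image_list.disposed.remove(self._on_disposed)
        self._image_list = value
        if value is not None:
            value.recreate_handle.add(self._on_recreate_handle)
            value.disposed.add(self._on_disposed)

    def dispose(self):
        # Models the bug: only the Disposed handler is unhooked here.
        if self._image_list is not None:
            self._image_list.disposed.remove(self._on_disposed)

    def _on_recreate_handle(self):
        pass

    def _on_disposed(self):
        pass


def leaks_after_close(apply_workaround):
    """Return True if the view is still reachable after being disposed."""
    global_images = ImageList()      # outlives the view, like the global list
    view = ListView()
    view.image_list = global_images
    probe = weakref.ref(view)
    if apply_workaround:
        view.image_list = None       # clear the image list before disposing
    view.dispose()
    del view
    gc.collect()
    return probe() is not None
```

Calling leaks_after_close(False) reports that the view is still reachable through the leftover handler, while leaks_after_close(True) reports that it has been collected.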
The second issue is more complex: because of an implementation weirdness in Form.RemoveOwnedForm(), the last shown modal dialog remains in the owned forms list. (It is not visible through the Form.OwnedForms property. The list is stored as two items in the property store: the count and the array of forms. The size of the array returned from OwnedForms is determined by the count. When the last form is removed, the count is decremented to zero, but the corresponding item in the array is not reset to null.)
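The count-plus-array behavior described above can be modeled in a few lines of Python. This is a sketch of the pattern, not the actual Form code: the class and member names are made up, and the way later entries are shifted down on removal is my assumption.

```python
import gc
import weakref


class Dialog:
    """Stand-in for a modal dialog."""


class OwnerForm:
    """Stand-in for a form that tracks owned forms as a count plus an array."""

    def __init__(self):
        self._owned = [None] * 4     # backing array
        self._owned_count = 0        # number of valid entries

    def add_owned_form(self, form):
        if self._owned_count == len(self._owned):
            self._owned.extend([None] * len(self._owned))
        self._owned[self._owned_count] = form
        self._owned_count += 1

    def remove_owned_form(self, form):
        for i in range(self._owned_count):
            if self._owned[i] is form:
                # Shift the remaining entries down over the removed one.
                for j in range(i, self._owned_count - 1):
                    self._owned[j] = self._owned[j + 1]
                self._owned_count -= 1
                # The slot at index _owned_count is NOT reset to None,
                # so it keeps a stale reference.
                return

    @property
    def owned_forms(self):
        # Only the first _owned_count entries are exposed.
        return self._owned[:self._owned_count]


def stale_reference_demo():
    """Return (visible owned forms, whether the dialog is still alive)."""
    owner = OwnerForm()
    dialog = Dialog()
    probe = weakref.ref(dialog)
    owner.add_owned_form(dialog)
    owner.remove_owned_form(dialog)
    del dialog
    gc.collect()
    return owner.owned_forms, probe() is not None
```

In this model stale_reference_demo() returns an empty list of visible owned forms, yet the dialog is still alive because the backing array still points at it.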
For this one, we weren’t able to find a good workaround. Sure, we could use some heavy reflection hackery to dig into the Form internals and clear the owned forms array manually, but this is very likely to break on other versions of the framework besides 1.1. Because of this, and since only the last shown modal dialog remains forever live, we decided that we could live with the second problem for now.
The fix for the first problem will be integrated in Omea 1.0.4.
It looks like a recent post in Michael Kaplan’s blog, where he demonstrates usage of surrogate pairs, exposes a bug in the implementation of System.IO.BinaryReader.ReadString() in .NET 1.1. The bug appears in Omea as an ArgumentException “Conversion buffer overflow” when trying to read the body of the post from the resource store (which stores strings in UTF-8 encoding).
I have studied the Rotor sources of binaryreader.cs and utf8encoding.cs, and while they probably don’t exactly match the .NET 1.1 implementation, I think they give me a good idea of what’s actually going on.
As far as I understand, the problem is the following. BinaryReader.ReadString() reads the string in 128-byte chunks, using the Decoder class to store the intermediate state of the encoding conversion. It also creates a 128-char buffer where it puts the results of converting each chunk. Thus, it assumes that Decoder.GetChars() will not return more characters than it got bytes.
However, if I understand correctly, the assumption will be violated if the last byte of the byte sequence encoding a surrogate pair immediately follows the boundary of the 128-byte chunk, and all other bytes in the chunk represent regular ASCII characters. In this case, the UTF-8 decoder will return the complete surrogate pair as the first two characters of the new chunk, and it will be followed by 127 regular ASCII characters. The result: trying to store 129 characters in a 128-character buffer.
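The arithmetic can be checked with a stateful UTF-8 decoder in any language. The sketch below uses Python's incremental decoder as a stand-in for the .NET Decoder class (an assumption: both buffer an incomplete sequence until its last byte arrives) and counts UTF-16 code units, since that is what a .NET char is. The helper names and the choice of U+10400 as the test character are mine.

```python
import codecs

CHUNK = 128  # chunk size in bytes, as described above


def utf16_units(text):
    """Number of UTF-16 code units (.NET chars) needed for the text."""
    return len(text.encode("utf-16-le")) // 2


def chars_per_chunk(data, chunk_size=CHUNK):
    """Decode data chunk by chunk and report the chars produced by each."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    counts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        final = start + chunk_size >= len(data)
        counts.append(utf16_units(decoder.decode(chunk, final)))
    return counts


# 125 ASCII bytes, then a 4-byte sequence whose last byte is the first
# byte of the second chunk, then 127 more ASCII bytes.
text = "a" * 125 + "\U00010400" + "b" * 127
data = text.encode("utf-8")
counts = chars_per_chunk(data)
```

Here the first chunk yields 125 chars (the three leading bytes of the sequence are held back), and the second chunk yields 129 chars from 128 bytes, one more than a 128-char buffer can hold.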
I guess I am really lucky to have hit this problem… fortunately, it is fairly easy to replace BinaryReader.ReadString() with custom code that will not have this problem, and I’ll do just that.
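One possible shape for such a replacement, sketched in Python rather than C#: read the 7-bit-encoded byte length that BinaryWriter puts in front of the string, read the whole payload, and decode it in a single call, so no chunk boundary can split a character. The function names are hypothetical, and this is not necessarily how the fix ends up looking in Omea.

```python
import io


def read_7bit_encoded_int(stream):
    """Read a length prefix stored 7 bits per byte, low bits first."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside the length prefix")
        result |= (raw[0] & 0x7F) << shift
        if (raw[0] & 0x80) == 0:     # high bit clear: last prefix byte
            return result
        shift += 7
        if shift > 35:
            raise ValueError("length prefix is too long")


def read_string(stream):
    """Read a length-prefixed UTF-8 string and decode it in one piece."""
    length = read_7bit_encoded_int(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside the string payload")
    return payload.decode("utf-8")


def write_string(stream, text):
    """Write the same format; used here only to exercise read_string."""
    payload = text.encode("utf-8")
    remaining = len(payload)
    while remaining >= 0x80:
        stream.write(bytes([(remaining & 0x7F) | 0x80]))
        remaining >>= 7
    stream.write(bytes([remaining]))
    stream.write(payload)


# Round-trip the string that trips up chunked decoding.
tricky = "a" * 125 + "\U00010400" + "b" * 127
buffer = io.BytesIO()
write_string(buffer, tricky)
buffer.seek(0)
round_tripped = read_string(buffer)
```

The trade-off is that the whole payload is held in memory at once, which seems acceptable for strings of the size stored in a resource store.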
Yesterday, when doing a scan of blogs.msdn.com, I noticed a reference to the blog of Slava Oks, who seems to be working on low-level features of SQL Server (like memory management). In one of his posts he wrote about a little-known utility from Microsoft Product Support, called LeakDiag.
The tool works by intercepting the memory allocation functions in a process, recording the call stack of every allocation and logging the allocations grouped by call stack. It can work on many different levels – VirtualAlloc(), heap allocation functions, C runtime memory allocator and others.
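The idea of grouping allocations by call stack is easy to sketch in Python. The code below works on in-memory (stack, size) records that I made up for illustration; it does not read LeakDiag's actual log format, and the frame names are hypothetical.

```python
from collections import defaultdict


def group_by_stack(allocations):
    """Aggregate (stack, size) records into per-stack totals.

    Returns (stack, allocation count, total bytes) tuples,
    largest total first.
    """
    totals = defaultdict(lambda: [0, 0])   # stack -> [count, bytes]
    for stack, size in allocations:
        entry = totals[tuple(stack)]
        entry[0] += 1
        entry[1] += size
    rows = [(stack, count, total) for stack, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample data: each record is a call stack plus a size.
sample = [
    (["main", "load_index", "VirtualAlloc"], 65536),
    (["main", "load_icons", "VirtualAlloc"], 4096),
    (["main", "load_index", "VirtualAlloc"], 131072),
]
report = group_by_stack(sample)
```

Sorting by total bytes puts the call stacks responsible for the most memory at the top of the report.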
The most interesting results for Omea came from the VirtualAlloc() traces. We have really good tools to analyze the usage of managed memory by the Omea process (in particular, we are using SciTech Memory Profiler). But the large difference between the size of the managed heap and the VM size of the process reported by Task Manager has always been somewhat of a mystery to us.
Now, with LeakDiag and a small Python script I wrote to post-process its results, I can get a complete breakdown of where our virtual memory goes. Later I’ll publish some more details on the results of my analysis.
The main problem with LeakDiag is that it can only attach to a process that is already running, so the first batch of allocations made at process start is not caught. I tried to attach LeakDiag to a process halted in the debugger immediately after creation, but it seems that attaching LeakDiag so early interferes with USER32 initialization, with some very weird effects: basically, Omea cannot initialize its user interface completely. So, instead, I added an option to show a message box on entry to the Main() function, and I attach LeakDiag while the message box is shown.
Another tip is that the option “Use DbgHelp StackWalk APIs to walk stacks” does not really work for managed processes – if it’s enabled, LeakDiag hangs when writing the log.
The start of the EAP for the next version of Omea, codenamed Tokaj, looks like a good time to get back into blogging. This time there will be more of us around – David Booth, our sales and marketing guy, is already blogging, and several more team members will join us soon.
A lot has happened in the time since my last blog post. To get started, this page on our new Confluence site contains the list of new features for the Tokaj release. We’ll keep it updated with new stuff as it appears in the product.