String Interning: Effective Memory Management with dotMemory
Starting with version 4.1, dotMemory offers the String duplicates inspection. The idea behind it is quite simple: it automatically checks memory for string objects with the same value. After you open a memory snapshot, you will see the list of such strings.
How can this help? Duplicate strings may indicate inefficient memory usage: why create a new string if one with the same value is already in memory?
Imagine, for example, that in the background your app parses some text files with repetitive content (say, some XML logs).
So, dotMemory finds a lot of strings with identical content. What can we do?
The obvious answer is to rewrite the app so that it allocates each unique string just once. There are at least two ways to do this. The first is to use the string interning mechanism provided by .NET.
CLR Intern Pool
.NET automatically interns all string literals. This is done by means of an intern pool, a special table that stores references to all unique strings. But why aren’t the strings in our example interned? Because only string literals declared explicitly in the source code are interned automatically. Strings created at runtime are not checked against the pool. For example:
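The following sketch illustrates the difference. The class and method names are hypothetical and chosen only for this illustration; the string is built from a character array simply to force a runtime allocation.

```csharp
using System;

public static class LiteralInterningDemo
{
    // Builds "Log" from characters at run time, so the result is a new
    // object that is not taken from the intern pool.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'L', 'o', 'g' });
    }

    public static void Main()
    {
        string first = "Log";            // literal, interned automatically
        string second = "Log";           // same literal, same reference
        string third = BuildAtRuntime(); // same value, different object

        Console.WriteLine(object.ReferenceEquals(first, second)); // True
        Console.WriteLine(object.ReferenceEquals(first, third));  // False
        Console.WriteLine(first == third);                        // True (value comparison)
    }
}
```

The third string is exactly the kind of duplicate that the inspection would report: equal in value, but stored as a separate object.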
Of course, you can circumvent this limitation by working with the intern pool directly. For this purpose, .NET offers two methods: String.Intern and String.IsInterned. If the string value passed to String.Intern is already in the pool, the method returns the reference to the pooled string. Otherwise, it adds the string to the pool and returns a reference to it. If you only want to check whether a string is already interned, use String.IsInterned. It returns the reference to the pooled string if the value is in the pool, or null if it isn’t.
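To make the behavior of the two methods concrete, here is a small sketch. The helper names are hypothetical, and a GUID is used only to guarantee a value that is not yet in the pool.

```csharp
using System;

public static class InternPoolDemo
{
    // Produces a value that cannot already be in the intern pool.
    public static string NewUniqueValue()
    {
        return "log-entry-" + Guid.NewGuid().ToString();
    }

    // Creates a separate string object with the same value.
    public static string Copy(string value)
    {
        return new string(value.ToCharArray());
    }

    public static void Main()
    {
        string original = NewUniqueValue();

        // Not in the pool yet, so IsInterned returns null.
        Console.WriteLine(string.IsInterned(original) == null); // True

        // Intern adds the value to the pool and returns the pooled reference.
        string pooled = string.Intern(original);

        // A second object with the same value resolves to that pooled reference.
        string copy = Copy(original);
        Console.WriteLine(object.ReferenceEquals(copy, pooled));                    // False
        Console.WriteLine(object.ReferenceEquals(string.Intern(copy), pooled));     // True
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), pooled)); // True
    }
}
```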
Thus, the fix for our log parsing algorithm could look as follows:
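The original listing is not reproduced here, so what follows is only a minimal sketch of such a fix. It assumes the parser works line by line and keeps the lines it reads; the class name, the signature of ProcessLogFile, and the returned list are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LogProcessor
{
    // Hypothetical parser: keeps every line, but stores the interned
    // reference so that identical lines share a single string object.
    public static List<string> ProcessLogFile(IEnumerable<string> lines)
    {
        var entries = new List<string>();
        foreach (string line in lines)
        {
            entries.Add(string.Intern(line));
        }
        return entries;
    }

    public static void Main()
    {
        // Two separate string objects with identical content,
        // as a file reader would produce for repeated lines.
        string template = "<entry level=\"info\" />";
        var lines = new[]
        {
            new string(template.ToCharArray()),
            new string(template.ToCharArray())
        };

        List<string> entries = ProcessLogFile(lines);
        Console.WriteLine(object.ReferenceEquals(entries[0], entries[1])); // True
    }
}
```

For a real file, the caller could pass the result of File.ReadLines(path) as the lines argument.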
Further memory profiling will show that strings are successfully interned.
Nevertheless, such an implementation has one rather serious disadvantage: the interned strings will stay in memory “forever”. More precisely, they will persist at least for the lifetime of the AppDomain, because the intern pool keeps references to the strings even when they are no longer needed.
If, for example, our app has to parse a large number of different log files, this could be a problem. In such a case, a better solution would be to create a local analogue of the intern pool.
Local Intern Pool
The simplest (though very far from optimal) implementation might look like this:
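The original listing is not reproduced here. One possible sketch, assuming a plain dictionary that maps each value to the first string object seen with that value, is shown below; the names LocalPool and GetOrAdd are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class LocalPool
{
    // Maps a string value to the first string object seen with that value.
    private readonly Dictionary<string, string> _strings =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the pooled object for the value, adding it on first use.
    public string GetOrAdd(string value)
    {
        if (value == null)
        {
            return null; // a dictionary cannot hold a null key
        }

        string existing;
        if (_strings.TryGetValue(value, out existing))
        {
            return existing;
        }

        _strings.Add(value, value);
        return value;
    }
}

public static class LocalPoolDemo
{
    public static void Main()
    {
        var pool = new LocalPool();
        string first = pool.GetOrAdd(new string('x', 5));
        string second = pool.GetOrAdd(new string('x', 5));
        Console.WriteLine(object.ReferenceEquals(first, second)); // True
    }
}
```

Unlike the CLR intern pool, this pool is an ordinary object: once nothing references it, both the pool and the strings held only by it can be collected.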
The processing algorithm will change a little bit as well:
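As a rough sketch of the changed algorithm, the method below keeps its pool in a local variable. To stay self-contained, the pool is represented by a plain Dictionary rather than a dedicated class; the class name and the signature of ProcessLogFile are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LocalPoolLogProcessor
{
    public static List<string> ProcessLogFile(IEnumerable<string> lines)
    {
        // The pool lives only as long as this method call.
        var pool = new Dictionary<string, string>(StringComparer.Ordinal);
        var entries = new List<string>();

        foreach (string line in lines)
        {
            string pooled;
            if (!pool.TryGetValue(line, out pooled))
            {
                // First time this value is seen: remember this object.
                pool.Add(line, line);
                pooled = line;
            }
            entries.Add(pooled);
        }

        return entries;
    }

    public static void Main()
    {
        string template = "<entry level=\"info\" />";
        var lines = new[]
        {
            new string(template.ToCharArray()),
            new string(template.ToCharArray())
        };

        List<string> entries = ProcessLogFile(lines);
        Console.WriteLine(object.ReferenceEquals(entries[0], entries[1])); // True
    }
}
```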
In this case, the pool becomes unreachable as soon as ProcessLogFile returns, so it can be reclaimed by a subsequent garbage collection.