String Interning: Effective Memory Management with dotMemory

Starting with version 4.1, dotMemory offers the String duplicates inspection. The idea behind it is quite simple: it automatically checks memory for string objects with the same value. After you open a memory snapshot, you will see the list of such strings:

String duplicates inspection in dotMemory

How can this help? Well, string duplicates possibly indicate ineffective memory usage. Why create a new string if it is already in memory?

Imagine, for example, that in the background your app parses some text files with repetitive content (say, some XML logs).

Code example for log file processing

So, dotMemory finds a lot of strings with identical content. What can we do?

Inspection results for the example

The obvious answer – rewrite our app so that it allocates strings with unique content just once. Actually, there are at least two ways this can be done. The first one is to use the string interning mechanism provided by .NET.

CLR Intern Pool

.NET automatically performs string interning for all string literals. This is done by means of an intern pool – a special table that stores references to all unique strings. But why  aren’t the strings in our example interned? The thing is that only explicitly declared string literals are interned on the compile stage. The strings created at runtime are not checked for being already added to the pool. For example:

Interning example

Of course, you can circumvent this limitation by working with the intern pool directly. For this purpose, .NET offers two methods: String.Intern and String.IsInterned. If the string value passed to String.Intern is already in the pool, the method returns the reference to the string. Otherwise, the method adds the string to the pool and returns the reference to it. If you want to just check if a string is already interned, you should use the String.IsInterned method. It returns the reference to the string if its value is in the pool, or null of it isn’t.

Thus, the fix for our log parsing algorithm could look as follows:

CLR interning example

Further memory profiling will show that strings are successfully interned.

Inspection after the fix

Nevertheless, such an implementation has one rather serious disadvantage – the interned strings will stay in memory “forever” (or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed).

If, for example, our app has to parse a large number of different log files, this could be a problem. In such a case, a better solution would be to create a local analogue of the intern pool.

Local Intern Pool

The simplest (though very far from optimal) implementation might look like this:

Local pool code example

The processing algorithm will change a little bit as well:

Local pool example

In this case, pool will be removed from memory with the next garbage collection after ProcessLogFile is done working.

Thanks for reading! We hope this post was helpful. If you want to try dotMemory and the full set of its automatic inspections on your code, just download your free 5-day trial here.

Discover more