String Interning: Effective Memory Management with dotMemory

Starting with version 4.1, dotMemory offers the String duplicates inspection. The idea behind it is quite simple: it automatically checks memory for string objects with the same value. After you open a memory snapshot, you will see the list of such strings:

[Screenshot: String duplicates inspection in dotMemory]

How can this help? String duplicates may indicate inefficient memory usage. Why create a new string if one with the same value is already in memory?

Imagine, for example, that in the background your app parses some text files with repetitive content (say, some XML logs).

Code example for log file processing
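
The original listing is not reproduced here. As a stand-in, here is a minimal sketch of how such duplicates arise. It assumes a hypothetical line-based log format ("LEVEL|message") rather than XML, and the names LogParserSketch and ReadLevels are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogParserSketch
{
    // Hypothetical log format: "LEVEL|message", one entry per line.
    public static List<string> ReadLevels(TextReader log)
    {
        var levels = new List<string>();
        string line;
        while ((line = log.ReadLine()) != null)
        {
            int separator = line.IndexOf('|');
            if (separator <= 0) continue; // skip malformed lines

            // Substring allocates a new string object on every call,
            // even if an equal value has already been created.
            levels.Add(line.Substring(0, separator));
        }
        return levels;
    }
}
```

Calling LogParserSketch.ReadLevels(new StringReader("INFO|started\nINFO|done")) yields two strings that are equal by value but are separate objects on the heap, which is the kind of duplication the inspection reports.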

So, dotMemory finds a lot of strings with identical content. What can we do?

[Screenshot: Inspection results for the example]

The obvious answer – rewrite our app so that it allocates strings with unique content just once. Actually, there are at least two ways this can be done. The first one is to use the string interning mechanism provided by .NET.

CLR Intern Pool

.NET automatically interns all string literals. This is done by means of an intern pool – a special table that stores a reference to each unique string. But why aren’t the strings in our example interned? The thing is that only string literals known at compile time are interned automatically. Strings created at runtime are not checked against the pool. For example:

Interning example
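
The original listing is missing, so the following is only a sketch of the difference, modeled on the corrected example from the comments below. The class and method names are illustrative assumptions.

```csharp
using System;

public static class InternDemo
{
    // Two identical literals: both variables refer to the same interned object.
    public static bool LiteralsShareInstance()
    {
        string a = "ABC";
        string b = "ABC";
        return ReferenceEquals(a, b);
    }

    // A string built at runtime is a new object, even if its value
    // matches a literal that is already in the pool.
    public static bool RuntimeStringSharesInstance(string prefix)
    {
        string literal = "ABC";
        string built = prefix + "BC"; // concatenated at runtime, not interned
        return literal == built && ReferenceEquals(literal, built);
    }
}
```

Here InternDemo.LiteralsShareInstance() is expected to return true, while InternDemo.RuntimeStringSharesInstance("A") returns false: the values are equal, but the references are not.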

Of course, you can circumvent this limitation by working with the intern pool directly. For this purpose, .NET offers two methods: String.Intern and String.IsInterned. If the string value passed to String.Intern is already in the pool, the method returns the reference to the pooled string. Otherwise, the method adds the string to the pool and returns the reference to it. If you just want to check whether a string is already interned, use the String.IsInterned method. It returns the reference to the pooled string if its value is in the pool, or null if it isn’t.
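
To make the behavior of the two methods concrete, here is a small sketch. It uses a freshly generated GUID as a value that is assumed not to be in the pool yet; the class InternApiDemo and its Run method are illustrative names.

```csharp
using System;

public static class InternApiDemo
{
    // Returns true if every observation matches the behavior described above.
    public static bool Run()
    {
        // A value that is (practically) certain not to be a literal anywhere.
        string first = Guid.NewGuid().ToString();
        bool notYetInterned = String.IsInterned(first) == null;

        // Not in the pool yet: Intern adds it and returns the same reference.
        string pooled = String.Intern(first);
        bool addedAsIs = ReferenceEquals(pooled, first);

        // A distinct object with the same characters.
        string copy = new string(first.ToCharArray());
        bool distinctObject = !ReferenceEquals(copy, first);

        // Both methods now return the pooled instance instead of the copy.
        bool isInternedFindsIt = ReferenceEquals(String.IsInterned(copy), pooled);
        bool internReturnsIt = ReferenceEquals(String.Intern(copy), pooled);

        return notYetInterned && addedAsIs && distinctObject
            && isInternedFindsIt && internReturnsIt;
    }
}
```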

Thus, the fix for our log parsing algorithm could look as follows:

CLR interning example
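
The original listing is not available. A sketch of such a fix, assuming a hypothetical "LEVEL|message" log format and illustrative names, could pass each parsed value through String.Intern:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class InternedLogParserSketch
{
    // Hypothetical log format: "LEVEL|message", one entry per line.
    public static List<string> ReadLevels(TextReader log)
    {
        var levels = new List<string>();
        string line;
        while ((line = log.ReadLine()) != null)
        {
            int separator = line.IndexOf('|');
            if (separator <= 0) continue; // skip malformed lines

            // Substring still allocates a temporary string. String.Intern then
            // returns the pooled instance with the same value, so the temporary
            // becomes garbage unless it was the first occurrence of that value.
            levels.Add(String.Intern(line.Substring(0, separator)));
        }
        return levels;
    }
}
```

Note that interning does not avoid the temporary allocation itself. It only ensures that the long-lived references all point to a single instance, so the short-lived duplicates can be collected.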

Further memory profiling will show that strings are successfully interned.

[Screenshot: Inspection after the fix]

Nevertheless, such an implementation has one rather serious disadvantage – the interned strings will stay in memory “forever” (or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed).

If, for example, our app has to parse a large number of different log files, this could be a problem. In such a case, a better solution would be to create a local analogue of the intern pool.

Local Intern Pool

The simplest (though very far from optimal) implementation might look like this:

Local pool code example
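
The original listing is missing. The comments below indicate that it was a dictionary-based class called LocalPool, so a minimal sketch along those lines might look as follows. The method name GetOrAdd and the Count property are assumptions, and the class is not thread-safe.

```csharp
using System.Collections.Generic;

public sealed class LocalPool
{
    // Maps each distinct value to the first instance seen with that value.
    private readonly Dictionary<string, string> _items =
        new Dictionary<string, string>();

    // Returns the pooled instance equal to 'value', adding 'value' if absent.
    public string GetOrAdd(string value)
    {
        if (value == null) return null;

        string existing;
        if (_items.TryGetValue(value, out existing)) return existing;

        _items.Add(value, value);
        return value;
    }

    // Number of distinct values currently held by the pool.
    public int Count
    {
        get { return _items.Count; }
    }
}
```

Unlike the CLR intern pool, this object is an ordinary heap object, so the strings it holds can be collected once the pool and all other references to them are gone.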

The processing algorithm will change a little bit as well:

Local pool example
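
The original listing is not available either. As a sketch, the processing method could keep the pool in a local variable. To stay self-contained, the pool is inlined here as a plain Dictionary, and the "LEVEL|message" log format is again a hypothetical stand-in.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PooledLogParserSketch
{
    public static List<string> ProcessLogFile(TextReader log)
    {
        // The pool is a local variable: nothing references it once the
        // method returns, so it does not outlive the processing of one file.
        var pool = new Dictionary<string, string>();
        var levels = new List<string>();

        string line;
        while ((line = log.ReadLine()) != null)
        {
            int separator = line.IndexOf('|');
            if (separator <= 0) continue; // skip malformed lines

            string level = line.Substring(0, separator);
            string pooled;
            if (!pool.TryGetValue(level, out pooled))
            {
                // First occurrence of this value: remember this instance.
                pool.Add(level, level);
                pooled = level;
            }
            levels.Add(pooled);
        }
        return levels;
    }
}
```

Strings that the caller still references through the returned list stay alive, but only one instance per distinct value.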

In this case, the pool becomes eligible for garbage collection as soon as ProcessLogFile returns and will be removed from memory by a subsequent collection.



14 Responses to String Interning: Effective Memory Management with dotMemory

  1. KooKiz says:

    “CLR will allocate string only if it’s not already in the pool”

    The comment is misleading. The string will still be allocated, but won’t be referenced anymore, and will therefore be collected the next time the GC runs.

  3. Chris Staley says:


    string s = "ABC";

    string s = "A" + "B" + "C";

    These both produce the same IL since the C# compiler automatically performs concatenation on constants, so I would be shocked if the CLR treated them differently at runtime.

    • Alexey Totin says:

      Yep. Blooper. Compiler is smart enough to concatenate constants at compile time. Corrected to

      string s1 = "A";
      string s2 = s1 + "BC"; // will not be interned

  5. Greg Sohl says:

    I’m trying to resolve an apparent difference between your statement:

    “the interned strings will stay in memory “forever” (or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed).”

    and the documentation for String.Intern

    From: https://msdn.microsoft.com/en-us/library/system.string.intern%28v=vs.110%29.aspx
    “the memory allocated for interned String objects is not likely be released until the common language runtime (CLR) terminates. The reason is that the CLR’s reference to the interned String object can persist after your application, or even your application domain, terminates”

    Which is right?

    Greg

    • Alexey Totin says:

      Hello, Greg
      The post was written keeping standalone apps in mind. Of course, there are cases when your app (and CLR) is hosted by another process e.g. IIS app pool. In such a case the more general statement “… until the CLR terminates” is correct.

  6. Jörg Preiß says:

    Great article, thanks a lot.

    Now I have those duplicate strings galore. The problem is – we use the standard XML serialization, e.g. instead of XmlReader.Create we use new XmlSerializer(type).
    So, is there a way to avoid these duplicated strings, too?

  7. Dave Black says:

    your following comment is incorrect:
    “or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed”.

    Strings are interned across app domains (unlike statics which are scoped to an AppDomain). Refer to Chris Brumme’s blog post here – http://blogs.msdn.com/b/cbrumme/archive/2003/04/22/51371.aspx

  8. Dave Black says:

    Thus, the string will persist for the lifetime of the hosting process – not the lifetime of the AppDomain.

  9. Aishel M says:

    Can HashSet be used instead of dictionary for the LocalPool?
    Also, can you highlight the sub-optimal characteristics of that implementation?

    • Alexey Totin says:

      Hi Aishel,
      HashSet – yes, why not.
      Talking about possible improvements – for example, more flexible pool lifetime management. Another possible improvement – same techniques that are used for cache (LRU, MRU) in case the amount of processed strings is really huge.
