String Interning: Effective Memory Management with dotMemory
Starting with version 4.1, dotMemory offers the ‘String duplicates’ inspection. The idea behind it is quite simple: it automatically checks memory for string objects with the same value. After you open a memory snapshot, you will see the list of such strings.
How can this help? Well, string duplicates may indicate inefficient memory usage. Why create a new string if one with the same value is already in memory?
Imagine, for example, that in the background your app parses some text files with repetitive content (say, some XML logs).
So, dotMemory finds a lot of strings with identical content. What can we do?
The obvious answer is to rewrite our app so that it keeps only one instance of each distinct string value. There are at least two ways to do this. The first one is to use the string interning mechanism provided by .NET.
CLR Intern Pool
.NET automatically interns all string literals. This is done by means of the intern pool, a special table that holds a single reference for each interned string value. But why aren’t the strings in our example interned? The thing is that only string literals, whose values are known at compile time, are interned automatically (this includes concatenations of constants such as "A" + "B" + "C", which the C# compiler folds into a single literal). Strings created at run time are not checked against the pool. For example:
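A minimal C# sketch of the difference (the class and method names are hypothetical). It compares object identity with object.ReferenceEquals: identical literals, including a folded constant concatenation, resolve to one pooled instance, whereas a string built at run time is a separate object even though its value matches a literal.

```csharp
public static class LiteralInterningDemo
{
    // Two identical literals are loaded from the intern pool,
    // so both variables reference the same object.
    public static bool LiteralsShareInstance()
    {
        string a = "ABC";
        string b = "ABC";
        return object.ReferenceEquals(a, b);
    }

    // The C# compiler folds a concatenation of constants into one literal,
    // so this is the same pooled "ABC" object.
    public static bool FoldedConstantSharesInstance()
    {
        string folded = "A" + "B" + "C";
        return object.ReferenceEquals(folded, "ABC");
    }

    // s1 is a variable, so the concatenation runs at run time and produces
    // a new string object that is never looked up in the intern pool.
    public static bool RuntimeStringSharesInstance()
    {
        string s1 = "A";
        string s2 = s1 + "BC";
        return object.ReferenceEquals(s2, "ABC");
    }
}
```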
Of course, you can circumvent this limitation by working with the intern pool directly. For this purpose, .NET offers two methods: String.Intern and String.IsInterned. If the value of the string passed to String.Intern is already in the pool, the method returns a reference to the pooled instance. Otherwise, it adds the string to the pool and returns a reference to it. If you only want to check whether a string is already interned, use the String.IsInterned method: it returns a reference to the pooled string if its value is in the pool, or null if it isn’t.
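A short sketch of both methods (InternPoolDemo is a hypothetical name). It assumes the value is produced at run time from Guid.NewGuid(), so no string literal in the program can already have placed it in the pool.

```csharp
using System;

public static class InternPoolDemo
{
    public static void Run()
    {
        // A value created at run time; no string literal has this content.
        string original = Guid.NewGuid().ToString();
        // A second, distinct object with the same characters.
        string copy = new string(original.ToCharArray());

        // Not in the pool yet, so IsInterned returns null.
        Console.WriteLine(string.IsInterned(original) == null);

        // Intern adds the value to the pool and returns the pooled reference.
        string pooled = string.Intern(original);

        // Later lookups with an equal value return that same pooled reference...
        Console.WriteLine(object.ReferenceEquals(string.Intern(copy), pooled));
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), pooled));
        // ...while the copy itself remains a separate object.
        Console.WriteLine(object.ReferenceEquals(copy, pooled));
    }
}
```

Calling InternPoolDemo.Run() prints True, True, True, False.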
Thus, the fix for our log parsing algorithm could look as follows:
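A minimal sketch of such a fix, assuming a hypothetical log format with one record per line and space-separated fields (InternedLogParser and Parse are illustrative names, not the post's code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class InternedLogParser
{
    // Hypothetical log format: one record per line, fields separated by spaces.
    public static List<string[]> Parse(TextReader reader)
    {
        var records = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split allocates a new string object for every field it returns.
            string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < fields.Length; i++)
            {
                // Swap each field for the pooled instance. The freshly split copy
                // is still allocated, but once nothing references it the GC reclaims it.
                fields[i] = string.Intern(fields[i]);
            }
            records.Add(fields);
        }
        return records;
    }
}
```

With this version, every repeated field value (for example, the same log level on many lines) is referenced through a single pooled object.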
Further memory profiling will show that strings are successfully interned.
Nevertheless, such an implementation has one rather serious disadvantage: the interned strings will stay in memory “forever”. To be more precise, the intern pool keeps references to the strings even when they are no longer needed, and it is shared across application domains, so interned strings persist until the CLR terminates. For a standalone app, that means the lifetime of the process.
If, for example, our app has to parse a large number of different log files, this could be a problem. In such a case, a better solution would be to create a local analogue of the intern pool.
Local Intern Pool
The simplest (though very far from optimal) implementation might look like this:
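One possible shape for such a class, assuming (as the comments below suggest) a dictionary-backed type named LocalPool; the GetOrAdd method and Count property are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public sealed class LocalPool
{
    // Maps each string value to the one instance this pool hands out for it.
    private readonly Dictionary<string, string> _strings =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the pooled instance for value, adding value itself on first sight.
    // Throws ArgumentNullException for null, since Dictionary keys cannot be null.
    public string GetOrAdd(string value)
    {
        string pooled;
        if (_strings.TryGetValue(value, out pooled))
        {
            return pooled;
        }
        _strings.Add(value, value);
        return value;
    }

    // Number of distinct values currently held.
    public int Count
    {
        get { return _strings.Count; }
    }
}
```

Unlike the CLR intern pool, this object is an ordinary heap object: once the last reference to a LocalPool instance goes away, its table becomes eligible for garbage collection.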
The processing algorithm will change a little bit as well:
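A sketch of the revised method, again under the hypothetical space-separated log format. It takes a TextReader instead of a file path (an assumption that keeps the example easy to run) and, to stay self-contained, uses a plain Dictionary<string, string> local named pool:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogProcessor
{
    public static List<string[]> ProcessLogFile(TextReader reader)
    {
        // The pool exists only for the duration of this call.
        var pool = new Dictionary<string, string>(StringComparer.Ordinal);
        var records = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < fields.Length; i++)
            {
                string pooled;
                if (pool.TryGetValue(fields[i], out pooled))
                {
                    fields[i] = pooled; // reuse the first instance with this value
                }
                else
                {
                    pool.Add(fields[i], fields[i]);
                }
            }
            records.Add(fields);
        }
        // After returning, the dictionary is unreachable; only the strings referenced
        // from records stay alive, and equal values among them share one instance.
        return records;
    }
}
```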
In this case, pool becomes unreachable as soon as ProcessLogFile returns, so it is reclaimed by a subsequent garbage collection.
Thanks for reading! We hope this post was helpful. If you want to try dotMemory and the full set of its automatic inspections on your code, just download your free 5-day trial here.
KooKiz says:
February 12, 2015
“CLR will allocate string only if it’s not already in the pool”
The comment is misleading. The string will still be allocated, but won’t be referenced anymore, and will therefore be collected the next time the GC runs.
Alexey Totin says:
February 12, 2015
You’re absolutely right.
Removed the misleading comment
Dew Drop – February 12, 2015 (#1953) | Morning Dew says:
February 12, 2015
[…] String Interning: Effective Memory Management with dotMemory (Alexey Totin) […]
Chris Staley says:
February 12, 2015
string s = "ABC";
string s = "A" + "B" + "C";
These both produce the same IL since the C# compiler automatically performs concatenation on constants, so I would be shocked if the CLR treated them differently at runtime.
Alexey Totin says:
February 12, 2015
Yep. Blooper. Compiler is smart enough to concatenate constants at compile time. Corrected to:
string s1 = "A";
string s2 = s1 + "BC"; // will not be interned
JetBrains Newsletter, March 2015 | Indie Game Developer! says:
March 12, 2015
[…] String Interning with dotMemory: String interning is an important effective memory management practice, and dotMemory knows a thing or two about that. Explore how to detect and fix .NET memory issues indicated by dotMemory’s ‘String duplicates’ inspection. The blog post covers two string interning mechanisms. […]
Greg Sohl says:
March 20, 2015
I’m trying to resolve an apparent difference between your statement:
“the interned strings will stay in memory “forever” (or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed).”
and the documentation for String.Intern
From: https://msdn.microsoft.com/en-us/library/system.string.intern%28v=vs.110%29.aspx
“the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR’s reference to the interned String object can persist after your application, or even your application domain, terminates”
Which is right?
Greg
Alexey Totin says:
March 24, 2015
Hello, Greg
The post was written keeping standalone apps in mind. Of course, there are cases when your app (and the CLR) is hosted by another process, e.g. an IIS app pool. In such a case the more general statement “… until the CLR terminates” is correct.
Jörg Preiß says:
March 23, 2015
Great article, thanks a lot.
Now I have those duplicate strings galore. The problem is – we use the standard XML serialization, e.g. instead of XmlReader.Create we use new XmlSerializer(type).
So, is there a way to avoid these duplicated strings, too?
Alexey Totin says:
March 24, 2015
Hello Jörg
No obvious solution comes to mind, except overriding deserialization.
Dave Black says:
October 21, 2015
Your following comment is incorrect:
“or, to be more correct, they will persist for the lifetime of AppDomain, as the intern pool will store references to the strings even if they are no longer needed”.
Strings are interned across app domains (unlike statics which are scoped to an AppDomain). Refer to Chris Brumme’s blog post here – http://blogs.msdn.com/b/cbrumme/archive/2003/04/22/51371.aspx
Dave Black says:
October 21, 2015
Thus, the string will persist for the lifetime of the hosting process – not the lifetime of the AppDomain.
Aishel M says:
April 14, 2016
Can HashSet be used instead of a dictionary for the LocalPool?
Also, can you highlight the sub-optimal characteristics of that implementation?
Alexey Totin says:
April 14, 2016
Hi Aishel,
HashSet – yes, why not.
Talking about possible improvements: for example, more flexible pool lifetime management. Another possible improvement is to apply the same eviction techniques that are used for caches (LRU, MRU) in case the number of processed strings is really huge.
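Following up on the HashSet question, a sketch of the same pool built on HashSet<string>. It assumes .NET Framework 4.7.2+ or .NET Core 2.0+, where HashSet<T>.TryGetValue is available to retrieve the element already stored in the set; the class name HashSetPool is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class HashSetPool
{
    // Stores each distinct value once; the set element itself is the pooled instance.
    private readonly HashSet<string> _strings = new HashSet<string>(StringComparer.Ordinal);

    public string GetOrAdd(string value)
    {
        string pooled;
        // TryGetValue hands back the element stored in the set, not the argument.
        if (_strings.TryGetValue(value, out pooled))
        {
            return pooled;
        }
        _strings.Add(value);
        return value;
    }

    public int Count
    {
        get { return _strings.Count; }
    }
}
```

Compared with a Dictionary<string, string> used as a pool, the set keeps each pooled reference once per entry instead of storing it as both key and value.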
Vishwas says:
March 16, 2018
Hi,
I’m new to this topic and I have a question.
After looking at your second approach with a local string pool, I am wondering what the difference is between normal usage of strings (without interning or a local pool) and using the local pool. In both cases the garbage collector collects the strings, right?
ry says:
March 30, 2020
Although the garbage collector may collect the strings (depending on the specific application) in both cases, in the case of the local pool, you can eliminate duplicates, thereby reducing memory consumption, and potentially garbage collection time.