Unusual Ways of Boosting Up App Performance. Lambdas and LINQs

This is the third post in the series. The previous ones can be found here:

Today, we’re going to uncover the common pitfalls of using lambda expressions and LINQ queries, and explain how you can avoid them in your day-to-day work.

Lambda Expressions

Lambda expressions are a very powerful .NET feature that can greatly simplify your code in particular cases. Unfortunately, convenience has its price: used carelessly, lambdas can noticeably degrade app performance. Let’s look at what exactly can go wrong.

The trick is in how lambdas work. To implement a lambda (which is a sort of local function), the compiler has to create a delegate. Without any optimization, a new delegate would be allocated each time the code containing the lambda runs. This means that if the lambda sits on a hot path (code that is executed frequently), it will generate huge memory traffic.

Is there anything we can do? Fortunately, the compiler developers have already thought about this and implemented a caching mechanism for delegates. For better understanding, consider the example below:

Caching lambdas 1

Now look at this code decompiled in dotPeek:

Caching lambdas example. Decompiled code

As you can see, the delegate is stored in a static field, LambdaTest.CS$<>9__CachedAnonymousMethodDelegate1, and is created only once.
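The original listing is not reproduced here, so the following is only a minimal sketch of the idea, not the author's exact code. The names LambdaCachingDemo, GetDoubler, and IsDelegateReused are hypothetical. It relies on the behavior of current C# compilers, which typically store the delegate of a non-capturing lambda in a compiler-generated static field (the field name differs between compiler versions), so repeated calls hand back the same object:

```csharp
using System;

public static class LambdaCachingDemo
{
    // Hypothetical example: the lambda captures nothing,
    // so the compiler can create its delegate once and reuse it.
    public static Func<int, int> GetDoubler()
    {
        return x => x * 2;
    }

    // True when two calls return the very same delegate object,
    // i.e. the second call did not allocate a new delegate.
    // This is compiler behavior, not a language guarantee.
    public static bool IsDelegateReused()
    {
        return object.ReferenceEquals(GetDoubler(), GetDoubler());
    }
}
```

Calling LambdaCachingDemo.IsDelegateReused() is a quick way to check the caching on your own toolchain without a decompiler.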

So, what pitfalls should we watch out for? At first glance, this behavior won’t generate any traffic. That’s true, but only as long as your lambda does not contain a closure. If you pass any context (this, an instance member, or a local variable) to a lambda, caching won’t work. It makes sense: the context may change at any time, and that’s what closures are made for: passing context.

Let’s look at a more elaborate example. Suppose your app uses some Substring method to get substrings from strings:

Lambdas example 1

Let’s suppose this code is called frequently and the input strings are often the same. To optimize the algorithm, you can create a cache that stores results:

Lambdas example 2

At the next step, you can optimize your algorithm so that it checks whether the substring is already in the cache:

Lambdas example 3

The Substring method now looks as follows:

Lambdas example 4

Because the lambda captures the variable x, the compiler is unable to cache the created delegate. Let’s look at the decompiled code:

Lambdas example. Decompiled code with no caching

There it is. A new instance of c__DisplayClass1 is created each time the Substring method is called. The parameter x that we pass to the lambda is implemented as a public field of c__DisplayClass1.
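Since the original listings are not available here, the sketch below is a hedged reconstruction of the pattern, not the author's code. FactoryCache, CapturingSubstring, DisplayClass, and the "first three characters" logic are all assumptions made for illustration. The second method is a hand-written approximation of what the compiler emits for the first one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache type: the factory takes no arguments,
// so callers must capture whatever the factory needs.
public class FactoryCache
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public string GetOrCreate(string key, Func<string> factory)
    {
        string value;
        if (!_items.TryGetValue(key, out value))
        {
            value = factory();
            _items[key] = value;
        }
        return value;
    }
}

public class CapturingSubstring
{
    private readonly FactoryCache _cache = new FactoryCache();

    // The lambda reads the parameter x, so it is a closure.
    public string Substring(string x)
    {
        return _cache.GetOrCreate(x, () => x.Substring(0, Math.Min(3, x.Length)));
    }

    // Simplified stand-in for the compiler-generated display class.
    private sealed class DisplayClass
    {
        public string x; // the captured variable becomes a public field

        public string Body() // the lambda body becomes a method
        {
            return x.Substring(0, Math.Min(3, x.Length));
        }
    }

    // Roughly what Substring looks like after compilation.
    public string SubstringLowered(string x)
    {
        var closure = new DisplayClass(); // allocated on every call
        closure.x = x;
        // A new delegate is allocated on every call as well.
        return _cache.GetOrCreate(closure.x, new Func<string>(closure.Body));
    }
}
```

Both methods return the same results; the second one just makes the two per-call allocations (closure object and delegate) visible.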

How to Find

As with any other example in this series, first make sure that a particular lambda really causes performance issues, i.e. that it generates huge traffic. This can be easily checked in dotMemory.

  1. Open a memory snapshot and select the Memory Traffic view.
  2. Find delegates that generate significant traffic. Objects of …+c__DisplayClassN are also a hint.
  3. Identify the methods responsible for this traffic.

For instance, if the Substring method from the example above is run 10,000 times, the Memory Traffic view will look as follows:

Lambdas shown in dotMemory

As you can see, the app has allocated and collected 10,000 delegates.

When working with lambdas, the Heap Allocation Viewer also helps a lot, as it can proactively detect delegate allocations. In our case, the plugin’s warning will look like this:

Warning about lambdas in the HAV plug-in

But once again, data gathered by dotMemory is more reliable, because it shows you whether this lambda is a real issue (i.e. whether or not it generates a lot of traffic).

How to Fix

Considering how tricky lambda expressions may be, some companies even prohibit using lambdas in their development processes. We believe that lambdas are a very powerful instrument which definitely can and should be used as long as particular caution is exercised.

The main strategy when using lambdas is avoiding closures. In such a case, a created delegate will always be cached with no impact on traffic.

Thus, for our example, one solution is to stop capturing the parameter x in the lambda. The fix would look as follows:

Caching lambdas code fix

The updated lambda doesn’t capture any variables; therefore, its delegate should be cached. This can be confirmed by dotMemory:
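As a hedged illustration of this kind of fix (the original listing is not available, and KeyedFactoryCache and NonCapturingSubstring are hypothetical names), one option is to let the cache hand the key to the factory, so the lambda receives the string as a parameter instead of capturing it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache type: the factory receives the key as an argument,
// so callers do not need to capture it.
public class KeyedFactoryCache
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public string GetOrCreate(string key, Func<string, string> factory)
    {
        string value;
        if (!_items.TryGetValue(key, out value))
        {
            value = factory(key);
            _items[key] = value;
        }
        return value;
    }
}

public class NonCapturingSubstring
{
    private readonly KeyedFactoryCache _cache = new KeyedFactoryCache();

    public string Substring(string x)
    {
        // s is a lambda parameter, not a captured variable,
        // so the compiler can cache this delegate.
        return _cache.GetOrCreate(x, s => s.Substring(0, Math.Min(3, s.Length)));
    }
}
```

Note that _cache is used outside the lambda body, so the lambda does not capture this either.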

Lambdas caching after the fix shown in dotMemory

As you can see, now only one instance of Func is created.

If you need to pass some additional context to GetOrCreate, a similar approach (avoiding variable closure) should be used. For example:

Code example of passing additional context to lambdas
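One possible shape for this, sketched under assumptions (ContextCache, PrefixReader, and the generic TArg parameter are hypothetical and may differ from the author's listing), is a GetOrCreate overload that forwards an extra argument to the factory. The example assumes a non-negative prefix length:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache type: TArg carries extra context to the factory
// without requiring a closure.
public class ContextCache
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public string GetOrCreate<TArg>(string key, TArg arg, Func<string, TArg, string> factory)
    {
        string value;
        if (!_items.TryGetValue(key, out value))
        {
            value = factory(key, arg);
            _items[key] = value;
        }
        return value;
    }
}

public class PrefixReader
{
    private readonly ContextCache _cache = new ContextCache();
    private readonly int _prefixLength; // assumed to be non-negative

    public PrefixReader(int prefixLength)
    {
        _prefixLength = prefixLength;
    }

    public string Prefix(string x)
    {
        // Reading _prefixLength inside the lambda would capture 'this'.
        // Passing it as an argument keeps the lambda closure-free.
        return _cache.GetOrCreate(x, _prefixLength,
            (s, n) => s.Substring(0, Math.Min(n, s.Length)));
    }
}
```

The context here is fixed per PrefixReader instance, so a cached value never becomes stale for a given key.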

LINQ Queries

As we saw in the previous section, using a lambda expression always means that a delegate is created. What about LINQ? LINQ queries and lambda expressions are closely connected and have a very similar implementation ‘under the hood.’ This means that all the concerns we discussed for lambdas also apply to LINQ.

If your LINQ query contains a closure, the compiler won’t cache the corresponding delegate. For example:

LINQ caching example

As the threshold parameter is captured by the query, its delegate will be created each time the method is called. As with lambdas, traffic from delegates can be checked in dotMemory:
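To illustrate why the query behaves like a lambda, here is a minimal sketch (LinqClosureDemo and its method names are hypothetical, not the author's listing). The compiler translates the where clause into a call to Where with a lambda, and that lambda captures threshold:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqClosureDemo
{
    // Query syntax: threshold is captured by the where clause, so a closure
    // object and a delegate are allocated on every call.
    public static List<string> NamesLongerThan(List<string> names, int threshold)
    {
        var query = from name in names
                    where name.Length > threshold
                    select name;
        return query.ToList();
    }

    // Method syntax: this is what the query above is translated to.
    public static List<string> NamesLongerThanMethodSyntax(List<string> names, int threshold)
    {
        return names.Where(name => name.Length > threshold).ToList();
    }
}
```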

LINQ caching shown in dotMemory

Unfortunately, there’s one more pitfall to avoid when using LINQ. Any LINQ query (like any other query) iterates over some data collection, which, in turn, requires creating an iterator. The subsequent chain of reasoning should already be familiar: if the LINQ query sits on a hot path, the constant allocation of iterators will generate significant traffic.

Consider this example:

LINQ iterator allocation example

Each time GetLongNames is called, the LINQ query will create an iterator.
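A small sketch can make the allocation visible. The class name, the constant 5, and the helper AllocatesNewIteratorPerCall are assumptions for illustration; only the method name GetLongNames comes from the text. The lambda below captures nothing, so its delegate can be cached, yet Where still returns a fresh iterator object on each call:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IteratorAllocationDemo
{
    // No closure here (5 is a constant), so the delegate can be cached,
    // but Where still allocates a new iterator object on every call.
    public static IEnumerable<string> GetLongNames(List<string> names)
    {
        return names.Where(name => name.Length > 5);
    }

    // True when two calls return two distinct iterator objects.
    public static bool AllocatesNewIteratorPerCall(List<string> names)
    {
        return !object.ReferenceEquals(GetLongNames(names), GetLongNames(names));
    }
}
```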

How to Find

With dotMemory, finding excessive iterator allocations is an easy task:

  1. Open a memory snapshot and select the Memory Traffic view.
  2. Find objects from the namespace System.Linq that contain the word “iterator”. In our example we use the Where LINQ method, so we look for System.Linq.Enumerable+WhereListIterator<string> objects.
  3. Determine the methods responsible for this traffic.

For instance, if we call the GetLongNames method from our example 10,000 times, the Memory Traffic view will look as follows:

LINQ iterator allocation shown in dotMemory

The Heap Allocation Viewer plugin also warns us about allocations in LINQ queries, but only if they call LINQ methods explicitly. For example:

LINQ iterator allocation warning by the HAV plug-in

How to Fix

Unfortunately, the only answer here is to not use LINQ queries on hot paths. In most cases, a LINQ query can be replaced with foreach. In our example, a fix could look like this:

LINQ iterator allocation fix example

As LINQ is no longer used, no iterators will be created.
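A fix along these lines might look like the sketch below. It is an illustration under assumptions (ForeachFilterDemo and the threshold parameter are hypothetical), not the author's original listing:

```csharp
using System.Collections.Generic;

public static class ForeachFilterDemo
{
    public static List<string> GetLongNames(List<string> names, int threshold)
    {
        var result = new List<string>();

        // foreach over List<T> uses the list's struct enumerator,
        // so no iterator object or delegate is allocated on the heap.
        foreach (var name in names)
        {
            if (name.Length > threshold)
            {
                result.Add(name);
            }
        }

        return result;
    }
}
```

The result list is still allocated, just as it would be with ToList(); what disappears is the iterator and, if the query captured threshold, the closure and its delegate.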

LINQ iterator allocation fix shown in dotMemory

We hope this series of posts has been helpful. Just in case, the previous two can be found here:

Please follow @dotmemory on Twitter or the dotMemory Google+ page to stay tuned.

9 Responses to Unusual Ways of Boosting Up App Performance. Lambdas and LINQs

  1. Charlie Hayes says:

    July 24, 2014

    Could you post a follow-up explaining why allocating a lot of LINQ iterators is bad and how the proposed alternative doesn’t suffer from the same or a similar issue? My naive view is that the new method will instantiate a new temporary result list on every invocation, possibly including many reallocations and copies of the dynamic backing array. Both the LINQ result and the ‘filtered list’ result would require iterators to iterate over them; maybe the LINQ iterator isn’t as efficient? My understanding of LINQ is that the iterator won’t be created until the result is iterated over, which would happen when the ToList() method is called and would produce a list similar to the one in the proposed workaround.

    • Steve Ruble says:

      July 25, 2014

      Charlie, I think there would be a better trade-off between iterator allocations and List allocations if the method parameter and return value were typed as IEnumerable<string> rather than List<string>. With the current signature you get the worst of both worlds, because you're allocating an iterator and calling ToList() at the end, which causes a list to be allocated as well.

    • Antão Almada says:

      July 4, 2019

      You’re right that the iterator will only be created when the first MoveNext() is called, but still, it’s an extra memory allocation. The framework data structures, like List, define the enumerator as a value type so that it’s allocated on the stack instead of on the heap, avoiding pressure on the GC. This holds as long as it’s not cast to IEnumerable, which will cause the enumerator to be boxed.

      I’ve been writing a series of articles on enumeration in .NET explaining all this in detail:

  2. Matt Warren says:

    July 25, 2014

    @Charlie, it’s because you can do a regular foreach using an enumerator that is a struct, i.e. with no heap allocation. For a bit more information see the talk here or slide 35 from

  3. Patrick Smacchia says:

    July 29, 2014

    >In most cases, a LINQ query can be replaced with foreach.

    Indeed, and in most cases a foreach can be replaced with a

    var count = list.Count;
    for (var i = 0; i < count; i++) {
        var obj = list[i];

    which is faster than

    foreach (var obj in list) {

  4. Dave Black says:

    November 18, 2014


    The performance of ‘for’ vs. ‘foreach’ depends on the type of collection being iterated over as well as the type contained within the collection. Using my extended version of the MeasureIt tool by Vance Morrison (CLR Performance Architect), I’ve come up with some numbers that I’ll post here along with explanations. The “micro tests” in the tool account for JIT warmup, inlining, no-op instructions, are JIT-optimized, etc., so they are far more advanced than standard timing done with System.Diagnostics.Stopwatch. Here is the description/interpretation followed by the numbers:

    Below are the results of running a series of benchmarks. Use the MeasureIt /usersGuide for more details on exactly what the benchmarks do.

    The resulting numbers are the number of microseconds (µs) needed to run the operation n times, divided by the scale, where n is the ‘count’ and the scale is listed in the test description. If the scale isn’t shown, it is implicitly 1. Scaling is useful if you want to normalize a single iteration of an operation.

    To improve the stability of the measurements, a measurement may be cloned several times and this cloned code is then run in a loop. If the benchmark was cloned the ‘scale’ attribute represents the number of times it was cloned, and the count represents the number of times the cloned code was run in a loop before the measurement was made. The reported number divides by both of these values, so it represents a single instance of the operation being measured.

    The benchmark data can vary from run to run, so the benchmark is run several times and the statistics are displayed. If we assume a normal distribution, you can expect 68% of all measurements to fall within 1 StdDev of the Mean. You can expect over 95% of all measurements to fall within 2 StdDev of the Mean. Thus 2 StdDev is a good error bound. Keep in mind, however, that it is not uncommon for the statistics to be quite stable during a run and yet vary widely across different runs. See the users guide for more info.

    Generally, the mean is a better measurement if you use the number to compute an aggregate throughput for a large number of items. The median is a better choice if you want the best guess of a typical sample. The median is also more stable if the sample is noisy (e.g. has outliers).

    Measure Iteration
    Test Name	Median	Mean	StdDev	Min	Max	Samples
    sum numbers 1-20 [count=1000]	1.99	2.13	0.32	1.99	3.07	10
    sum numbers 1-100 [count=1000]	22.53	22.41	8.35	15.06	43.98	10
    foreach over ValueType[] (250 elems) [count=1000]	1017.83	1022.68	19.78	989.40	1064.10	10
    foreach over ValueType[] using 'var' (250 elems) [count=1000]	1012.83	1023.84	30.07	991.57	1105.00	10
    Foreach delegate method over ValueType[] (250 elems) [count=1000]	398.37	413.28	34.53	385.84	484.94	10
    Foreach lambda method over ValueType[] (250 elems) [count=1000]	398.55	401.10	11.54	385.84	424.40	10
    foreach over List (250 elems) [count=1000]	482.77	482.86	12.39	472.95	516.75	10
    for over List using 'Count' (250 elems) [count=1000]	186.48	188.43	5.49	182.35	200.90	10
    ForEach delegate method over List (250 elems) [count=1000]	493.22	497.95	19.62	479.16	553.86	10
    ForEach lambda method over List (250 elems) [count=1000]	481.14	486.81	19.62	471.87	543.86	10
    for over int[] with hardcoded check against constant of 250 (250 elems) [count=1000]	52.53	53.20	1.45	52.35	57.23	10
    for over int[] - w/o hoisting using 'Length' (250 elems) [count=1000]	89.28	89.33	0.30	89.10	90.00	10
    for over int[] - with hoisting (250 elems) [count=1000]	94.16	94.58	1.07	93.98	97.47	10
    foreach over int[] (250 elems) [count=1000]	88.34	93.72	10.79	86.51	120.72	10
    foreach over int[] using 'var' (250 elems) [count=1000]	93.98	95.31	3.28	92.35	102.35	10
    foreach over ArrayList(int) (250 elems) [count=1000]	1860.66	1876.77	29.58	1848.74	1942.05	10
    foreach over ArrayList(int) using 'var' (250 elems) [count=1000]	1832.38	1850.15	41.18	1819.46	1958.19	10
    for over ArrayList(int) w/o hoisting using 'Count' (250 elems) [count=1000]	780.63	789.05	30.41	762.23	865.90	10
    for over ArrayList(int) with hoisting (250 elems) [count=1000]	609.52	613.22	14.99	596.99	648.61	10
    foreach over List (250 elems) [count=1000]	951.20	958.88	32.75	921.39	1031.39	10
    foreach over List using 'var' (250 elems) [count=1000]	959.55	975.79	59.96	935.90	1151.21	10
    for over List w/o hoisting using 'Count' (250 elems) [count=1000]	419.10	424.17	14.82	406.39	464.04	10
    for over List with hoisting (250 elems) [count=1000]	392.86	396.61	22.49	367.83	455.48	10
    ForEach delegate method over List (250 elems) [count=1000]	722.77	737.33	39.02	699.16	816.27	10
    ForEach lambda method over List (250 elems) [count=1000]	724.25	735.03	32.08	703.31	815.36	10
    for over string[] w/o hoisting using 'Length' (250 elems) [count=1000]	344.40	350.14	29.76	321.45	435.12	10
    for over string[] with hoisting (250 elems) [count=1000]	349.28	344.04	12.29	321.08	360.18	10
    foreach over string[] (250 elems) [count=1000]	342.02	340.92	7.18	328.92	350.54	10
    foreach over string[] using 'var' (250 elems) [count=1000]	343.77	340.95	8.74	325.30	352.95	10
    Array.ForEach (delegate) over string[] (250 elems) [count=1000]	613.67	632.49	29.87	612.77	702.41	10
    Array.ForEach (lambda) over string[] (250 elems) [count=1000]	593.95	595.27	6.58	583.86	608.98	10
    foreach over StringCollection (250 elems) [count=1000]	1924.40	1918.02	34.57	1864.40	1970.42	10
    foreach over StringCollection using 'var' (250 elems) [count=1000]	1913.13	1911.47	57.73	1848.01	2060.24	10
    for over StringCollection w/o hoisting using 'Count' (250 elems) [count=1000]	869.55	868.61	16.52	843.55	899.16	10
    for over StringCollection with hoisting (250 elems) [count=1000]	757.62	769.70	33.57	723.86	846.08	10
    foreach over ArrayList(string) (250 elems) [count=1000]	1910.66	1916.57	16.97	1898.92	1948.19	10
    foreach over ArrayList(string) using 'var' (250 elems) [count=1000]	1924.13	1927.58	36.69	1862.05	1990.96	10
    for over ArrayList(string) w/o hoisting using 'Count' (250 elems) [count=1000]	857.17	867.64	36.09	830.24	953.92	10
    for over ArrayList(string) with hoisting (250 elems) [count=1000]	738.43	740.42	20.43	716.99	778.80	10

  5. Dave Black says:

    November 18, 2014

    Note that the optimizations in the JIT compiler are smart enough to do loop unrolling, bounds hoisting, etc. It can detect that a bounds check is taking place and cache that value instead of computing it on every iteration.

    The old days of trying to eke a couple of cycles out of a loop by hoisting its bounds check are over; doing so can actually hurt performance.

    In other words, try to outsmart the JIT optimizer and you will lose…it’s come a long way and is smarter than most people think!

  6. Krittayot Techasombooranakit says:

    August 19, 2018

    Is this still relevant today, in the .NET Core era?

    • Maarten Balliauw says:

      August 20, 2018

      It very much is, the compiler-generated code hasn’t changed too much over the years.
