Tag Archives: c#

Towards a better local caching strategy

Why Caching?

A while ago, I explained to some of my co-workers the benefits of caching. I’m always surprised to see how this technique is so misunderstood by some developers.

In computing, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.

The thing is that caching is already present everywhere: CPU, Disk, Network, Web, DNS, … it’s one of the oldest programming techniques, available in any programming language and frameworks. You may think that it was mandatory with –only- 8 Ko of RAM two decades ago, but don’t too naive: it’s still a pertinent approach in our always-connected world : more data, more users, more clients, real-time, …

In this article, I will focus only on application caching through System.Runtime.Caching. Nothing really new here, but I just want to review 3 basics caching strategies, that you can see in popular and oss projects; it’s important to have solid foundations. Even if the language is C#, many concepts listed here are also valid in others languages.

Local Caching Strategies

By Local/InMemory cache, I mean that data is held locally on the computer running an instance of an application. System.Web.Caching.Cache, System.Runtime.Caching.MemoryCache, EntLib CacheManager are well-known local cache.

There is no magic with caching and there is a hidden trade-off: caching means working with stale data. Should I increase the cache duration? Should I keep short TTL value? It’s never easy to answer to these questions, because it simply depends on your context: topology of data, number of clients, user load, database activity…

When implementing a local caching strategy, there is an important list of questions to ask yourself :

  • How long the item will be cached?
  • Is data coherence important?
  • How long it takes to reload the data item?
  • Does the number of executed queries on the data source matter?
  • Does caching strategy impact the end user ?
  • What is the topology of data: Reference data, Activity data, session data, … ?

The –very- basic interface we will implement in those 3 following examples contains a single method.

 public interface ICacheStrategy
 {
 /// <summary>
 /// Get an item from the cache (if cached) else reload it from data source and add it into the cache.
 /// </summary>
 /// <typeparam name="T">Type of cache item</typeparam>
 /// <param name="key">cache key</param>
 /// <param name="fetchItemFunc">Func<typeparamref name="T"/> used to reload the data from the data source (if missng from cache)</param>
 /// <param name="durationInSec">TTL value for the cache item</param>
 /// <param name="tokens">list of string to generate the final cache key</param>
 /// <returns></returns>
 T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens);
 }

Basic Strategy

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            var cacheKey = this.CreateKey(key, tokens);
            var item = this.Cache.Get<T>(cacheKey);
            if (this.IsDefault(item))
            {
                item = fetchItemFunc();
                this.Cache.Set(cacheKey, item, durationInSec, false);
            }
            return item;
        }

This is similar to Read-Through caching. The caller will always get an item back, coming from the cache itself or the data source. When a cache client asks for an entry, and that item is not already in the cache, the strategy will automatically fetch it from the underlying data source, then place it in the cache for future use and finally will return the loaded item to the caller.

Double-checked locking

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            string cacheKey = this.CreateKey(key, tokens);
            var item = this.Cache.Get<T>(cacheKey);

            if (this.IsDefault(item))
            {
                object loadLock = this.GetLockObject(cacheKey, SyncLockDuration);
                lock (loadLock)
                {
                    item = this.Cache.Get<T>(cacheKey);
                    if (this.IsDefault(item))
                    {
                        item = fetchItemFunc();
                        this.Cache.Set(cacheKey, item, durationInSec);
                    }
                }
            }

            return item;
        }

This version introduces a locking system. A global synchronization mechanism (a single object for every cache item) is not efficient here, that’s why there is a dedicated synchronization object per cache item (depending on the cache key). The double-checked locking is also really important here to avoid useless/duplicated requests on the data source.

Refresh ahead strategy

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            // code omitted for clarity

            // not stale or don't use refresh ahead, nothing else to do =&gt; back to double lock strategy
            if (!item.IsStale || staleRatio == 0) return item.DataItem;
            // Oh no, we're stale - kick off a background refresh

            var refreshLockSuccess = false;
            var refreshKey = GetRefreshKey(cachekey);

            // code omitted for clarity

            if (refreshLockSuccess)
            {
                var task = new Task(() =>
                {
                    lock (loadLock)
                    {
                        // reload logic
                    }
                });
                task.ContinueWith(t =>
                {
                    if (t.IsFaulted) Trace.WriteLine(t.Exception);
                });
                task.Start();
            }
            return item.DataItem;
        }

In this implementation it’s possible to configure a stale ratio, enabling an automatic and asynchronous refresh on any recently accessed cache entry before its expiration. The application/end user will not feel the impact of a read against a potentially slow cache store when the entry is reloaded due to expiration. If the object is not in the cache and If the object is accessed after its expiration time, it’s similar to the double-checked locking strategy.

Refresh-ahead is especially useful if objects are being accessed by a large number of users. Values remain fresh in the cache and the latency that could result from excessive reloads from the cache store is avoided.

Experimental Results

To see the impact of each strategy, I’ve committed code Github. This program simulates fake workers, that get the same item from the cache during 60 sec for each strategy. A fake reload method taking one sec is used to simulate access to the datasource.  All cache hits, cache misses and reloads are recorded. This may not be the best program of the world, but it’s fairly enough to illustrate this article.

Strategy Number of Gets/Reloads Avg. Get Time (ms) < 100 ms
Basic 3236 / 181 59.17 ms 94.15 %
Double-checked locking 3501 / 11 51.22 ms 94.10 %
Refresh Ahead 3890 / 11 0.20 ms 100%

Between the basic and the double-checked locking strategy, there is an important improvement in terms of reloads from the data source, with nearly the same avg. response time. Between the double-checked locking and refresh-ahead strategy, the number of reloads is exactly the same but the response time is greatly improved. It’s very easy to use a cache, but be sure to use the appropriate pattern that fits correctly to your use cases.

Bonus : Local Cache invalidation

One year ago, I’ve posted an article on Betclic Tech Blog about implementing local memory cache invalidation with Redis PubSub feature. The idea is fairly simple: catch invalidation messages at application level to remove or more items from a local cache. The current version is more mature and a nuget is available here.  This can be easily used in many ways:

  • Invalidate items, such as remove them from the cache
  • Mark items as stale, to use Background reloading with external event..

To conclude

We covered here only some variations of the cache-aside pattern. I hope that you’re now more aware of the possibilities and troubles you may have with a local cache. It’s very easy and efficient to use a local cache, so be sure to use the appropriate pattern that fits correctly to your use cases.

By the way, in an era of cloud applications, fast networks, low latencies, a local cache is –still- very important. Nothing, absolutely nothing is faster that accessing local memory. One natural evolution often cited for a local cache, is a distributed cache. As we’ve seen here, this is not always the unique solution, but there is another story.

The full source code is available on Github.

Book review : Writing High-Performance .NET Code

Cover-Tall-2000x2828-212x300
Ben Watson has been a software engineer at Microsoft since 2008. On the Bing platform team, he has built one of the world’s leading .NET-based, high-performance server applications, handling high-volume, low-latency requests across thousands of machines for millions of customers.

More recently Ben Watson posted two articles on codeproject, both are “Best C# article of the month”. So you can trust this guy.

There are so many books, so many articles & blogs, so many tweets talking about .net performance. Why do we need another book in 2014 and even why do we still need to understand .net internals today when our applications may be hosted on servers with 16 cores and 112 GB of RAM (Azure A9) ?

First, optimizing your code will scale much more better than scaling up/adding new hardware. For sure, you can add more cores, more RAM to your production servers for few €/$ per month, and this is may be a pertinent approach in some cases.  But don’t be naive, a bottleneck in your application is like the sword of Damocles “with great fortune and power come also great peril and anxiety”

More than ever, it’s difficult to predict the performance of a single line of code. By integrating nuget packages, OSS librairies, commercial components … everywhere (“don’t reinvent the wheel, use a framework, they all say”) … our code becomes some kind of magic. In an Agile world, our codebase changes very often and it’s the same of our dependencies: Any commit could potentially destroy your application performance. Good developers have full tests suites (unit, integration, e2e, …) Why not performance tests and benchmarks ?

More than 13 years after the first version of .net, it’s still there, very competitive & efficient in many areas. We can use .net to develop web sites, universal apps, android/iOs app, …. “One language to rule them all, and it’s C#”. Recent additions like async/await and the new features of C# 6 show us that it’s an active and dynamic language. Learning .net runtime internals is certainly not useless.

This is my feeling about dotnet performance today. That’s why I bought this book one month ago.

If you’re a bit concerned about performance, you may know that it’s a very broad topic. What is performance? response time, CPU on a front server, low available memory on a back service … The famous quote of Donald Knuth “Premature optimization is the root of all evil” is a really good argument for detractors “don’t spend time of code optimizations, it’s useless”

One simple rule, maybe the most important in this book is : Measure, Measure, Measure! Don’t try to guess, it’s impossible; don’t try to optimize everything, it’s also impossible. By measuring accurately, you will be efficient. Most of the time to reach your performance goal, you just have to change a few areas in your code, and not everything.

In this book, the author lists a wide range of tools that could help you when investigating performance issues: Visual Studio Profiler, PerfView, CLR Memory Profiler, MeasureIt, … This tool chain should be part of every Performance Chief Architect. “Optimizing performance is meaningless if you do not have effective tools for measuring it”.

This book is quite recent (summer 2014), so the content is quite up-to-date; For example, there are specific chapters on Windows Phone, TPL, .net 4.5 … By the way, it’s a delicious mix between .net internals, best practices and recommendations. It’s hard to talk about code optimization without talking about Garbage collection & Jit compilation.  It may not be the best explanation (in terms of details), but it’s certainly one of the easiest to understand. If you’re quite new in .net, it’s mandatory to understand these topics.

The middle of this book is like a catalog. What is a performance cost of throwing exceptions? Class or struct ? Why Enum.ToString() is so CPU-consuming ? How to work with strings efficiently ? Any difference between for & foreach ? …. You may have so many questions and there are so many answers in this book. Each topic is illustrated by code, IL, … and how to diagnose this problem with your new favorite tools (PerfView, CLR Profiler, Dump, …)

What I like also about this book is that it’s not only focused on code. The Performance effort should not be supported by only one guy (you or me) but by the whole team. Having always up-to-date production metrics, automated benchmarks & profiling, educated product owners, is very important for a successful performance strategy.

It will definitively be part of my library about performance, near to others famous books on this topic. I recommend it to every .net developer.

Writing High-Performance .NET Code

The importance of useless Micro-optimization

Micro-optimization is “the process of meticulous tuning of small sections of code in order to address a perceived deficiency in some aspect of its operation (excessive memory usage, poor performance, etc)”

We’ve all seen this kind of debates : Is X faster than Y? Should I have to replace A by B in my code? It’s not specific to any language, but it’s a real question for all developers. Programmers are generally advised to avoid micro-optimization, unless they have a solid justification. Yes, it depends

Most of the time, Micro-optimizations are confirmed by benchmarks: a very specific code section is executed thousands/millions times to illustrate the problem and to confirm initial hypothesis. A is X times slower than B. Of course, in real world applications, we rarely call one piece of code so many times, so this stuff may seem inappropriate. But the trouble is that the extra N-milliseonds is CPU time on the server. A web server could simply be idle waiting for request, processing other requests, … but instead it is busy executing the same inefficient methods over and over. Those N-ms can even become N-s during peaks load.

Improving performance of an application is an endless road. It’s a very time-consuming activity and don’t forget that’s it’s not the core of your Business : Your Boss wants you to deploy new features and not pointless optimizations. It’s very common to spend several hours (days?) to reach your performance goals.

By performance I mean something that limits scalability of your application. It could be CPU, network IO, Memory … You may know that all applications (systems) have throughput limits. Your Job, as a chief performance officer, is to be keep fast response times in all circumstances (unexpected system difficulties, light load/heavy load) and to be far from these limits.

The performance of a code section is a mix between frequency & efficiency. I don’t care to write an inefficient code if it’s rarely used but I’m really concerned by most called operations. Have you already count how many times ToString() is called in your .net project ? We can’t blame someone else for having written un-optimized code that we want to use in our –different- context. For example, .toString() has maybe not been coded to be used as a key for Dictionaries.

Since 10 years of professional programming, I’ve seen serious performance problems. “I don’t understand why it takes 5 sec to generate my page each evening. Everything is cached and I’ve run many benchmarks: all my methods take less than 5ms.” Yes, but these methods are called 100 times while generating your page. Simply speaking, 5 ms is too slow for your context !

Even with a clean architecture it’s sometimes difficult to determine the critical rendering path.  That’s why I’m a big fan of profiling tools like Visual Studio Profiler: they give you a complete overview without changing your application. Collected data during a load test session,  can be precious.

I often see Micro-optimization as a challenge: let me try to significantly improve the performance –and scalability- of your application by changing the fewest lines as possible. If I can’t do this, I’m either not good enough or there are no interesting optimizations.

Examples in .net

A few months ago, I already explained on the Betclic Techblog one bad experience we had with Structuremap and Web Api. To summarize, we’ve significantly reduced CPU usage of a web application, just by splitting IoC bindings into multiple containers.

More recently, I was mandated to do a performance review on a WCF service consuming too much CPU & memory. Without any doubts –thanks to my favorite profiler- , one method is causing several alerts & warnings.

buildmessage

This method is in a base class. Because the project follows the Onion architecture, dependency injection is heavily used here to manage dependencies (at core, service & repository layers). As a consequence to be in a pure stateless world, this method can be invoked several times during each request processing.  The average call rate is approx. 150/sec during peaks load, so this is a good candidate for a micro-optimization.

How to improve this method? It doesn’t look so bad at the first view ….

A first idea could be to optimize string operations. We all know the obvious beginner mistakes of string concatenation. Memory allocations are fast on modern PCs, but they’re far from free.  String.Join()/StringBuilder seems better in this case.

A Second better idea is to remove Enum.ToString(). The type of Legislation/Brand/Platform is enum  The method Enum.ToString() is called implicitly in this code section (during concatenation). An interesting article on codeproject explains the troubles with Enum.ToString().

A few minutes later, I finally produced this extension method (Gist)


public static class EnumExtensions
{
    public static class EnumCache&lt;TEnum&gt;
    {
        public static Dictionary&lt;TEnum, string&gt; Values = new Dictionary&lt;TEnum, string&gt;();
        static EnumCache()
        {
            var t = typeof(TEnum);
            var values = (TEnum[])Enum.GetValues(t);
            var names = Enum.GetNames(t);
            for (var i = 0; i &lt; values.Length; i++)
            {
                Values.Add(values[i], names[i]);
            }           
        }

        public static string Get(TEnum enm)
        {
            return Values[enm];
        }
     }

     public static string AsString&lt;TEnum&gt;(this TEnum enm) where TEnum : struct, IConvertible
     {
         return EnumCache&lt;TEnum&gt;.Values[enm];
     }
}

Let’s run benchmarks it on 50000 iterations (check the gist for the full source code)

vsprofilerenum

consoleoutputenum

Faster and Less allocated memory, Not so bad. (Note : It’s even a little better than the original codeproject article but with less possibilities). For sure, a .net Guru may find a more efficient way but It’s enough for me: I reached my performance goal. We now have an optimized ready-to-use extension method with less than 30 lines of code, that’s 40 times faster. Very well done!

But, We’re ALL WRONG !

Optimizing code means you have to think differently. Simply create a property (eventually lazy),  computed just once per request, and inserted somewhere in the scope. That’s it.

In every project, there are code conventions: this class to manage logging, this one to format date format, this one to translate string… We should be very careful with this kind of helpers, extension methods, utilities; we all have this kind of classes in our projects to factorize pure-technical stuff. Unsupervised usage may lead to serious performance issues.  Never do anything more than once.

Is this micro-optimization a complete waste of time? No at all. We’ve done a lot of interesting things: explore .net internals, run profiling sessions & benchmarks. The final solution is still valid but just not so important in our context. But it could be useful in the future…

Introducing Toppler

I recently created a new repository (https://github.com/Cybermaxs/Toppler ) and I would like to share with you the idea behind this project.

It’s been officially one year since I’ve discovered Redis and like every fan I can see many possibilities here and there. I’m also quite surprised that this DB is so little known in Microsoft stack but I’m sure this will change in a few months as Redis Cache will become the default caching service in Azure. But Redis is not just a cache! This Key-value Store has unique features and possibilities. Give it a chance.

So what is Toppler ? It’s just a very small package built on the top of StackExchange.Redis that helps you to count hits and get emitted events to build Rankings/Leaderboard.

Here are a few use cases where Toppler could help you

  • You want a counter for various events (an item is viewed, a game is played, … ) and get statistics about emitted events for custom time ranges (today, last week, this month, …)
  • You want to implement a leaderboard with single or incremental updates.
  • You want to track events in custom dimensions and get statistics for one, two, .. or all dimensions
  • You want to provide basic recommendations by combining most emitted events and random items for custom time ranges

How does it work?

One of the most important aspects in Toppler is the Granularity. Each hit is stored in a range of sorted sets representing different granularities of time (e.g. seconds, minutes, …).

The granularity class has 3 properties (Factor, Size, TTL) that allow to compose a smart key following this pattern [PREFIX]:[GRAN_NAME]:[ TS_ROUNDED_BY_FACTOR_AND_SIZE]:[TS_ROUNDED_BY_FACTOR] where [PREFIX] is the combination of the configured namespace with the current dimension and [TS_ROUNDED_XX] is the rounded unix timestamp for a given granularity.

Here are the values for the 4 default Granularities

Factor TTL Size
Second 1 7200 3600
Minute 60 172800 1440
Hour 3600 1209600 168
Day 86400 63113880 365

A TTL is assigned to each key (using Redis EXPIREAT) to keep a reasonable DB space usage.

So, a hit emitted at 17/07/2014 14:23:18 (UTC) will create/update these keys

  • [NAMSPACE]:[DIMENSION]:second:1405605600:1405606998
  • [NAMSPACE]:[DIMENSION]:minute:1405555200:1405606980
  • [NAMSPACE]:[DIMENSION]:hour:1405555200:1405605600
  • [NAMSPACE]:[DIMENSION]:day:1387584000:1405555200

When an event is emitted, the number of hits (often 1) is added to the target Sorted Set via the ZINCRBY command.

The retrieval of results use the same logic to recompose keys as the granularity and resolution are parameters of the Ranking method, but we use the ZUNIONSTORE command to combine all results in a single sorted set. This allows to store the result of the query (like a cache) or to apply a weight function.

Show Me the code !

topplerintro

It’s just a very basic example and many additional options are available to emit events (when, in which dimension, how many hits …) or compute statistics (single/multi dimensions, caching, granularity & resolution, weight function …).

The project is currently in Beta  so please be indulgent and patient ; Feel free to contact me, create issues, tell me what’s wrong … Thanks.

Acknowledgements