Performance Tips & Tricks with Xamarin for Android

I was recently involved in a project made with Xamarin. This new mobile app is not released yet, but it's on a good track and I'm satisfied with the result. The initial objective was to create several Xamarin apps (Brands x Areas on Android + iOS) with a strong focus on code sharing & quality. I can't say too much about the software architecture, but it's built with MvvmCross.

This article targets Xamarin for Android (aka MonoDroid) only. I hope to find the time to write the same kind of article for iOS, but most of the things listed here are relevant to both platforms.

Watch out for network requests

The primary data source for the app is public web APIs. These days, it's a fairly common architecture for a mobile application, but don't forget that latency on mobile networks matters. On a mobile network, we can't really expect to get a response in less than 100 ms like on desktop browsers. To illustrate this, we simply generated trace messages on every HTTP request (url, headers, and sometimes content). It may seem very basic, but it's terribly important. This helped us detect duplicated calls, lack of caching, ghost UI controls, duplicated event handlers …

Here are some HTTP logs at app’s startup. Same color = Same uri. Look at the timings, any problem here?

xam_ios
I suggest you enable this kind of logging very early in your project, so everyone becomes familiar with and educated about mobile latency (a minimal logging handler sketch follows the list below). Logging all IOs was certainly one of our best ideas, but it's only half of the job. To reduce the number of requests you could:

  • Use Batch requests (combine several calls into a single request) or design your api following “Scenario Driven Design” principles
  • Use (and abuse) local caching, for example with Akavache or Sqlite
  • Use modernhttpclient and check gzip compression
  • Use pertinent speculative requests and background refreshes
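
Here is a minimal sketch of the kind of HTTP logging handler mentioned above, assuming HttpClient and a DelegatingHandler; the class name is hypothetical and Trace could be replaced by your favorite logger.

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class HttpLoggingHandler : DelegatingHandler
{
    public HttpLoggingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var watch = Stopwatch.StartNew();
        var response = await base.SendAsync(request, cancellationToken);
        watch.Stop();

        // url, status code and timing for every single request
        Trace.WriteLine(string.Format("{0} {1} -> {2} in {3} ms",
            request.Method, request.RequestUri, (int)response.StatusCode, watch.ElapsedMilliseconds));

        return response;
    }
}

// usage : var client = new HttpClient(new HttpLoggingHandler(new HttpClientHandler()));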

Watch out for memory allocation

My team and I are used to developing server-side code (web sites, web APIs, workers, ...), running on servers with decent hardware. Unfortunately, you don't have -yet- that luxury in a mobile app. I don't think there are hard limitations, but to be clear, your app will have to run with only a few MBs. You should be more concerned about memory than ever. There are mainly two problems:

  • Excessive allocation adds pressure on GC

Garbage Collection & Memory and Performance Best Practices are must-reads to understand the GC mechanisms on Xamarin. Why is the GC so bad? It simply stops all running threads, including the UI thread. Bye bye, smooth animations and fast rendering. Minor collections are cheap (only a few ms, sometimes 50 ms) but major collections are quite slow (from 100 ms to 1 sec). Don't forget that a major GC collects the Gen1 and large object space (LOS) heaps. The LOS is where objects that require more than 8000 bytes are kept. 8000 bytes is small, terribly small, especially when you have to consume web APIs.

  • Memory leaks increase major collections and lead to the maximum GRefs being reached

If you don't pay attention to memory, your nursery, major heap or LOS will rapidly be full. Since Xamarin.Android 4.1.0, a full -major- GC is performed when a gref threshold is crossed. This threshold is 90% of the known maximum grefs for the platform: 1800 grefs on the emulator (2000 max), and 46800 grefs on hardware (maximum 52000).

Here are some minor GC messages, visible via logcat. You should always try to understand these messages, and when they happen.

xam_gcs.png
Observe how the LOS size increases after the HttpClient calls; a major GC is expected very soon…

Here is another typical example. Suppose translations contains 2000 items, and that this code is executed for each webview. Failed!

xam_ls.png

To detect excessive allocations and memory leaks, Android Studio and the Xamarin Profiler are great. The mlpd file produced by the latter can also be opened with Heap-Shot (mono).

Optimize your images

iOS has a built-in image optimization process but Android does not. Like for Web Performance Optimization, it's better to have small, optimized images for both the GPU and your app size. Especially on Android, where there are several resource folders, it is easy to include hundreds of images.

Many online services are available, but you can also directly use pngout/optipng for PNG files and jpegtran for JPEG files. Just for your information, we saved 25% of the total assets size.

Note: please be sure not to include useless resources.

Optimize your webviews

It's also very common to have webviews in a mobile app. You can think of the WebView as a chromeless browser window that's typically configured to run fullscreen. It's sometimes required to explicitly NOT use native code for some topics (login, registration, content boxes with html, …). There are two well-known issues with them. First, webviews and their mobile browser counterparts don't have similar performance profiles (Do u webview?); it's less true since the latest versions, but it's still something to take into account. Second, a webview is often focused on one feature and that's all. Traditional SPAs have references to all scripts and styles in the main document, leading to a few seconds lost at startup (network requests, parsing, layout …). It's a terribly bad idea to integrate a SPA (made with any JS framework) and hope the page will be fully loaded in less than 5 seconds. This simply means that your webviews should be optimized for your native app usage.

  • Log every request of your webview
  • Intercept HTTP requests and sometimes replace the response (see the sketch after this list)
  • Use Javascript interfaces to bootstrap your JS
  • Check HTTP caching headers
  • Optimize your web resources (js, css, images, …) for a webview usage
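
To give an idea, here is a minimal Xamarin.Android sketch of the "log and intercept" items above, based on WebViewClient.ShouldInterceptRequest (API 21+); the class name and the local asset are hypothetical.

using System;
using Android.Webkit;

public class OptimizedWebViewClient : WebViewClient
{
    public override WebResourceResponse ShouldInterceptRequest(WebView view, IWebResourceRequest request)
    {
        // Log every request issued by the webview
        Android.Util.Log.Debug("WebView", request.Url.ToString());

        // Example: serve a local copy of a heavy script instead of downloading it
        if (request.Url.ToString().EndsWith("app.min.js", StringComparison.OrdinalIgnoreCase))
        {
            var stream = view.Context.Assets.Open("app.min.js");
            return new WebResourceResponse("application/javascript", "UTF-8", stream);
        }

        // the base implementation (null) lets the webview load the resource normally
        return base.ShouldInterceptRequest(view, request);
    }
}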

Here is what you should AVOID (aka our initial attempt to integrate a SPA into a webview)

xam_webviews.png

Optimize your .NET code and keep your prod build clean

Developers often add debug messages, diagnostics code, local variables … don't forget to clean up that kind of code. Here is one of my favorite examples:

xam_traces.png

string.Format and JsonConvert are executed even in a RELEASE build.
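
To illustrate the trap (my names are hypothetical, not the exact code from the screenshot): the arguments of a trace call are evaluated even when no trace listener is attached, so the formatting and the JSON serialization always run. One possible fix is a helper marked with [Conditional("DEBUG")], so the call and its arguments are stripped from RELEASE builds.

using System.Diagnostics;
using Newtonsoft.Json;

static class Logging
{
    // The trap: formatting + serialization run even when no trace listener is attached.
    public static void BadTrace(object response)
    {
        Trace.WriteLine(string.Format("response = {0}", JsonConvert.SerializeObject(response)));
    }

    // A possible fix: calls to this method (and their arguments) are removed
    // by the compiler from builds where DEBUG is not defined.
    [Conditional("DEBUG")]
    public static void TraceJson(string label, object value)
    {
        Trace.WriteLine(label + " = " + JsonConvert.SerializeObject(value));
    }
}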

I will not talk about asynchronous programming, which is simply mandatory to create a responsive and professional app in 2015, but as a general perf recommendation, try to understand the "hot path" of your app/views. For example, the primary serializer used in this app is Newtonsoft.Json. It's not the fastest .NET JSON serializer but certainly the most mature on Xamarin. There are some tips to make it go even faster. Using a stream for deserialization is also great to avoid allocations on the LOS, because the threshold can easily be reached with the JSON format (see the memory section, only 8 KB).
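
For example, here is a sketch of stream-based deserialization with Json.NET (the DTO type and the helper are placeholders): the response body is read as a stream instead of a big string, so no large intermediate string lands on the LOS.

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

static class JsonHelper
{
    public static async Task<T> ReadAsAsync<T>(HttpResponseMessage response)
    {
        using (var stream = await response.Content.ReadAsStreamAsync())
        using (var reader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(reader))
        {
            // deserialize directly from the stream, no intermediate string
            return new JsonSerializer().Deserialize<T>(jsonReader);
        }
    }
}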

Finally, like for any app, keep your production build optimized and clean.

  • Use a Release configuration
  • Remove dead code, unused variables & classes
  • Disable Android debugging
  • Review logging & diagnostics

Understand the Android Platform and controls

The app may be written in C#, using Mono/Xamarin … but it's still running on Android. If you are familiar with Windows development, Xaml or WP, please try to forget everything because it's a completely different platform. Is it better to use a ListView or a RecyclerView? How do you properly implement a PagerAdapter? Is my layout optimized? So yes, you will have to read a lot of articles here, but it's not a waste of time, because it is very close to -official- Android development. Most of the articles, tips and SO answers that you can read about Android also apply to Xamarin.Android. I've already talked about webviews, and we've done a lot of Android-specific things to get the fastest page load. In the middle of the project, a memory leak was identified on a RecyclerView: there were simply too many instances of a custom control.

xam_mvxframe.jpg

Our understanding of the inner PagerAdapter was just totally wrong, and we fixed it after reading several SO answers.

These are the main tricks & tips that helped me a lot; I wish I had known them at the beginning. My team and I did a lot of code fixes & fine tuning that are mostly irrelevant outside our context, and that's why you have to find your own way into Xamarin development. I hope this will help you.

Storing Time Series in Redis

Last time, I explored how to store time series in Microsoft Azure Table Service. This time I'll do the same, but in Redis. It is a very popular key-value store (but not only that) and I highly encourage you to review it if you still don't know it. In addition to my latest post, it's important to note that time series have very special properties:

  • Append only
    Most of the time, collectors or emitters append data “at the end”.
  • Readonly mode
    Time series are never updated. Points are just read to create beautiful charts, compute statistics, etc. Most probably, the latest points are more important than the others: it's common to use the last minute, last hour or last day in charts.
  • Big Data
    A simple collector that runs every second creates 86 400 points per day, 2 592 000 per month, 31 536 000 per year… and this is just for a single metric.

There are many ways to implement time series in Redis, for example by using sorted sets, lists or even hashes. None is really better and the choice depends on your context (yes, again). A few weeks ago, I needed a very efficient way to store thousands and thousands of data points. Here are the details of this implementation, heavily inspired by a repository by @antirez: https://github.com/antirez/redis-timeseries

How it works

The main idea is to use the APPEND command.

If key already exists and is a string, this command appends the value at the end of the string. If key does not exist it is created and set as an empty string, so APPEND will be similar to SET in this special case.

redistsdatapoints
Every time series is stored in a single string containing multiple data points that are mostly ordered in time. The library appends every data point as a serialized string terminated by a special character ('#'). Internally, every data point contains two parts: a timestamp and a value, also delimited by a special character (':').

Now, to make things a little bit more efficient, key names are generated via a basic algorithm. The key name contains a constant (user-defined) and a rounded timestamp, so we do not add all the points to the same key. The rounding factor depends on your configuration settings. If you set it to one hour, all data points inserted between 4PM and 5PM will go to the same key.
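
Here is a minimal sketch of the write path, assuming StackExchange.Redis; the helper names are illustrative, not the exact RedisTS code.

using System.Threading.Tasks;
using StackExchange.Redis;

static class TimeSeriesWriter
{
    // round a Unix timestamp (seconds) down to the nearest multiple of the factor
    static long Round(long timestamp, long factor)
    {
        return timestamp - (timestamp % factor);
    }

    public static Task AddAsync(IDatabase db, string prefix, long keyFactor, long timestamp, long value)
    {
        // e.g. "myts:1430222400" when keyFactor is 3600 (one hour)
        RedisKey key = prefix + ":" + Round(timestamp, keyFactor);

        // a data point is serialized as "timestamp:value#" and appended to the string
        return db.StringAppendAsync(key, timestamp + ":" + value + "#");
    }
}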

Reading data is done via GETRANGE. Why not a basic GET? Mainly because each key can contain a lot of data points, and I don't want to risk an OutOfMemoryException, excessive large-heap allocations or GC pauses… Depending on the rounding factor, it is also possible to hit Redis several times, for example if the rounding factor is set to one hour and you want the last 4 hours.
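
A sketch of a chunked read with GETRANGE (StringGetRange in StackExchange.Redis); the chunk size is an arbitrary assumption.

using System.Text;
using System.Threading.Tasks;
using StackExchange.Redis;

static class TimeSeriesReader
{
    const int ChunkSize = 64 * 1024;

    public static async Task<string> ReadKeyAsync(IDatabase db, RedisKey key)
    {
        var buffer = new StringBuilder();
        long offset = 0;
        while (true)
        {
            // read at most ChunkSize bytes, never the whole string at once
            string chunk = await db.StringGetRangeAsync(key, offset, offset + ChunkSize - 1);
            if (string.IsNullOrEmpty(chunk)) break;
            buffer.Append(chunk);
            if (chunk.Length < ChunkSize) break;
            offset += ChunkSize;
        }
        // the result contains "timestamp:value#" entries, ready to be split on '#'
        return buffer.ToString();
    }
}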

Pros:
  • Very space efficient
  • Fast inserts (APPEND is O(1))
  • Up to 512 MB
  • TTL on keys

Cons:
  • Inserts at the end only
  • Unordered list
  • Range queries within a series not supported (requires fixed-size data points)

Introducing RedisTS (Redis Time Series)

I've committed an implementation on GitHub and a NuGet package is available here. The usage is pretty trivial and you can inspect the test project or the samples to understand how to use it.

Note: RedisTS depends on StackExchange.Redis. An open ConnectionMultiplexer is required and RedisTS will never open a new connection or close the current one.

First, you have to create a client for your time series.

var options= new TimeSeriesOptions(3600 * 1000, 1, TimeSpan.FromDays(1));
var client = TimeSeriesFactory.New(db, "msts", options);

This operation is cheap and the client is -more or less- stateless. Once you have a client, you can add one or more data points. This is typically done via a background task or a scheduled task.

//here I add a new data point. Date: now, Value: 123456789
client.AddAsync(DateTime.UtcNow, 123456789);

That's all on the client side. Now, if you insert data points from one location, it's expected that you read them from another one, maybe to display beautiful charts or compute statistics. For example, if you want to get all the data points of the last hour for the series "myts", you can use the following piece of code:

client.RangeAsync(DateTime.UtcNow.AddHours(-1), DateTime.UtcNow);

That’s all for today. I hope this will help you and feedback is heavily encouraged. Thanks

Moving to xUnit.net

"xUnit.net is a free, open source, community-focused unit testing tool for the .NET Framework. Written by the original inventor of NUnit v2, xUnit.net is the latest technology for unit testing C#, F#, VB.NET and other .NET languages. xUnit.net works with ReSharper, CodeRush, TestDriven.NET and Xamarin."

Many test frameworks are available in .NET: MSTest, NUnit, xUnit, … Testing is an important aspect of Agile and XP, and you can't reach Continuous Deployment without it. Today, we write more and more tests; sometimes we have more tests than production code. That's why a decent test framework is so important today.

Why another test framework?

There are several articles & posts on the internet explaining the pros and cons of each test framework. Here are my -unordered- personal reasons:

  • Nuget ‘All the way’

You don't need to install a vsix extension in VS, or manually add a reference to a file-based dependency (Microsoft.VisualStudio.QualityTools.UnitTestFramework.dll…. v10.0 or v10.1?) or to create a specific Test Project. In xUnit, you just need a basic class library with a reference to the official NuGet package. The VS Test Runner is even a NuGet package itself. Everything is distributed via nupkg and it's terribly simple and efficient.

  • Great community & active development

xUnit.net is free and open source. The code is hosted on GitHub (800 commits, 30 contributors) and was previously on CodePlex. The core team is made up of inspired evangelists and is very active. The official twitter account has 1400 followers and it's still growing. xUnit is extensible and a lot of extensions like AutoFixture are already available on nuget.org.

  • Part of the next big thing

Do you know ASP.NET 5, Xamarin, DNX, …? xUnit is already compatible with all of these platforms. xUnit may well become the first-class citizen and default choice in .NET in the future. It's not a waste of time to spend a couple of hours understanding it.

xunitplatforms

  • Well integrated in .NET ecosystem

There is no trouble switching to xUnit because it is already well integrated in the .NET ecosystem: Team Foundation Server, CruiseControl.net, AppVeyor, TeamCity, Resharper, TestDriven.net, …

  • It helps to write “Better, Faster, Stronger” Tests

It's maybe the most important reason. I have the impression that parameterized tests (via Theories), generic tests, extensibility via attributes, shared context, … remove a lot of glue and friction in my test projects. Assertions & attributes are quite similar to the other test frameworks, but there are a few additional interesting properties.
xunittest
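
As an illustration of parameterized tests (a generic sketch, not taken from my real test project), a Theory lets one test method cover several cases:

using Xunit;

public class CalculatorTests
{
    [Theory]
    [InlineData(1, 2, 3)]
    [InlineData(2, 2, 4)]
    [InlineData(-1, 1, 0)]
    public void Add_returns_the_sum(int a, int b, int expected)
    {
        Assert.Equal(expected, a + b);
    }
}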

Integrating xUnit with Team Foundation Server

As I said, xUnit.net is already well integrated with several build servers like AppVeyor or TeamCity. Using it with TFS is not so complicated, but there are a few extra configuration steps.

  1. Download the latest stable version of xunit.runner.visualstudio
  2. Change the extension to zip and extract all the files.
  3. 3 assemblies should be in the folder build\_common
  4. The 3 files should be committed somewhere in TFS (up to you)
  5. Open the Team Explorer tab, Build -> Actions -> Manage Build Controllers, select Properties. You should fill in the TFS path to custom assemblies (xUnit)

controllersettings

That’s all. You can now write your tests and build your solution on the build agent. xUnit will be automatically used.

One particular problem in the current version is that test filters/test categories are not supported. xUnit has Traits, but they are not understood by the runner. It's common to mix unit tests & integration tests in a solution in order to have a better and stronger test strategy.
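
For reference, here is what a trait looks like on a test (a generic example, not from my solution); the VS runner currently ignores the category when filtering:

using Xunit;

public class OrderRepositoryTests
{
    [Fact]
    [Trait("Category", "Integration")]
    public void Should_insert_an_order_in_the_real_database()
    {
        // test code hitting the real database goes here
    }
}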

ctegories

If you are in this case, you have only two options:

  • Wait for xunit.runner.visualstudio 2.1 (summer 2015 ?)
    A recent commit added support for test filters. It's a question of weeks before it reaches production. (Note: the fix is already available on the public MyGet feed)
  • Create another solution without integration tests
    A little bit painful to maintain but having a clean and fast solution to build is a good practice.

I'm completely convinced by xUnit.net and I hope you are too! If you still don't know it, please give it a try.

Being valuable, Being a swiss knife

ultimate-swiss
A Swiss knife is a very popular pocket knife, used by several armies -but not only- and generally has a very sharp blade, as well as various tools, such as screwdrivers, a can opener, and many others. These attachments are stowed inside the handle of the knife through a pivot point mechanism. When I was a child, I owned one and was fascinated by its utility. What a great design! To be clear, my wish, as a software engineer, is to be a Swiss knife, and I am very proud to be considered as such by my co-workers and my managers. I will try to explain in this post why it's important to me and why every developer should take this into consideration.

The need for a swiss knife

A Swiss knife is a highly valuable tool that everyone wants to have in their pocket because it helps you solve any problem. And yes, this is the sad tragedy of the software industry: problems and puzzles are everywhere. Why do I have an exception here? How will you implement this business feature? What is this crappy code? What will be the architecture of our future web site? Why did our application fail miserably yesterday? Is this new hype tech/language/framework/tool/… interesting for us? … Of course, we can't predict our future problems, but solving puzzles is the essence of software engineering. At the end of the month, we're not paid just for coding but for solving problems. The most important thing is here: each time we fix something, we bring value to someone or something: our customers, our company, our co-workers or even ourselves. Bringing value is what makes us valuable, one of the most important forms of recognition in our job. Even better, try to avoid puzzles altogether, but that's another story.

To be a swiss knife you need … to be focused on value

Even nowadays, I can see some developers waiting for tasks, committing quick and dirty solutions, being focused only on their scope and not aligned with the company's objectives … of course this isn't how we should be. Everyone in the company should be focused on the final product, on the customer, on business value, and should help customer support & product owners. Read more on devops culture. Many people will say here "Of course, I am", but don't be too naive: being focused only on value means that you should sometimes work on boring tasks, old-fashioned techs, deprecated libraries, crappy code, etc. This is the tradeoff.

To be a swiss knife you need … to think as a software engineer

An engineer is a professional practitioner of engineering, concerned with applying scientific knowledge, mathematics, and ingenuity to develop solutions for technical, societal and commercial problems.

We've already seen several articles that illustrate this concept, but please stop focusing on your favorite stack. Now! After a decade, I've seen so many things… For example, asp.net WebForms were very cool 10 years ago (and yes, they really were compared to the other techs available at that time), but at some point I switched to something else. Because I had learned strong foundations, I was able to reuse my skills quite easily in another tech afterwards. Don't learn to code. Learn to think! The Silver Bullet Syndrome is a good illustration of this problem. It is the belief that "the next big change in tools, resources or procedures will miraculously or magically solve all of an organization's problems". Stop chasing the chimera, it won't really fix anything. If it helps you to fix your current problems, you will quickly see new ones. The Answer to the Ultimate Question of Life, The Universe, and Everything is not "42", but "it depends on the context". Like for design patterns, there is absolutely no magic solution for everything; you just need to find a good one, by applying YOUR vision in YOUR context. Like a real craftsman, you need tools in your daily job, but unlike a real craftsman, our tools are constantly evolving. Do you know another industry in the world with so many flavors & communities that allows you to work on tools released less than two weeks ago? Neither do I. You have to learn to think and to learn practices, not technologies; you should not be narrow-minded but open to experience and feedback.

To be a swiss knife you … don’t need to be a technical expert

There are several technical experts and evangelists in this world (and we need them) but we're not all dedicated to that future. By the way, it's so boring to always do the same thing and to always chase the same demons. We have the illusion that TOP programmers are the most valuable resource for a company, but I don't think so. You no longer need to be a brilliant programmer to bring success to your company or to achieve success. You just have to solve problems, and the good news is that it's fairly useless to stay for hours in front of your screen to reinvent the wheel in each project. On one side, you have a supercomputer in your pocket, another supercomputer on your desk, and dozens of supercomputers in the cloud. On the other side, thousands of open source frameworks and libraries can do 90% of the work for you: GitHub, Wikipedia, Stack Overflow, and of course the very wide range of articles, tutorials, feedback and posts available on the Internet. The hard part of building software is specification, design, modelling … not how you will write code. Architecture, monitoring & diagnostics, tests, ownership, continuous delivery, maintainability, performance & scalability, … so many things that should be part of your software.

To be a swiss knife you should … practice

The positive side effect of considering myself a Swiss knife is learning new stuff and running experiments. We know that it's vital for us to learn because our world is moving so fast. When I started to work, it was very different, and I know it will be different in 10 years too. Compared to a traditional developer attached to his "comfort zone", I have many more occasions to try & test new things on real-world examples. In the end, if it doesn't work properly or if it takes too much time, it doesn't matter because I am my own product owner. All these tests give me more arguments, more experience, more skills, to become a better problem solver, aka a better Swiss knife. It's a way to be proactive: I have a wide range of things in my pocket to help me in every situation. To conclude, I prefer to know thousands of topics -imperfectly- rather than mastering only one. I know that I have enough skills and motivation to adapt myself to any problem. A new stack to learn? Ok, give me several days, I'll do it. The number of skills you have doesn't really matter; what is important is being able to adapt to each situation, like a soldier with a Swiss knife. I know that if I have to work on an unknown topic, I will spend the hours needed to learn it in order to be efficient and to make the right decisions when it's needed.

Storing time series in Microsoft Azure Table Storage

In a recent project, I needed to store time series into Microsoft Azure Table Storage. As for most of NoSQL databases, the design is very important and you have to mainly think about your access patterns.

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval. Time series data often arises when monitoring an application or tracking business metrics, but it occurs naturally in many application areas like finance, economics, medicine …

appinsights

Time series are very frequently plotted via -beautiful- charts. Here is an example of availability in Application Insights. In this example, the end user can view response times for several time ranges. Data is aggregated depending on the selected time range (every second for the last hour, every minute for the last day, every 5 minutes for the last 48 hours, …) to keep it easy to understand.

If you're unfamiliar with the Table Service on Azure, I highly recommend you read the Introduction and then the Design Guide; after reading both articles, I hope you will understand why storing time series is not so easy and deserves this kind of article.

To illustrate the following designs, I use randomly generated data, basically one point per second. Having one data point per second makes it very clear and easy to read. In your implementation, you can have more than one point per second; it doesn't matter.

Basic design

A first design could be to create a basic entity for each data point, like this one:

public class DataPoint : TableEntity
{
    public long Value { get; set; }
    public DataPoint() { }
    public DataPoint(string source, long timeStamp, long value)
    {
        this.PartitionKey = source;
        this.RowKey = timeStamp.ToString();
        this.Value = value;
    }
}

Why a timestamp for the time property? It's simply a better choice because the RowKey is a string (a constraint of the Table Service). Ticks could be an idea, but I prefer a common value (Unix timestamp) for the time axis, so any client could request this table via the REST endpoint.
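
A possible implementation of the ToSecondsTimestamp helper used in the queries below (the original gist may differ): it converts a UTC DateTime to a Unix timestamp in seconds.

using System;

static class TimeHelper
{
    static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Unix timestamp in seconds for a UTC date
    public static long ToSecondsTimestamp(DateTime date)
    {
        return (long)(date - Epoch).TotalSeconds;
    }
}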

This will produce this kind of result:

ts_basic

The RowKey here equals the Unix timestamp in seconds (due to my test data), but it could be the traditional Unix timestamp (ms). Now, if you want to query entities for a specific time range (two hours), you need to use a range query on the RowKey. This is the 2nd most efficient query type on Table Storage. Here is an example for the first two hours of 2015:

// time range : first two hours of 2015
var from = new DateTime(2015, 1, 1, 0, 0, 0);
var to = new DateTime(2015, 1, 1, 2, 0, 0);

// generate the range query
string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "mysource"),
    TableOperators.And,
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, ToSecondsTimestamp(from).ToString()),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, ToSecondsTimestamp(to).ToString()))
        );

TableQuery<DataPoint> query = new TableQuery<DataPoint>().Where(filter);
var results = table.ExecuteQuery<DataPoint>(query);

The generated filter is $filter= (PartitionKey eq ‘mysource’) and ((RowKey ge ‘1420070400’) and (RowKey lt ‘1420077600’))

This design seems good, but it isn't for real-world scenarios. First, rule #1 of the Table Service (scalability) is not followed here. Performance is not the same for 1 000 rows and 1 000 000 000 rows. Having a hot partition (always the same PartitionKey) is an anti-pattern for scalability on the Table Service. Second, the number of data points included in the desired time range affects performance: in my example, for 2 hours, we will receive 7200 entities (one entity for every second). The REST service limits the number of returned entities to 1000 per request, so we will have to execute several requests to get all entities for the selected time range. Fortunately the .NET client will silently do this job for us, but it's not the case for all clients. And what about the last 24 hours? What about data every ms?

Pros:
  • Natural representation
  • Easy to query with range queries

Cons:
  • No scalability (hot partition)
  • Inefficient queries (limited to 1000 entities per request)
  • Slow inserts (limited to 100 points per EGT)

A better approach could be to store rows in reverse order. New data points are automatically added at the beginning of the table. Queries should be a little more efficient, but not by much in fact.

Advanced Design

The key idea with time series is that one dimension is well known: the time. Every request contains a lower bound (from) and an upper bound (to). So, we will use the compound key pattern to compute smart Partition & Row Keys, thus enabling a client to look up related data with an efficient query. A very common technique when working with time series is to round the time value with a magic factor. To fully understand these magic factors, you need one of my gists; ToRoundedSecondsTimestamp() is heavily used in my examples (a sketch of the rounding helpers follows the table below). For example, Tuesday, 28 April 2015 12:04:35 UTC could be rounded to:

  • Factor 1 (no rounding): 1430222735 = Tuesday, 28 April 2015 12:04:35 UTC
  • Factor 60 (one minute): 1430222700 = Tuesday, 28 April 2015 12:04:00 UTC
  • Factor 3600 (one hour): 1430222400 = Tuesday, 28 April 2015 12:00:00 UTC
  • Factor 86400 (one day): 1430179200 = Tuesday, 28 April 2015 00:00:00 UTC
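
Here is a sketch of those rounding helpers (the original gist may differ slightly); they build on the ToSecondsTimestamp helper shown earlier.

using System;

static class TimeHelperRounding
{
    // round a Unix timestamp (seconds) down to the nearest multiple of the factor
    public static long ToRoundedTimestamp(long timestamp, long factor)
    {
        return timestamp - (timestamp % factor);
    }

    // convert a UTC date to a Unix timestamp in seconds, rounded by the factor
    public static long ToRoundedSecondsTimestamp(DateTime date, long factor)
    {
        return ToRoundedTimestamp(TimeHelper.ToSecondsTimestamp(date), factor);
    }
}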

In this design, each row contains more than one data point: basically all the points included in the current step (between the current rounded timestamp and the next rounded timestamp). This is possible thanks to DynamicTableEntity in the Table Service. Just to illustrate the concept, here is an example with PartitionFactor = 60 secs and RowFactor = 5 secs.

ts_advanced

Notice how the PartitionKey/RowKey are now rounded timestamps. This is exactly the same data as in my first design. In this example, each row contains 5 data points and each partition contains 12 rows. Of course, in a real scenario, our objective is to maximize the number of points in a single row.

To get the first two hours of 2015, the query is a little bit more complicated but no less efficient.

var from_ts = ToRoundedSecondsTimestamp(new DateTime(2015, 1, 1, 0, 0, 0), RowFactor);
var to_ts = ToRoundedSecondsTimestamp(new DateTime(2015, 1, 1, 2, 0, 0).AddSeconds(-1), RowFactor);

// generate all row keys in the time range
var nbRows = (to_ts - from_ts) / RowFactor + 1;
var rowKeys = new long[nbRows];

for (var i = 0; i < nbRows; i++)
{
    rowKeys[i] = from_ts + i * RowFactor;
}

// group row keys by partition
var partitionKeys = rowKeys.GroupBy(r => ToRoundedTimestamp(r, PartitionFactor));

var partitionFilters = new List<string>();
foreach (var part in partitionKeys)
{
    // PartitionKey = X and (RowKey=Y or RowKey=Y+1 or RowKey=Y+2 ...)
    string partitionFilter = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, part.Key.ToString());
    string rowsFilter = string.Join(" " + TableOperators.Or + " ", part.Select(r => TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, r.ToString())));
    string combinedFilter = TableQuery.CombineFilters(partitionFilter, TableOperators.And, rowsFilter);
    partitionFilters.Add(combinedFilter);
}

// combine all filters
string final = string.Join(" " + TableOperators.Or + " ", partitionFilters);
var query = new TableQuery<DynamicTableEntity>().Where(final);
var res = table.ExecuteQuery(query);

//do something with results...

The generated filter (with partition Factor = one hour, row Factor =4 minutes) is $filter= (PartitionKey eq ‘1420070400’) and (RowKey eq ‘1420070400’ or RowKey eq ‘1420070640’ or RowKey eq ‘1420070880’ or RowKey eq ‘1420071120’ or RowKey eq ‘1420071360’ or RowKey eq ‘1420071600’ or RowKey eq ‘1420071840’ or RowKey eq ‘1420072080’ or RowKey eq ‘1420072320’ or RowKey eq ‘1420072560’ or RowKey eq ‘1420072800’ or RowKey eq ‘1420073040’ or RowKey eq ‘1420073280’ or RowKey eq ‘1420073520’ or RowKey eq ‘1420073760’) or (PartitionKey eq ‘1420074000’) and (RowKey eq ‘1420074000’ or RowKey eq ‘1420074240’ or RowKey eq ‘1420074480’ or RowKey eq ‘1420074720’ or RowKey eq ‘1420074960’ or RowKey eq ‘1420075200’ or RowKey eq ‘1420075440’ or RowKey eq ‘1420075680’ or RowKey eq ‘1420075920’ or RowKey eq ‘1420076160’ or RowKey eq ‘1420076400’ or RowKey eq ‘1420076640’ or RowKey eq ‘1420076880’ or RowKey eq ‘1420077120’ or RowKey eq ‘1420077360’)

So, what are good PartitionKey/RowKey factors? It's up to you and depends on your context, but there are two constraints. First, a Table Service entity is limited to 1 MB with a maximum of 255 properties (including the PartitionKey, RowKey, and Timestamp). If you have data every second, 4 minutes (60*4 = 240 points) seems to be a good RowKey factor, but if you have several points per second it won't be pertinent. The second problem is the filter length: having too many partitions and rows will create a very long filter that can be rejected by the Table Service (HTTP 414 "Request URI too long"). In this case, you can execute partition scans by removing the row keys from the produced filter, but this will be less efficient (it depends on the number of rows in each partition).

Pros:
  • Highly scalable
  • Fast inserts (if required)

Cons:
  • Point queries are not always possible (URI length limits)
  • Granularity should be defined at the beginning (how many rows per partition? how many points per row?)
  • Dev tools can't be used with dynamic columns

To conclude, Microsoft Azure Table Storage is a fast and very powerful service that you should not forget for your applications, even if they are not hosted on Azure. However, you should use proper designs for your tables, very far from traditional relational designs. Choosing a good PartitionKey/RowKey pair is very important. We covered in this article a simple use case (time series) but there are so many scenarios… In terms of pricing, it's very cheap; but your choice should not be driven by costs here, rather by scalability and performance.

Source code is available here : Basic and Advanced 

How to avoid 26 API requests on your page?

The problem

Creating applications relying on web APIs seems to be quite popular these days. There is already an impressive collection of ready-to-use public APIs (check it at http://www.programmableweb.com) that we can consume to create mashups, or to add features to our web sites.

In the meantime, it has never been so easy to create your own REST-like API with node.js, asp.net web api, ruby or whatever tech you want. It's also very common to create your own private/restricted API for your SPA, cross-platform mobile apps or your own IoT device. The naïve approach, when building our web API, is to add one API method for every feature; at the end, we get a well-architected and brilliant web API following the Separation of Concerns principle: one method for each feature. Let's put it all together in your client … and it's a drama in terms of web performance for the end user, with never less than hundreds of requests/sec on the staging environment. Look at your page: there are 26 API calls on your home page!

I'm talking here about web apps, but it's pretty much the same for native mobile applications. RTT and latency matter much more than bandwidth speed. It's impossible to create a responsive and efficient application with a chatty web API.

The proper approach

At the beginning of December 2014, I attended the third edition of APIdays in Paris. There was an interesting session -among others- on Scenario Driven Design by @ijansch.

The fundamental concept in any RESTful API is the resource. It's an abstract concept and it's quite different from a data resource. A resource should not be a raw data model (the result of an SQL query for the Web) but should be defined with client usage in mind. "REST is not an excuse to expose your raw data model." With this kind of approach, you will create dumb clients and smart APIs with thick business logic.

A common myth is: "Ok, but we don't know how our API is consumed, that's why we expose raw data". Most of the time, it's false. Let's take the example of twitter timelines. They are lists of tweets or messages displayed in the order in which they were sent, with the most recent on top. This is a very common feature and you can see timelines in every twitter client. Twitter exposes a timeline API and API clients just have to call this API to get timelines. In particular, clients don't have to compute timelines by themselves by requesting the twitter API many times for friends, tweets of friends, etc.

I think this is an important idea to keep in mind when designing our APIs. Generally, we don't need to be so RESTful (what about HATEOAS?). Think more about API usability and scenarios than RESTfulness.

The slides of this session are available here.

Another not-so-new approach: Batch requests

Reducing the number of requests from a client is a common and well-known Web Performance Optimization technique. Instead of several small images, it's better to use sprites. Instead of many JS library files, it's better to combine them. Instead of several API calls, we can use batch requests.

Batch requests are not REST-compliant, but we already know that we should sometimes break the rules to have better performance and better scalability.

"If you find yourself in need of a batch operation, then most likely you just haven't defined enough resources." – Roy T. Fielding, father of REST

What is a batch request ?

A batch request combines several different API requests into a single POST request. HTTP provides a special content type for this kind of scenario: multipart. On the server side, requests are unpacked and dispatched to the appropriate API methods. All responses are packed together and sent back to the client as a single HTTP response.

Here is an example of a batch request:

Request

POST http://localhost:9000/api/batch HTTP/1.1
Content-Type: multipart/mixed; boundary="1418988512147"
Content-Length: 361

--1418988512147
Content-Type: application/http; msgtype=request

GET /get1 HTTP/1.1
Host: localhost:9000


--1418988512147
Content-Type: application/http; msgtype=request

GET /get2 HTTP/1.1
Host: localhost:9000


--1418988512147
Content-Type: application/http; msgtype=request

GET /get3 HTTP/1.1
Host: localhost:9000


--1418988512147--

Response

HTTP/1.1 200 OK
Content-Length: 561
Content-Type: multipart/mixed; boundary="91b1788f-6aec-44a9-a04f-84a687b9d180"
Server: Microsoft-HTTPAPI/2.0
Date: Fri, 19 Dec 2014 11:28:35 GMT

--91b1788f-6aec-44a9-a04f-84a687b9d180
Content-Type: application/http; msgtype=response

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

"I am Get1 !"
--91b1788f-6aec-44a9-a04f-84a687b9d180
Content-Type: application/http; msgtype=response

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

"I am Get2 !"
--91b1788f-6aec-44a9-a04f-84a687b9d180
Content-Type: application/http; msgtype=response

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

"I am Get3 !"
--91b1788f-6aec-44a9-a04f-84a687b9d180--

Batch requests are already supported by many web frameworks and allowed by many API providers: asp.net web api, a node.js module, Google Cloud Platform, Facebook, Stackoverflow, Twitter …

Batch support in asp.net web api

To support batch requests in your asp.net web api, you just have to add a new custom route:

config.Routes.MapHttpBatchRoute(
    routeName: "batch",
    routeTemplate: "api/batch",
    batchHandler: new DefaultHttpBatchHandler(GlobalConfiguration.DefaultServer)
);

Tip: the DefaultHttpBatchHandler doesn't provide a way to limit the number of requests in a batch. To avoid performance issues, we may want to limit it to 100/1000/… requests. You have to create your own implementation by inheriting DefaultHttpBatchHandler.
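
Here is a hedged sketch of such a handler; the override point (ParseBatchRequestsAsync) and the limit value are assumptions that may vary with your Web API version.

using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Http;
using System.Web.Http.Batch;

public class LimitedBatchHandler : DefaultHttpBatchHandler
{
    private readonly int _maxRequests;

    public LimitedBatchHandler(HttpServer httpServer, int maxRequests) : base(httpServer)
    {
        _maxRequests = maxRequests;
    }

    public override async Task<IList<HttpRequestMessage>> ParseBatchRequestsAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var requests = await base.ParseBatchRequestsAsync(request, cancellationToken);

        // reject oversized batches before any sub-request is processed
        if (requests.Count > _maxRequests)
            throw new HttpResponseException(
                request.CreateErrorResponse(HttpStatusCode.BadRequest, "Too many requests in the batch."));

        return requests;
    }
}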

This new endpoint will allow clients to send batch requests, and you have nothing else to do on the server side. On the client side, to send batch requests, you can use jquery.batch, batchjs, the angular-http-batcher module, …

I will not explain all the details here, but there is an interesting feature provided by DefaultHttpBatchHandler: the ExecutionOrder property allows you to choose between sequential and non-sequential processing order. Thanks to the TAP programming model, it's possible to execute API requests in parallel (for truly async API methods).

Here is the result of async/sync batch requests for a pack of three web methods, each taking one second to be processed.

batch

Finally, batch requests are not a must-have feature but they are certainly something to keep in mind. They can help a lot in some situations. A very simple demo application is available here. Run the console app and try to browse localhost:9000/index.html. From my point of view, here are some pros/cons of this approach.

Pros:
  • Better client performance (fewer calls)
  • Really easy to implement on the server side
  • Parallel request processing on the server side
  • Allows GET, POST, PUT, DELETE, …

Cons:
  • May increase the complexity of client code
  • Hides real client scenarios, not REST compliant
  • Batch size should be limited at the server level for a public API
  • Browser cache may not work properly

Towards a better local caching strategy

Why Caching?

A while ago, I explained to some of my co-workers the benefits of caching. I’m always surprised to see how this technique is so misunderstood by some developers.

In computing, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.

The thing is that caching is already present everywhere: CPU, disk, network, web, DNS, … It's one of the oldest programming techniques, available in any programming language and framework. You may think that it was mandatory with -only- 8 KB of RAM two decades ago, but don't be too naive: it's still a pertinent approach in our always-connected world: more data, more users, more clients, real-time, …

In this article, I will focus only on application caching through System.Runtime.Caching. Nothing really new here, but I just want to review 3 basic caching strategies that you can see in popular and OSS projects; it's important to have solid foundations. Even if the language is C#, many concepts listed here are also valid in other languages.

Local Caching Strategies

By Local/InMemory cache, I mean that data is held locally on the computer running an instance of the application. System.Web.Caching.Cache, System.Runtime.Caching.MemoryCache and the EntLib CacheManager are well-known local caches.

There is no magic with caching and there is a hidden trade-off: caching means working with stale data. Should I increase the cache duration? Should I keep a short TTL value? It's never easy to answer these questions, because it simply depends on your context: topology of data, number of clients, user load, database activity…

When implementing a local caching strategy, there is an important list of questions to ask yourself:

  • How long will the item be cached?
  • Is data coherence important?
  • How long does it take to reload the data item?
  • Does the number of executed queries on the data source matter?
  • Does the caching strategy impact the end user?
  • What is the topology of the data: reference data, activity data, session data, …?

The -very- basic interface we will implement in the 3 following examples contains a single method.

public interface ICacheStrategy
{
    /// <summary>
    /// Get an item from the cache (if cached) else reload it from the data source and add it into the cache.
    /// </summary>
    /// <typeparam name="T">Type of cache item</typeparam>
    /// <param name="key">cache key</param>
    /// <param name="fetchItemFunc">Func<typeparamref name="T"/> used to reload the data from the data source (if missing from the cache)</param>
    /// <param name="durationInSec">TTL value for the cache item</param>
    /// <param name="tokens">list of strings used to generate the final cache key</param>
    /// <returns></returns>
    T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens);
}

Basic Strategy

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            var cacheKey = this.CreateKey(key, tokens);
            var item = this.Cache.Get<T>(cacheKey);
            if (this.IsDefault(item))
            {
                item = fetchItemFunc();
                this.Cache.Set(cacheKey, item, durationInSec, false);
            }
            return item;
        }

This is similar to Read-Through caching. The caller will always get an item back, coming from the cache itself or the data source. When a cache client asks for an entry, and that item is not already in the cache, the strategy will automatically fetch it from the underlying data source, then place it in the cache for future use and finally will return the loaded item to the caller.

Double-checked locking

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            string cacheKey = this.CreateKey(key, tokens);
            var item = this.Cache.Get<T>(cacheKey);

            if (this.IsDefault(item))
            {
                object loadLock = this.GetLockObject(cacheKey, SyncLockDuration);
                lock (loadLock)
                {
                    item = this.Cache.Get<T>(cacheKey);
                    if (this.IsDefault(item))
                    {
                        item = fetchItemFunc();
                        this.Cache.Set(cacheKey, item, durationInSec);
                    }
                }
            }

            return item;
        }

This version introduces a locking system. A global synchronization mechanism (a single lock object shared by every cache item) is not efficient here; that's why there is a dedicated synchronization object per cache item (depending on the cache key). The double-checked locking is also really important here to avoid useless/duplicated requests on the data source.

Refresh ahead strategy

The full implementation is available here.

        public T Get<T>(string key, Func<T> fetchItemFunc, int durationInSec, params string[] tokens)
        {
            // code omitted for clarity

            // not stale or don't use refresh ahead, nothing else to do => back to double lock strategy
            if (!item.IsStale || staleRatio == 0) return item.DataItem;
            // Oh no, we're stale - kick off a background refresh

            var refreshLockSuccess = false;
            var refreshKey = GetRefreshKey(cachekey);

            // code omitted for clarity

            if (refreshLockSuccess)
            {
                var task = new Task(() =>
                {
                    lock (loadLock)
                    {
                        // reload logic
                    }
                });
                task.ContinueWith(t =>
                {
                    if (t.IsFaulted) Trace.WriteLine(t.Exception);
                });
                task.Start();
            }
            return item.DataItem;
        }

In this implementation it's possible to configure a stale ratio, enabling an automatic and asynchronous refresh of any recently accessed cache entry before its expiration. The application/end user will not feel the impact of a read against a potentially slow cache store when the entry is reloaded due to expiration. If the object is not in the cache, or if it is accessed after its expiration time, the behavior is similar to the double-checked locking strategy.

Refresh-ahead is especially useful if objects are being accessed by a large number of users. Values remain fresh in the cache and the latency that could result from excessive reloads from the cache store is avoided.

Experimental Results

To see the impact of each strategy, I've committed code on GitHub. This program simulates fake workers that get the same item from the cache during 60 seconds for each strategy. A fake reload method taking one second is used to simulate access to the data source. All cache hits, cache misses and reloads are recorded. This may not be the best program in the world, but it's quite enough to illustrate this article.

  • Basic: 3236 gets / 181 reloads, avg. get time 59.17 ms, 94.15 % of gets under 100 ms
  • Double-checked locking: 3501 gets / 11 reloads, avg. get time 51.22 ms, 94.10 % of gets under 100 ms
  • Refresh ahead: 3890 gets / 11 reloads, avg. get time 0.20 ms, 100 % of gets under 100 ms

Between the basic and the double-checked locking strategy, there is an important improvement in terms of reloads from the data source, with nearly the same avg. response time. Between the double-checked locking and the refresh-ahead strategy, the number of reloads is exactly the same but the response time is greatly improved. It's very easy to use a cache, but be sure to use the appropriate pattern that fits your use cases.

Bonus : Local Cache invalidation

One year ago, I posted an article on the Betclic Tech Blog about implementing local memory cache invalidation with the Redis Pub/Sub feature. The idea is fairly simple: catch invalidation messages at the application level to remove one or more items from a local cache (a minimal sketch follows the list below). The current version is more mature and a NuGet package is available here. This can easily be used in many ways:

  • Invalidate items, i.e. remove them from the cache
  • Mark items as stale, to use background reloading triggered by an external event
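
Here is a minimal sketch of the subscription side using StackExchange.Redis pub/sub; the channel name and the use of MemoryCache.Default are assumptions, not the exact implementation of the package.

using System.Runtime.Caching;
using StackExchange.Redis;

static class CacheInvalidation
{
    public static void Subscribe(ConnectionMultiplexer connection)
    {
        var subscriber = connection.GetSubscriber();

        // every message published on this channel contains a cache key to evict locally
        subscriber.Subscribe("cache-invalidation", (channel, message) =>
        {
            MemoryCache.Default.Remove((string)message);
        });
    }
}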

To conclude

We covered here only some variations of the cache-aside pattern. I hope that you're now more aware of the possibilities and troubles you may have with a local cache. It's very easy and efficient to use a local cache, so be sure to use the appropriate pattern that fits your use cases.

By the way, in an era of cloud applications, fast networks and low latencies, a local cache is -still- very important. Nothing, absolutely nothing, is faster than accessing local memory. One natural evolution often cited for a local cache is a distributed cache. As we've seen here, this is not always the only solution, but that's another story.

The full source code is available on Github.