
Storing Time Series in Redis

Last time, I explored how to store time series in Microsoft Azure Table Service. This time I’ll do the same in Redis. Redis is a very popular key-value store (but not only that) and I highly encourage you to try it if you don’t know it yet. As noted in my previous post, it’s important to understand that time series have very special properties.

  • Append only
    Most of the time, collectors or emitters append data “at the end”.
  • Read-only mode
    Time series are never updated. Points are just read to create beautiful charts, compute statistics, and so on. Most of the time, the latest points matter more than the rest: it’s common to chart the last minute, the last hour or the last day.
  • Big data
    A simple collector that runs every second creates 86,400 points per day, 2,592,000 per month, 31,536,000 per year… and this is just for a single metric.

There are many ways to implement time series in Redis, for example with sorted sets, lists or even hashes. None is really better than the others: the choice depends on your context (yes, again). A few weeks ago, for one of my needs, I needed a very efficient way to store millions of data points. Here are the details of this implementation, heavily inspired by a repository from @antirez: https://github.com/antirez/redis-timeseries

How it works

The main idea is to use the APPEND command.

If key already exists and is a string, this command appends the value at the end of the string. If key does not exist it is created and set as an empty string, so APPEND will be similar to SET in this special case.

[Figure: a time series stored as a single Redis string of delimited data points]
Every time series is stored in a single string containing multiple data points, mostly ordered in time. The library appends every data point as a serialized string terminated by a special character (‘#’). Internally, every data point contains two parts, a timestamp and a value, delimited by another special character (‘:’).
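To make the format concrete, here is a minimal sketch of that layout, assuming unix timestamps in seconds; this is not the library’s actual code (StringAppendAsync is the StackExchange.Redis wrapper for the APPEND command):

// serialize a data point as "<unix-timestamp>:<value>#"
// ':' separates the timestamp from the value, '#' terminates the entry
static string Serialize(DateTime utc, long value)
{
    var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    var unixTs = (long)(utc - epoch).TotalSeconds;
    return unixTs + ":" + value + "#";
}

// append one data point to the key of the current period (db is an open IDatabase)
await db.StringAppendAsync("msts:1405605600", Serialize(DateTime.UtcNow, 123456789));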

Now, to make things a little more efficient, key names are generated via a basic algorithm. The key name combines a user-defined constant and a rounded timestamp, so all the points do not pile up in a single key. The rounding factor depends on your configuration settings: if you set it to one hour, all data points inserted between 4PM and 5PM will go to the same key.
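A minimal sketch of that key-generation algorithm, assuming the rounding factor is expressed in seconds (the names are illustrative, not the library’s internals):

// round the unix timestamp down to the rounding factor and build "<prefix>:<rounded-ts>"
static string KeyName(string prefix, DateTime utc, long roundingFactorSeconds)
{
    var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    var unixTs = (long)(utc - epoch).TotalSeconds;
    var rounded = (unixTs / roundingFactorSeconds) * roundingFactorSeconds;
    return prefix + ":" + rounded;
}

// with a one-hour factor, a point emitted at 17/07/2014 14:23:18 UTC lands in:
// KeyName("msts", pointDate, 3600) => "msts:1405605600"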

Reading data is done via GETRANGE. Why not a basic GET? Mainly because a single key can hold a lot of data points, and I don’t want to risk an OutOfMemoryException, excessively large heap allocations or GC pressure. Depending on the rounding factor, it is also possible to hit Redis several times: for example, if the rounding factor is set to one hour and you want the last 4 hours, the library has to read from several keys.
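Here is a sketch of such a chunked read, with an arbitrary 64 KB slice size chosen for illustration (StringGetRangeAsync is the StackExchange.Redis wrapper for GETRANGE):

// read the key in fixed-size slices instead of one potentially huge GET
const int chunkSize = 64 * 1024;   // arbitrary slice size for this sketch
long offset = 0;
while (true)
{
    // GETRANGE key offset (offset + chunkSize - 1)
    var chunk = await db.StringGetRangeAsync("msts:1405605600", offset, offset + chunkSize - 1);
    if (chunk.IsNullOrEmpty)
        break;                                     // past the end of the string
    // ...split the slice on '#' and parse "<ts>:<value>" entries here;
    // a real implementation must handle an entry cut in half at the slice boundary
    if (((string)chunk).Length < chunkSize)
        break;                                     // last (partial) slice
    offset += chunkSize;
}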

Pros:
  • Very space efficient
  • Fast inserts (APPEND is O(1))
  • A single string can hold up to 512 MB
  • TTL on keys

Cons:
  • Inserts at the end only
  • Data points are not guaranteed to be ordered
  • Range queries inside a series are not supported (that would require fixed-size entries)

Introducing RedisTS (Redis Time Series)

I’ve committed an implementation on GitHub and a NuGet package is available here. The usage is pretty trivial and you can inspect the test project or the samples to understand how to use it.

Note: RedisTS depends on StackExchange.Redis. An open ConnectionMultiplexer is required, and RedisTS will never open a new connection or close the current one.

First, you have to create a client for your time series.

// options: a one-hour rounding factor (3600 * 1000 ms) and a TTL applied to each key (one day)
var options = new TimeSeriesOptions(3600 * 1000, 1, TimeSpan.FromDays(1));
// db is an open StackExchange.Redis IDatabase and "msts" is the name of the series
var client = TimeSeriesFactory.New(db, "msts", options);

This operation is cheap and the client is, more or less, stateless. Once you have a client, you can add one or more data points. This is typically done via a background task or a scheduled task.

// add a new data point: date = now, value = 123456789
client.AddAsync(DateTime.UtcNow, 123456789);

That’s all on the client side. If you insert data points from one location, you will probably read them from another one, maybe to display beautiful charts or compute statistics. For example, if you want all the data points from the last hour for the series “myts”, you can use the following piece of code:

client.RangeAsync(DateTime.UtcNow.AddHours(-1), DateTime.UtcNow);

That’s all for today. I hope this helps you; feedback is strongly encouraged. Thanks!

How speedy.js is your web site?

As a performance officer, I recently watched a presentation by Lara Callender Swanson about how Etsy moved towards a culture of performance and mobile web by educating, incentivizing and empowering everyone who works at Etsy.

Inspired by a repo on GitHub and StackExchange‘s MiniProfiler, I’ve created a very simple script to display Navigation Timing stats at the top of a web page.

Navigation Timing is a JavaScript API for accurately measuring performance on the web. The API provides a simple way to get accurate and detailed timing statistics, natively, for page navigation and load events. Measuring the time it takes to fully load a page has always been a small challenge, but the Navigation Timing API now makes this easy for all of us.

It’s important to understand that Navigation Timing data is very similar to the network stats in your browser’s developer tools.

[Screenshot: speedy.js displaying Navigation Timing stats at the top of a page]

Can I use … ?

The Navigation Timing API is now supported by all major browsers (Can I use…?). Google Analytics and RUM services have been using it for a long time. If it’s not supported by your browser, an error message will be displayed.

[Screenshot: the error message displayed when Navigation Timing is not supported]

No message? Don’t hesitate to create an issue on GitHub.

Mobile ready?

This is maybe the most interesting part. Developer tools are not available in mobile/tablet browsers, so you normally have no way to evaluate page load time or to explain why a page may be slow.

In production?

Of course, it’s not recommended to display this kind of data to your users, but you may find several ways to use it in production. There are browser extensions to inject custom JavaScript into any website (Cjs, Greasemonkey), and Fiddler allows you to automatically inject scripts into a page (stackoverflow).

Here is an example on my Stack Overflow profile:

[Screenshot: speedy.js running on a Stack Overflow profile page]

To conclude, don’t forget that performance is a feature! Displaying page load time on each page, to everyone, is a great way to detect performance issues early. Does a page violate your SLA? I think that’s now a little easier to answer with this script.

Introducing Toppler

I recently created a new repository (https://github.com/Cybermaxs/Toppler) and I would like to share with you the idea behind this project.

It’s been officially one year since I discovered Redis, and like every fan I can see many possibilities here and there. I’m also quite surprised that this DB is so little known in the Microsoft stack, but I’m sure this will change in a few months as Redis Cache becomes the default caching service in Azure. But Redis is not just a cache! This key-value store has unique features and possibilities. Give it a chance.

So what is Toppler? It’s a very small package built on top of StackExchange.Redis that helps you count hits and query emitted events to build rankings/leaderboards.

Here are a few use cases where Toppler could help you:

  • You want a counter for various events (an item is viewed, a game is played, …) and statistics about emitted events for custom time ranges (today, last week, this month, …)
  • You want to implement a leaderboard with single or incremental updates.
  • You want to track events in custom dimensions and get statistics for one, several or all dimensions.
  • You want to provide basic recommendations by combining the most emitted events with random items for custom time ranges.

How does it work?

One of the most important aspects of Toppler is granularity. Each hit is stored in a range of sorted sets representing different granularities of time (e.g. seconds, minutes, …).

The Granularity class has 3 properties (Factor, Size, TTL) that are used to compose a smart key following the pattern [PREFIX]:[GRAN_NAME]:[TS_ROUNDED_BY_FACTOR_AND_SIZE]:[TS_ROUNDED_BY_FACTOR], where [PREFIX] is the combination of the configured namespace and the current dimension, and [TS_ROUNDED_XX] is the unix timestamp rounded for the given granularity.

Here are the values for the 4 default granularities (Factor and TTL are expressed in seconds; Size is the number of periods grouped into a single key):

         Factor   TTL        Size
Second   1        7200       3600
Minute   60       172800     1440
Hour     3600     1209600    168
Day      86400    63113880   365

A TTL is assigned to each key (using Redis EXPIREAT) to keep DB space usage reasonable.

So, a hit emitted at 17/07/2014 14:23:18 (UTC), i.e. unix timestamp 1405606998, will create/update these keys:

  • [NAMESPACE]:[DIMENSION]:second:1405605600:1405606998
  • [NAMESPACE]:[DIMENSION]:minute:1405555200:1405606980
  • [NAMESPACE]:[DIMENSION]:hour:1405555200:1405605600
  • [NAMESPACE]:[DIMENSION]:day:1387584000:1405555200

When an event is emitted, the number of hits (often 1) is added to the target sorted set of each granularity via the ZINCRBY command, as sketched below.
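Here is a minimal sketch of that key composition and increment, assuming Factor and Size are expressed in seconds; the prefix “myapp:global” and the helper name are illustrative, not Toppler’s internals (SortedSetIncrementAsync is the StackExchange.Redis wrapper for ZINCRBY):

// compose "[PREFIX]:[GRAN_NAME]:[TS_ROUNDED_BY_FACTOR_AND_SIZE]:[TS_ROUNDED_BY_FACTOR]"
static string ComposeKey(string prefix, string granName, long factor, long size, long unixTs)
{
    var roundedByFactor = (unixTs / factor) * factor;
    var roundedByFactorAndSize = (unixTs / (factor * size)) * (factor * size);
    return prefix + ":" + granName + ":" + roundedByFactorAndSize + ":" + roundedByFactor;
}

// the hit above (unix timestamp 1405606998) on the hour granularity (Factor=3600, Size=168):
var key = ComposeKey("myapp:global", "hour", 3600, 168, 1405606998);
// => "myapp:global:hour:1405555200:1405605600"
// ZINCRBY adds the hit(s) to the score of the emitted event in that sorted set
await db.SortedSetIncrementAsync(key, "item42", 1);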

The retrieval of results uses the same logic to recompose the keys, as the granularity and the resolution are parameters of the ranking method, but the ZUNIONSTORE command is used to combine all the results in a single sorted set. This makes it possible to store the result of the query (like a cache) or to apply a weight function.
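For illustration, here is roughly what that looks like with StackExchange.Redis, reusing the hypothetical key names from the sketch above (SortedSetCombineAndStoreAsync wraps ZUNIONSTORE):

// union the hourly sorted sets of the requested range into a single result set…
var hours = new RedisKey[]
{
    "myapp:global:hour:1405555200:1405598400",   // 12:00
    "myapp:global:hour:1405555200:1405602000",   // 13:00
    "myapp:global:hour:1405555200:1405605600",   // 14:00
};
await db.SortedSetCombineAndStoreAsync(SetOperation.Union, "myapp:cache:last3hours", hours);
// …then read the top 10 events; the destination key can be kept as a short-lived cache
var top10 = await db.SortedSetRangeByRankWithScoresAsync("myapp:cache:last3hours", 0, 9, Order.Descending);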

Show me the code!

[Screenshot: a basic Toppler code sample]
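The screenshot above carried the real API, so here is only a rough sketch of the idea: emit hits, then ask for a ranking over a time range. Every name below (TopplerClient, HitAsync, GetTopsAsync, the parameter names) is an illustrative assumption; check the samples in the repository for the actual API.

// illustrative only: the real types and method names live in the Toppler repository
var toppler = new TopplerClient(multiplexer);   // hypothetical entry point over an open ConnectionMultiplexer
await toppler.HitAsync("item42");               // count one hit for "item42", now
await toppler.HitAsync("item7", hits: 3);       // or several hits at once
// hypothetical ranking query: top events over the last 4 hours
var tops = await toppler.GetTopsAsync(Granularity.Hour, resolution: 4);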

It’s just a very basic example and many additional options are available to emit events (when, in which dimension, how many hits …) or compute statistics (single/multi dimensions, caching, granularity & resolution, weight function …).

The project is currently in beta, so please be indulgent and patient. Feel free to contact me, create issues and tell me what’s wrong. Thanks!

Acknowledgements