Last time, I explored how to store time series in Microsoft Azure Table Service. This time I’ll do the same with Redis. Redis is a very popular key-value store (though not only that), and I highly encourage you to take a look at it if you don’t know it yet. As in my previous post, it’s important to note that time series have very special properties.
- Append only
Most of the time, collectors or emitters append data “at the end”.
- Readonly mode
Time series are never updated. Points are just read to create beautiful charts, compute statistics, etc. The latest points usually matter more than older ones: it’s common for charts to show the last minute, the last hour, or the last day.
- Big Data
A simple collector that runs every second creates 86,400 points per day, 2,592,000 per month, 31,536,000 per year… and this is just for a single metric.
There are many ways to implement time series in Redis, for example with sorted sets, lists or even hashes. None is clearly better, and the choice depends on your context (yes, again). A few weeks ago, I needed a very efficient way to store many thousands of data points. Here are the details of this implementation, heavily inspired by a repository from @antirez: https://github.com/antirez/redis-timeseries
How it works
The main idea is to use the APPEND command.
> If key already exists and is a string, this command appends the value at the end of the string. If key does not exist it is created and set as an empty string, so APPEND will be similar to SET in this special case.
Every time series is stored in a single string containing multiple data points, mostly ordered in time. The library appends every data point as a serialized string terminated by a special character (‘#’). Internally, every data point has two parts, a timestamp and a value, delimited by another special character (‘:’).
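To make this concrete, here is a minimal sketch of that serialization format. The helper names are mine, not the library’s, and the exact encoding RedisTS uses may differ:

```csharp
using System;
using System.Globalization;

class Program
{
    // Serialize a data point as "timestamp:value#", the format described above.
    static string SerializePoint(long timestampMs, double value) =>
        timestampMs.ToString(CultureInfo.InvariantCulture) + ":" +
        value.ToString(CultureInfo.InvariantCulture) + "#";

    // Parse a buffer of concatenated points back into (timestamp, value) pairs.
    static (long Timestamp, double Value)[] ParsePoints(string buffer)
    {
        var raw = buffer.Split('#', StringSplitOptions.RemoveEmptyEntries);
        var points = new (long, double)[raw.Length];
        for (int i = 0; i < raw.Length; i++)
        {
            var parts = raw[i].Split(':');
            points[i] = (long.Parse(parts[0], CultureInfo.InvariantCulture),
                         double.Parse(parts[1], CultureInfo.InvariantCulture));
        }
        return points;
    }

    static void Main()
    {
        // Appending a point is just string concatenation, which maps to Redis APPEND.
        var buffer = SerializePoint(1000, 1.5) + SerializePoint(2000, 2.5);
        foreach (var (ts, v) in ParsePoints(buffer))
            Console.WriteLine($"{ts} -> {v}");
    }
}
```

Because appending a point is plain string concatenation, the write path maps directly onto a single APPEND command per point.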
Now, to make things a little more efficient, key names are generated via a basic algorithm. A key name contains a user-defined constant and a rounded timestamp, so that not all data points end up in the same key. The rounding factor depends on your configuration settings: if you set it to one hour, all data points inserted between 4 PM and 5 PM go to the same key.
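That key-naming scheme can be sketched as follows; the helper and the key layout are my own illustration, assuming a rounding factor expressed in milliseconds:

```csharp
using System;

class Program
{
    // Build a key name from a user-defined prefix and a timestamp rounded
    // down to a multiple of the rounding factor (in milliseconds).
    static string KeyName(string prefix, long timestampMs, long roundingFactorMs) =>
        $"{prefix}:{timestampMs - (timestampMs % roundingFactorMs)}";

    static void Main()
    {
        const long oneHourMs = 3600 * 1000;
        // Two points inserted within the same hour share a key...
        Console.WriteLine(KeyName("myts", 7_500_000, oneHourMs));  // myts:7200000
        // ...while a point in the next hour goes to a new key.
        Console.WriteLine(KeyName("myts", 11_000_000, oneHourMs)); // myts:10800000
    }
}
```

The rounding factor is therefore a trade-off: a small factor means many small keys (and more round-trips on reads), a large factor means fewer but bigger keys.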
Reading data is done via GETRANGE. Why not a plain GET? Mainly because a single key can hold a lot of data points, and I don’t want to risk an OutOfMemoryException, excessive large-object-heap allocations, or GC pressure by loading a huge value in one shot. Depending on the rounding factor, it may also be necessary to hit Redis several times: for example, if the rounding factor is set to one hour and you want the last 4 hours, four keys have to be read.
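The chunked read can be sketched like this. To keep the example self-contained, an in-memory string stands in for Redis; with StackExchange.Redis, the same slicing would be done with `db.StringGetRange(key, start, end)`:

```csharp
using System;
using System.Text;

class Program
{
    // Read a large value in fixed-size chunks instead of one big GET.
    // 'read' stands in for db.StringGetRange(key, start, end) here.
    static string ReadInChunks(Func<long, long, string> read, int chunkSize)
    {
        var sb = new StringBuilder();
        long start = 0;
        while (true)
        {
            // Inclusive range, like Redis GETRANGE.
            var chunk = read(start, start + chunkSize - 1);
            if (chunk.Length == 0) break;
            sb.Append(chunk);
            if (chunk.Length < chunkSize) break; // reached the end of the value
            start += chunkSize;
        }
        return sb.ToString();
    }

    static void Main()
    {
        var stored = "1000:1.5#2000:2.5#3000:3.5#"; // a key's value in Redis
        string Read(long s, long e) =>
            s >= stored.Length
                ? ""
                : stored.Substring((int)s, (int)Math.Min(e - s + 1, stored.Length - s));

        // Same content as a single GET, but read 8 bytes at a time.
        Console.WriteLine(ReadInChunks(Read, 8)); // 1000:1.5#2000:2.5#3000:3.5#
    }
}
```

Each chunk is a bounded allocation, so even a key holding an hour of points never forces one huge buffer into memory at once.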
|Pros|Cons|
|---|---|
|Very space efficient|Inserts at the end only|
|Fast inserts (APPEND is O(1))|Unordered list|
|Up to 512 MB per key|Range queries inside a series not supported (would require fixed-size points)|
|TTL on keys| |
Introducing RedisTS (Redis Time Series)
Note: RedisTS depends on StackExchange.Redis. An open ConnectionMultiplexer is required, and RedisTS will never open a new connection or close the current one.
First, you have to create a client for your time series.
```csharp
var options = new TimeSeriesOptions(3600 * 1000, 1, TimeSpan.FromDays(1));
var client = TimeSeriesFactory.New(db, "msts", options);
```
This operation is cheap and the client is, more or less, stateless. Once you have a client, you can add one or more data points. This is typically done via a background task or a scheduled task.
```csharp
// here I add a new data point; date: now, value: 123456789
client.AddAsync(DateTime.UtcNow, 123456789);
```
That’s all on the client side. If you insert data points from one location, you will typically read them from another one, maybe to display beautiful charts or compute statistics. For example, you can ask the client for all the data points of the series “myts” from the last hour.
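The original post’s read snippet did not survive here, so below is only a hedged sketch of what such a read could look like. The method name `RangeAsync` and the `Timestamp`/`Value` properties are hypothetical; check the RedisTS repository for the exact API:

```csharp
// Hypothetical read API, for illustration only; see the RedisTS
// repository for the actual method and property names.
var to = DateTime.UtcNow;
var from = to.AddHours(-1);
var points = await client.RangeAsync(from, to);
foreach (var point in points)
    Console.WriteLine($"{point.Timestamp}: {point.Value}");
```

Whatever the exact method is called, remember that a one-hour query with a one-hour rounding factor can still touch two keys when the window straddles a key boundary.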
That’s all for today. I hope this helps; feedback is very welcome. Thanks!