A Comprehensive Analysis of Open-Source Time Series Databases (2)

By Zhaofeng Zhou (Muluo)


KairosDB started as a fork of OpenTSDB 1.x, with the goal of extending the OpenTSDB code to meet new functional requirements. One of its refinements is support for pluggable storage engines. For example, support for H2 facilitates local development and testing, whereas OpenTSDB is strongly coupled with HBase. In earlier versions, HBase was also the primary storage engine of KairosDB. However, in subsequent storage optimizations, HBase was gradually replaced by Cassandra, which made KairosDB the first time series database built on Cassandra. The latest versions no longer support HBase, because the storage optimizations rely on features that are unique to Cassandra and not available in HBase.

The overall architecture of KairosDB is similar to that of OpenTSDB: both use a mature database as the bottom-layer storage engine, and the main logic is only a thin layer above the storage engine. This logic layer is deployed as a stateless component that can easily be scaled horizontally.

In terms of functionality, KairosDB builds on OpenTSDB 1.x, either optimizing existing OpenTSDB features or adding features that OpenTSDB lacks. I’ll outline some of the major functional differences I know about:

  1. Pluggable storage engine: Earlier versions of OpenTSDB are strongly coupled with HBase. To pursue the ultimate performance, the OpenTSDB developers even wrote an asynchronous HBase client (now released as an independent open-source project: AsyncHBase). As a result, the entire code base is written in an asynchronous style, which not only increases the complexity of the code and reduces its readability, but also makes it harder to support multiple storage engines. KairosDB strictly defines the API of the storage layer. The overall logic and the storage layer are loosely coupled, which makes it easier to add support for multiple storage engines. The latest version of OpenTSDB can also support Cassandra and BigTable, but its overall architecture still cannot be described as one that supports pluggable storage engines.
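
To make the idea of a strictly defined storage-layer API more concrete, here is a minimal sketch in Python. The names `Datastore`, `InMemoryDatastore`, and `TimeSeriesService` are hypothetical and are not taken from the KairosDB code base; the in-memory engine only plays the role that a local engine such as H2 plays for development and testing.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Datastore(ABC):
    """Hypothetical storage-layer contract; not KairosDB's real interface."""

    @abstractmethod
    def put(self, metric, tags, timestamp, value):
        ...

    @abstractmethod
    def query(self, metric, start, end):
        ...


class InMemoryDatastore(Datastore):
    """Stand-in for a local engine used in development and tests."""

    def __init__(self):
        self._points = defaultdict(list)

    def put(self, metric, tags, timestamp, value):
        self._points[metric].append((timestamp, dict(tags), value))

    def query(self, metric, start, end):
        # Return points with start <= timestamp < end, ordered by time.
        return sorted(
            (p for p in self._points[metric] if start <= p[0] < end),
            key=lambda p: p[0],
        )


class TimeSeriesService:
    """Thin, stateless logic layer that only talks to the interface."""

    def __init__(self, datastore: Datastore):
        self._datastore = datastore

    def write(self, metric, tags, timestamp, value):
        self._datastore.put(metric, tags, timestamp, value)

    def read(self, metric, start, end):
        return self._datastore.query(metric, start, end)


service = TimeSeriesService(InMemoryDatastore())
service.write("cpu.load", {"host": "web01"}, 1000, 0.5)
service.write("cpu.load", {"host": "web01"}, 2000, 0.7)
print(service.read("cpu.load", 0, 1500))
```

Because the service depends only on the abstract contract, swapping in another engine would mean adding one more `Datastore` subclass without touching the logic layer.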

Storage Model

The main design feature of the OpenTSDB storage model is the use of UID encoding. This greatly saves storage space, and many queries are optimized by using HBase filters that rely on the fixed byte length of the UID encoding. However, UID encoding also has many defects. First, a mapping table from metric/tagkey/tagvalue to UID needs to be maintained. All data point writes and reads need to be converted through the mapping table. The mapping table is usually cached in the TSD or client, which consumes additional memory. Second, due to the UID encoding, the number of metrics/tagkeys/tagvalues has an upper limit that depends on the number of bytes used by the UID, and conflicts may occur in UID allocation, which affects writing.
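
The trade-off described above can be illustrated with a toy mapping table. This is only a sketch, not OpenTSDB code: the class name `UidTable` is hypothetical, the 3-byte default width is an assumption chosen for illustration, and distributed UID allocation (where conflicts can occur) is not modeled.

```python
class UidTable:
    """Toy name-to-UID mapping; the 3-byte default width is an assumption."""

    def __init__(self, width_bytes=3):
        self.width = width_bytes
        # The fixed width caps how many distinct names can ever be encoded.
        self.max_ids = 2 ** (8 * width_bytes) - 1
        self._name_to_uid = {}
        self._uid_to_name = {}

    def get_or_create(self, name):
        uid = self._name_to_uid.get(name)
        if uid is None:
            next_id = len(self._name_to_uid) + 1
            if next_id > self.max_ids:
                raise OverflowError("UID space exhausted")
            uid = next_id.to_bytes(self.width, "big")
            # Both directions must be kept: writes encode, reads decode.
            self._name_to_uid[name] = uid
            self._uid_to_name[uid] = name
        return uid

    def name_of(self, uid):
        return self._uid_to_name[uid]


metrics = UidTable()
uid = metrics.get_or_create("sys.cpu.user")
print(len("sys.cpu.user"), "bytes as a string,", len(uid), "bytes as a UID")
```

The space saving comes from replacing a variable-length string with a short fixed-length value, while the two dictionaries show the extra state that every reader and writer has to keep in memory.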

Essentially, the UID encoding optimization adopted by the OpenTSDB storage model mainly solves two problems:

  1. Optimize storage space: UID encoding solves the problem of redundant storage space caused by repeated storage of row keys.

KairosDB solves these two problems with a different approach that doesn’t require UID encoding, so the defects of UID encoding are avoided as well. Let’s take a look at the storage model of KairosDB first. It is mainly composed of the following three tables:

  1. DataPoints: Stores all original data points. Each data point is composed of metric, tags, timestamp, and value. The time span of a row of data in this table is three weeks. That is, all data points within three weeks are stored in the same row, while the time span of a row in OpenTSDB is only one hour. The composition of the RowKey is similar to that of OpenTSDB. The structure is <metric><timestamp><tagk1><tagv1><tagk2><tagv2>...<tagkn><tagvn>. The difference is that the metric, tagkey, and tagvalue all store the original values instead of UIDs.
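
As a rough illustration of the three-week row layout, the sketch below builds a readable row key and computes where a point falls inside its row. It is written under stated assumptions: timestamps are in milliseconds, rows are aligned to multiples of three weeks since the epoch, tags are sorted by key, and the string form of the key is for readability only. The function names are hypothetical and the real KairosDB encoding is binary and may differ.

```python
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000  # three weeks in milliseconds


def row_time(timestamp_ms):
    """Start of the three-week row that holds this timestamp."""
    return timestamp_ms - (timestamp_ms % ROW_WIDTH_MS)


def row_key(metric, timestamp_ms, tags):
    """Readable row key: metric, row start, then tag pairs sorted by key."""
    tag_part = "".join(f"<{k}><{v}>" for k, v in sorted(tags.items()))
    return f"<{metric}><{row_time(timestamp_ms)}>{tag_part}"


def column_offset(timestamp_ms):
    """Position of the point inside its row, relative to the row start."""
    return timestamp_ms - row_time(timestamp_ms)


ts = 900 * ROW_WIDTH_MS + 5000
print(row_key("cpu.load", ts, {"host": "web01", "dc": "eu"}))
print(column_offset(ts))
```

Two points of the same series that fall within the same three-week window share one row key and differ only in their column offset, which is what keeps the number of stored keys low.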

The KairosDB storage model takes advantage of Cassandra’s wide tables. In the bottom-layer file storage format of HBase, each column corresponds to a KeyValue, and the key contains the rowkey of the row. Therefore, each column in an HBase row stores the same rowkey repeatedly. This is the main reason why UID encoding can save a lot of storage space, and also the reason why the compaction policy (merging all columns in a row into one column) can be adopted to further reduce storage space after UID encoding. The bottom-layer file storage format of Cassandra is different from that of HBase. Each column in a Cassandra row does not store the rowkey repeatedly, so UID encoding is not required. One of the optimizations for reducing storage space in Cassandra is to reduce the number of rows, which is why KairosDB stores three weeks of data per row instead of one hour. For more information about the reasons behind these two solutions, see HBase File Format and Cassandra File Format.
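
A quick back-of-the-envelope calculation shows how much the row width changes the number of row keys. The sketch below counts the rows needed to hold one year of a single series, assuming the year starts exactly on a row boundary (otherwise one extra row may be needed). It only counts rows; it does not model the actual on-disk formats.

```python
import math

HOUR_S = 3600
THREE_WEEKS_S = 21 * 24 * HOUR_S


def rows_needed(span_seconds, row_width_seconds):
    """Number of rows covering a span that starts on a row boundary."""
    return math.ceil(span_seconds / row_width_seconds)


year_s = 365 * 24 * HOUR_S
hourly_rows = rows_needed(year_s, HOUR_S)
wide_rows = rows_needed(year_s, THREE_WEEKS_S)

print("one-hour rows per series per year:", hourly_rows)
print("three-week rows per series per year:", wide_rows)
print("hours covered by one three-week row:", THREE_WEEKS_S // HOUR_S)
```

Under these assumptions a series needs 8760 one-hour rows but only 18 three-week rows, so storing the full metric and tag strings in each row key costs far less than it would with hourly rows.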

Using Cassandra’s wide tables, even without UID encoding, the storage space is not much worse than OpenTSDB with UID encoding. The following is the official explanation:

For one we do not use IDs for strings. The string data (metric names and tags) are written to row keys and the appropriate indexes. Because Cassandra has much wider rows there are far fewer keys written to the database. Not much space is saved by using id’s and by not using id’s we avoid having to use any kind of locks across the cluster.

As mentioned, Cassandra has wider rows. The default row size in OpenTSDB HBase is 1 hour. Cassandra is set to 3 weeks.

The query optimization method adopted is also different from that of OpenTSDB. The following is the entire process of querying within KairosDB:

  1. Find the row keys in all DataPoints tables based on the query criteria.
  • If you have a custom plugin, you can obtain all row keys from the plugin. (The plugin allows you to extend KairosDB and use an external indexing system, such as ElasticSearch, to index the row keys.)

  2. Find all data from the DataPoints table based on the row keys.
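
The two-step lookup above can be sketched with in-memory stand-ins for the tables. This is a simplified illustration, not KairosDB’s implementation: `row_key_index` and `data_points` are hypothetical dictionaries, timestamps are assumed to be in milliseconds, and tag filters are assumed to be exact matches.

```python
ROW_WIDTH_MS = 3 * 7 * 24 * 60 * 60 * 1000  # three-week rows

# Hypothetical in-memory stand-ins for the index table and DataPoints table.
row_key_index = {}  # metric -> set of (row_start, sorted tag pairs)
data_points = {}    # (metric, row_start, tag pairs) -> {offset: value}


def write(metric, tags, ts, value):
    row_start = ts - ts % ROW_WIDTH_MS
    tag_key = tuple(sorted(tags.items()))
    row_key_index.setdefault(metric, set()).add((row_start, tag_key))
    row = data_points.setdefault((metric, row_start, tag_key), {})
    row[ts - row_start] = value


def find_row_keys(metric, start, end, tag_filter):
    """Step 1: scan only the index entries of one metric."""
    keys = []
    for row_start, tag_key in row_key_index.get(metric, ()):
        overlaps = row_start < end and row_start + ROW_WIDTH_MS > start
        tags = dict(tag_key)
        matches = all(tags.get(k) == v for k, v in tag_filter.items())
        if overlaps and matches:
            keys.append((metric, row_start, tag_key))
    return keys


def query(metric, start, end, tag_filter):
    """Step 2: read only the rows selected in step 1."""
    result = []
    for key in find_row_keys(metric, start, end, tag_filter):
        _, row_start, tag_key = key
        for offset, value in data_points[key].items():
            ts = row_start + offset
            if start <= ts < end:
                result.append((ts, dict(tag_key), value))
    return sorted(result, key=lambda p: p[0])


write("cpu.load", {"host": "web01"}, 1_000, 0.5)
write("cpu.load", {"host": "web02"}, 2_000, 0.9)
write("cpu.load", {"host": "web01"}, ROW_WIDTH_MS + 5, 0.7)
print(query("cpu.load", 0, 10_000, {"host": "web01"}))
```

Note that step 1 still iterates over every row key of the metric, so its cost grows with the number of tag combinations under that metric.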

Compared with OpenTSDB, which scans the data table directly to filter row keys, KairosDB can significantly reduce the amount of data scanned by using index tables. When the number of tagkey and tagvalue combinations under a metric is limited, query efficiency improves greatly. KairosDB also provides a QueryPlugin mechanism, which lets you extend it and use external components to index row keys. For example, ElasticSearch or other indexing systems can be used, because a dedicated index is ultimately the best query solution. This is also the biggest improvement of Heroic over KairosDB.


The official KairosDB documents contain sections on how to configure the auto-rollup. But in the discussion group, the description of the auto-rollup is as follows:

First off Kairos does not do any aggregation on ingest. Ingest is direct to the storage on purpose — performance.

Kairos aggregation is done after the fact at query time. The rollups are queries that are ran and the results are saved back as a new metric. Right now the rollups are all configured on a per kairos node basis. We plan on changing this in the future.

Right now Kairos does not share any state with other Kairos nodes. They have very little state on the node (except for rollups).

As for consistency it is up to you on how much you want or how important the data is to you.

In summary, the auto-rollup solution provided by KairosDB is still a relatively simple implementation. It is a configurable stand-alone component that can be started on a schedule, reads the data that has already been written, and writes the aggregated result back as new data. It is indeed very primitive, with low availability and performance.
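
The read-aggregate-write-back pattern described above can be sketched in a few lines. This is an illustration only: the `store` dictionary, the function names, and the target metric name are hypothetical, and the scheduling part is left out.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, interval, aggregate=mean):
    """Group (timestamp, value) pairs into fixed buckets and aggregate them."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return [(start, aggregate(vals)) for start, vals in sorted(buckets.items())]


def run_rollup_job(store, source, target, interval):
    """Read the written data, aggregate it, and save it as a new metric."""
    store[target] = rollup(store[source], interval)


# Hypothetical store: metric name -> list of (timestamp_seconds, value).
store = {"cpu.load": [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]}
run_rollup_job(store, "cpu.load", "cpu.load_1m_avg", 60)
print(store["cpu.load_1m_avg"])
```

Because the job re-reads data that has already been written, every rollup costs an extra query and an extra write, and a job that runs on a single node is lost if that node fails.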

However, it is better than nothing. Auto-rollup support is a trend for all TSDBs, and it is also a key function that increases functional differences and improves core competency.


The previous section mainly analyzes KairosDB, the first TSDB built on Cassandra, so I’ll continue to analyze other TSDBs built on Cassandra.

BlueFlood is also a TSDB built on Cassandra. From this PPT, you can see that there are three main core components in the overall architecture:

  • Ingest module: To process data writing.

Compared with KairosDB, its data model is slightly different from that of other TSDBs, mainly in:

  • The tenant dimension is introduced: This is an innovation. If a service-oriented TSDB is to be built, the tenant dimension is essential.

Due to deficiencies in the BlueFlood model, tag query optimization doesn’t need to be considered. Instead, all efforts are devoted to the optimization of other features, such as auto-rollup. It is much better than KairosDB and OpenTSDB in terms of auto-rollup support. The auto-rollup features are:

  • Only fixed intervals are supported: 5 min, 20 min, 60 min, 4 hour, and 1 day.
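
With only a fixed set of intervals, a query has to be mapped onto one of them. The sketch below shows one possible way to do this; the selection policy in `pick_granularity` is a hypothetical example and is not taken from BlueFlood. Only the five interval lengths come from the list above.

```python
# The five fixed rollup intervals named above, in seconds, finest first.
GRANULARITIES = {"5m": 300, "20m": 1200, "60m": 3600, "4h": 14400, "1d": 86400}


def pick_granularity(span_seconds, max_points):
    """Hypothetical policy: finest interval that yields at most max_points."""
    for name, width in GRANULARITIES.items():
        if span_seconds / width <= max_points:
            return name
    # Nothing fits: fall back to the coarsest interval that exists.
    return "1d"


print(pick_granularity(24 * 3600, 100))
```

For a one-day span and a budget of 100 points, the 5-minute interval would return 288 points, so this policy settles on the 20-minute interval with 72 points.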

From its 2014 introduction PPT, we can see several function points regarding its future planning:

  • ElasticSearch indexer and discovery: Currently, this function has been implemented, but only metric indexes are supported. After tags are introduced in the future, it may also be used for tag indexes.

In summary, if you do not need support for tags and have a strong demand for rollups, BlueFlood is a better choice than KairosDB. Otherwise, KairosDB is the better choice.


The third Cassandra-based TSDB to be introduced is Heroic, which ranks 19th in the DB-Engines ranking. Although it lags behind BlueFlood and KairosDB in that ranking, I think its design and implementation are the best of the three. For more information about its origins, see this article or this PPT, both of which share valuable experiences and lessons.

Before deciding to develop Heroic, Spotify chose KairosDB from among TSDBs such as OpenTSDB, InfluxDB, and KairosDB to replace the bottom layer of its old monitoring system. However, problems with queries in KairosDB soon became apparent. The main problem is that KairosDB has no index on metrics and tags, so queries became very slow once the number of metrics and tags reached a certain order of magnitude. Therefore, Spotify’s biggest motivation for developing Heroic was to solve the query problems in KairosDB. They used ElasticSearch as an index to optimize the query engine, while the data-writing path and the data table design remain fully consistent with KairosDB.
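
To show why a dedicated index helps once the number of series grows, here is a minimal inverted-index sketch. It is an in-memory stand-in for an external index and does not use the ElasticSearch API; the class `TagIndex` and the series IDs are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """In-memory stand-in for an external index; not the ElasticSearch API."""

    def __init__(self):
        # Posting lists: one set of series IDs per metric and per tag pair.
        self._postings = defaultdict(set)

    def add(self, series_id, metric, tags):
        self._postings[("__metric__", metric)].add(series_id)
        for pair in tags.items():
            self._postings[pair].add(series_id)

    def lookup(self, metric, tags):
        terms = [("__metric__", metric), *tags.items()]
        sets = [self._postings.get(term, set()) for term in terms]
        # Intersect the smallest posting list first to keep the work small.
        sets.sort(key=len)
        return set.intersection(*sets)


index = TagIndex()
index.add("s1", "cpu.load", {"host": "web01", "dc": "eu"})
index.add("s2", "cpu.load", {"host": "web02", "dc": "eu"})
index.add("s3", "mem.used", {"host": "web01", "dc": "eu"})
print(sorted(index.lookup("cpu.load", {"dc": "eu"})))
```

A lookup touches only the posting lists of the requested metric and tag pairs, so its cost depends on the size of those lists rather than on the total number of series stored.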

Its features are briefly summarized as follows:

  1. A complete data model, which fully complies with metric2.0 specifications.

If you need a TSDB that supports a complete data model and provides efficient indexed queries, then Heroic is a good choice.

