A Comprehensive Analysis of Open-Source Time Series Databases (2)
By Zhaofeng Zhou (Muluo)
KairosDB was originally forked from OpenTSDB 1.x, with the goal of doing secondary development on the OpenTSDB code base to meet new functional requirements. One of its refinements is support for pluggable storage engines: for example, support for H2 facilitates local development and testing, instead of being strongly coupled with HBase as OpenTSDB is. In earlier versions, HBase was still the primary storage engine. In subsequent storage optimizations, however, HBase was gradually replaced by Cassandra, making KairosDB the first time series database built on Cassandra. The latest versions no longer support HBase at all, because the storage optimizations rely on attributes that Cassandra has but HBase does not.
The overall architecture is similar to that of OpenTSDB: both use a mature database as the bottom-layer storage engine, with the main logic implemented as a thin layer on top of it. In the deployment architecture, this logic layer is a stateless component that can easily be scaled horizontally.
In terms of functionality, KairosDB's secondary development on OpenTSDB 1.x either optimizes existing OpenTSDB features or adds features that OpenTSDB lacks. I'll outline the major functional differences I know about:
- Pluggable storage engines: Earlier versions of OpenTSDB were strongly coupled with HBase. In pursuit of ultimate performance, an asynchronous HBase client was even developed (since released as an independent open-source project, AsyncHBase). As a result, the entire code base is written in an asynchronous, event-driven style, which not only increases the complexity and reduces the readability of the code, but also makes supporting multiple storage engines harder. KairosDB strictly defines the API of the storage layer, so the overall logic is loosely coupled with storage, which makes it easier to add storage engines. The latest versions of OpenTSDB can also support Cassandra and BigTable, but architecturally this cannot be described as pluggable storage-engine support.
- Support for values of various data types and custom types: OpenTSDB only supports numeric values, while KairosDB supports numeric and string values as well as custom value types. In some scenarios, a metric value is not a simple number: for example, to record the TopN at a time point, the corresponding metric value may be a set of strings. Extensible value types make it easier to meet new requirements down the road. The first two differences show that the first major refinement KairosDB made to OpenTSDB was to make its functional model and code architecture more flexible.
- Support for auto-rollup: At present, most TSDBs are moving toward supporting pre-aggregation and auto-rollup; OpenTSDB is one of the few that do not support this function (in the latest version of OpenTSDB, even the storage of multi-precision data is not supported). However, KairosDB's implementation of auto-rollup is still relatively primitive, as explained in detail in the following sections.
- A different storage model: For TSDBs, storage is the core of the core. OpenTSDB applies UID encoding and compaction on its storage model to optimize queries and storage. KairosDB takes a different approach that exploits Cassandra's wide rows, which is also the most important reason why HBase was replaced by Cassandra. This is explained in detail in the following sections.
The main design feature of the OpenTSDB storage model is the use of UID encoding. This greatly saves storage space, and many queries are optimized with HBase filters that exploit the fixed byte length of UID-encoded components. However, UID encoding also has defects. First, the mapping tables from metric/tagkey/tagvalue to UID must be maintained, and every data point write and read must be translated through them; these mapping tables are usually cached in the TSD or the client, which adds memory consumption. Second, because of the UID encoding, the number of distinct metrics/tagkeys/tagvalues has an upper limit determined by the number of bytes per UID, and conflicts may occur during UID allocation, which affects writes.
Essentially, the UID encoding optimization adopted by the OpenTSDB storage model mainly solves two problems:
- Optimize storage space: UID encoding solves the problem of redundant storage space caused by repeated storage of row keys.
- Optimize queries: Based on the fixed byte length attribute of the tagkey and tagvalue encoded by UID, the HBase FuzzyRowFilter is used for query optimization in specific scenarios.
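To make the trade-offs concrete, here is a minimal, purely illustrative Python sketch of UID-style encoding. The 3-byte UID width matches OpenTSDB's default, but the class, names, and key layout are simplifications for illustration, not OpenTSDB's actual code:

```python
import struct

class UidTable:
    """Toy forward/reverse mapping of strings to fixed-width 3-byte UIDs.

    OpenTSDB keeps similar forward and reverse mappings in its tsdb-uid
    table; this in-memory version only illustrates the idea.
    """
    def __init__(self, width=3):
        self.width = width
        self.forward = {}   # name -> uid bytes
        self.reverse = {}   # uid bytes -> name

    def get_or_create(self, name):
        if name not in self.forward:
            next_id = len(self.forward) + 1
            if next_id >= 1 << (8 * self.width):
                # the upper limit on distinct names mentioned above
                raise OverflowError("UID space exhausted")
            uid = next_id.to_bytes(self.width, "big")
            self.forward[name] = uid
            self.reverse[uid] = name
        return self.forward[name]

def opentsdb_row_key(uids, metric, base_ts, tags):
    """Row key: <metric uid><4-byte base timestamp><tagk uid><tagv uid>..."""
    key = uids.get_or_create(metric) + struct.pack(">I", base_ts)
    for k, v in sorted(tags.items()):
        key += uids.get_or_create(k) + uids.get_or_create(v)
    return key

uids = UidTable()
key = opentsdb_row_key(uids, "sys.cpu.user", 1466640000, {"host": "web01"})
# 3 (metric) + 4 (timestamp) + 3 + 3 (one tag pair) = 13 bytes,
# regardless of how long the original strings are
print(len(key))  # → 13
```

The fixed byte widths are what make filters like FuzzyRowFilter possible, and the in-memory dictionaries stand in for the cached mapping tables whose memory cost the text describes.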
To solve these two problems, KairosDB adopts a different method that doesn’t require UID encoding, and these problems are avoided. Let’s take a look at the storage model of KairosDB first. It is mainly composed of the following three tables:
- DataPoints: Stores all original data points. Each data point is composed of metric, tags, timestamp, and value. The time span of a row in this table is three weeks; that is, all data points within the same three weeks are stored in the same row, while a row in OpenTSDB spans only one hour. The composition of the RowKey is similar to that of OpenTSDB, with the structure <metric><timestamp><tagk1><tagv1><tagk2><tagv2>...<tagkn><tagvn>. The difference is that the metric, tag keys, and tag values all store the original values instead of UIDs.
- RowKeyIndex: Stores, for each metric, all the row keys that have been written to the DataPoints table for that metric. That is, all row keys written under the same metric are stored in the same row and sorted by time. This table is mainly used for queries: when filtering by tagkey or tagvalue, all eligible row keys in the queried time period are first filtered out of this table, and then the data is read from the DataPoints table.
- StringIndex: This table contains three rows of data, and each row stores all metrics, tag keys, and tag values respectively.
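To illustrate the contrast with OpenTSDB's UID-based keys, the following Python sketch builds a KairosDB-style row key from raw strings with a three-week row base timestamp. The exact serialization (separators, encoding) is an assumption for illustration and differs from KairosDB's real byte layout:

```python
THREE_WEEKS_MS = 3 * 7 * 24 * 3600 * 1000  # row time span used by KairosDB

def kairos_row_key(metric, timestamp_ms, tags):
    """Row key sketch: <metric><row base timestamp><tagk1=tagv1:...>.

    Raw string values replace UIDs, and the row base covers three weeks,
    so all points of a series in that window share one row key.
    """
    row_base = timestamp_ms - (timestamp_ms % THREE_WEEKS_MS)
    tag_str = ":".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}\x00{row_base}\x00{tag_str}"

k1 = kairos_row_key("sys.cpu.user", 1_466_640_000_000, {"host": "web01"})
k2 = kairos_row_key("sys.cpu.user", 1_466_640_000_000 + 3600_000, {"host": "web01"})
print(k1 == k2)  # → True: points an hour apart land in the same three-week row
```

Because the row key carries the raw strings, no cluster-wide UID allocation (and no allocation locking) is needed, which is exactly the point made in the official explanation quoted below.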
The KairosDB storage model takes advantage of Cassandra's wide rows. In the bottom-layer file storage format of HBase, each column corresponds to a KeyValue whose key includes the rowkey of the row; therefore, every column in an HBase row stores the same rowkey repeatedly. This is the main reason why UID encoding can save so much storage space, and also why a compaction policy (merging all columns in a row into one column) can further compact the storage after UID encoding. The bottom-layer file format of Cassandra is different: the columns in a row do not repeatedly store the rowkey, so UID encoding is not required. One way to further reduce storage space in Cassandra is to reduce the number of rows, which is why KairosDB stores three weeks of data per row instead of one hour. For more information about the reasons for these two solutions, see the HBase file format and the Cassandra file format.
Using Cassandra’s wide tables, even without UID encoding, the storage space is not much worse than OpenTSDB with UID encoding. The following is the official explanation:
For one we do not use IDs for strings. The string data (metric names and tags) are written to row keys and the appropriate indexes. Because Cassandra has much wider rows there are far fewer keys written to the database. Not much space is saved by using id’s and by not using id’s we avoid having to use any kind of locks across the cluster.
As mentioned, Cassandra has wider rows. The default row size in OpenTSDB HBase is 1 hour. Cassandra is set to 3 weeks.
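To get a feel for the scale of this difference, consider a rough back-of-the-envelope calculation. The 60-byte raw row key and one-point-per-second rate are assumptions chosen for illustration, not measured figures:

```python
# Compare how many times the row key is physically written over three weeks
# of one series, under the two storage formats described above.

ROW_KEY_BYTES = 60                   # assumed raw (non-UID) row key size
POINTS_PER_3_WEEKS = 3600 * 24 * 21  # one data point per second

# HBase's KeyValue on-disk format repeats the full row key in every cell,
# so even with 1-hour rows the key is written once per data point.
hbase_key_overhead = ROW_KEY_BYTES * POINTS_PER_3_WEEKS

# Cassandra's wide-row format stores the partition key once per partition,
# so a 3-week row pays the key cost a single time.
cassandra_key_overhead = ROW_KEY_BYTES

print(hbase_key_overhead // cassandra_key_overhead)  # → 1814400
```

This is why OpenTSDB needs UID encoding (and row compaction) to shrink the repeated keys, while KairosDB on Cassandra can simply store the raw strings.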
The query optimization method adopted is also different from that of OpenTSDB. The following is the entire process of querying within KairosDB:
- Find the row keys in the DataPoints table based on the query criteria.
  - If a custom plugin is configured, obtain all row keys from the plugin. (A plugin can extend KairosDB to index the row keys with an external indexing system, such as ElasticSearch.)
  - Otherwise, find all the row keys in the RowKeyIndex table based on the metric and time range. (The query range can be narrowed using the column-name range (metric+startTime, metric+endTime).)
- Find all data in the DataPoints table based on the row keys.
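The two-step lookup can be simulated with a small in-memory mock. The table layouts, key format, and helper names here are illustrative stand-ins, not KairosDB's actual schema:

```python
# RowKeyIndex: metric -> sorted list of (row_base_ts, row_key)
row_key_index = {
    "sys.cpu.user": [
        (0, "sys.cpu.user|0|host=web01"),
        (0, "sys.cpu.user|0|host=web02"),
        (1814400000, "sys.cpu.user|1814400000|host=web01"),
    ]
}
# DataPoints: row_key -> list of (timestamp_ms, value)
data_points = {
    "sys.cpu.user|0|host=web01": [(100, 0.3), (200, 0.5)],
    "sys.cpu.user|0|host=web02": [(150, 0.7)],
    "sys.cpu.user|1814400000|host=web01": [(1814400100, 0.9)],
}

def query(metric, start_ms, end_ms, tag_filter=None):
    # Step 1: narrow candidates in RowKeyIndex by metric + time range.
    candidates = [rk for base, rk in row_key_index.get(metric, [])
                  if base <= end_ms]  # a row may start before the window
    # Tag filtering happens on the row keys, before touching any data.
    if tag_filter:
        candidates = [rk for rk in candidates if tag_filter in rk]
    # Step 2: read only the matching rows from DataPoints.
    out = []
    for rk in candidates:
        out.extend((ts, v) for ts, v in data_points[rk]
                   if start_ms <= ts <= end_ms)
    return sorted(out)

print(query("sys.cpu.user", 0, 300, tag_filter="host=web01"))
# → [(100, 0.3), (200, 0.5)]
```

The key point is that non-matching rows (here, web02 and the later three-week row) are eliminated from the index table before the data table is read at all.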
Compared with OpenTSDB, which scans data tables directly to filter row keys, KairosDB can significantly reduce the amount of data scanned by using the index table. When the number of tagkey/tagvalue combinations under a metric is limited, query efficiency is greatly improved. KairosDB also provides a QueryPlugin mechanism that can be extended to index row keys with external components, such as ElasticSearch or other indexing systems, because a real index is ultimately the best query solution. This is also the biggest improvement of Heroic over KairosDB.
The official KairosDB documents contain sections on how to configure the auto-rollup. But in the discussion group, the description of the auto-rollup is as follows:
First off Kairos does not do any aggregation on ingest. Ingest is direct to the storage on purpose — performance.
Kairos aggregation is done after the fact at query time. The rollups are queries that are ran and the results are saved back as a new metric. Right now the rollups are all configured on a per kairos node basis. We plan on changing this in the future.
Right now Kairos does not share any state with other Kairos nodes. They have very little state on the node (except for rollups).
As for consistency it is up to you on how much you want or how important the data is to you.
In summary, the auto-rollup solution provided by KairosDB is still simple in its implementation: a configurable stand-alone component that runs on a schedule, reads back the written data, aggregates it, and writes the results again as new metrics. It is indeed very primitive, with low availability and performance.
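The quoted approach, a query whose results are saved back under a new metric name, can be sketched as follows. The store layout and function names are hypothetical, chosen only to mirror the description above:

```python
from statistics import mean

# Toy store: metric -> list of (timestamp_ms, value)
store = {
    "requests": [(0, 10), (20_000, 30), (40_000, 20), (70_000, 40)],
}

def rollup(src_metric, dst_metric, interval_ms, agg=mean):
    """Read back written data, aggregate per interval, write as a new metric.

    This mirrors the KairosDB design quoted above: no aggregation on
    ingest; the rollup runs after the fact, like any other query.
    """
    buckets = {}
    for ts, v in store.get(src_metric, []):
        buckets.setdefault(ts - ts % interval_ms, []).append(v)
    store[dst_metric] = sorted((b, agg(vs)) for b, vs in buckets.items())

rollup("requests", "requests_1m_avg", 60_000)
print(store["requests_1m_avg"])
# → [(0, 20), (60000, 40)]
```

Because each rollup re-reads raw data on one node and rewrites it, the weaknesses noted above follow directly: a node failure skips the rollup, and large time ranges mean large re-reads.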
However, it is better than nothing. Auto-rollup support is a trend for all TSDBs, and it is also a key function that increases functional differences and improves core competency.
The previous section mainly analyzes KairosDB, the first TSDB built on Cassandra, so I’ll continue to analyze other TSDBs built on Cassandra.
BlueFlood is also a TSDB built on Cassandra. From this PPT, you can see that there are three main core components in the overall architecture:
- Ingest module: To process data writing.
- Rollup module: To perform automatic pre-aggregation and precision reduction.
- Query module: To process data queries.
Compared with KairosDB and other TSDBs, its data model differs mainly in the following ways:
- The tenant dimension is introduced: This is an innovation. If a service-oriented TSDB is to be built, the tenant dimension is essential.
- Tags are not supported: This is quite surprising, since most TSDBs treat tags as an indispensable part of the model. BlueFlood, however, does not support tags. This may be a deliberate trade-off: a solution for tag query optimization had not yet been determined, so rather than support tags poorly, BlueFlood simply omits them. In any case, the model remains fully compatible with adding tags in the future. BlueFlood currently uses ElasticSearch to build metric indexes, and I believe its future tag indexes will also be based on ElasticSearch; tags will presumably not be introduced until that plan is fully supported.
Because the BlueFlood model omits tags, tag query optimization does not need to be considered, and all efforts can instead be devoted to optimizing other features, such as auto-rollup, where BlueFlood is much better than KairosDB and OpenTSDB. Its auto-rollup features are:
- Only fixed intervals are supported: 5 min, 20 min, 60 min, 4 hours, and 1 day.
- Distributed rollup service: Rollup tasks can be scheduled in a distributed manner, and rollup data is obtained through offline batch scanning.
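As a sketch of how fixed granularities might be used at query time, the following picks the finest granularity that keeps the number of returned points manageable. The `max_points` threshold and the selection rule are assumptions for illustration, not BlueFlood's actual logic:

```python
# BlueFlood's fixed rollup granularities, as listed above (in minutes).
GRANULARITIES_MIN = [5, 20, 60, 240, 1440]  # 5m, 20m, 60m, 4h, 1d

def pick_granularity(range_minutes, max_points=400):
    """Return the finest granularity whose point count fits the budget.

    max_points is a hypothetical threshold; with fixed granularities the
    choice reduces to a simple scan from finest to coarsest.
    """
    for g in GRANULARITIES_MIN:
        if range_minutes / g <= max_points:
            return g
    return GRANULARITIES_MIN[-1]  # fall back to the coarsest (1 day)

print(pick_granularity(60 * 24))       # one-day query → 5
print(pick_granularity(60 * 24 * 30))  # one-month query → 240
```

Fixing the set of intervals is what makes the distributed rollup scheduling tractable: every series needs exactly the same five rollup series, with no per-query decisions at write time.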
From its 2014 introduction PPT, we can see several function points regarding its future planning:
- ElasticSearch indexer and discovery: Currently, this function has been implemented, but only metric indexes are supported. After tags are introduced in the future, it may also be used for tag indexes.
- Cloud Files exporter for rollups: This optimizes offline computation, so that reading large volumes of historical data for rollups does not affect online services.
- Apache Kafka exporter for rollups: This goes a step further than offline computation: rollups can be computed by stream processing, with better real-time performance.
In summary, if you do not need tag support and have a strong demand for rollups, BlueFlood is a better choice than KairosDB; otherwise, choose KairosDB.
The third Cassandra-based TSDB to be introduced is Heroic, which ranks 19th in the DB-Engines ranking. Although it lags behind BlueFlood and KairosDB, I think its design and implementation are the best of the three. For more information about its origins, see this article or this PPT, both of which relate valuable experiences and lessons.
Before deciding to develop Heroic, Spotify evaluated TSDBs such as OpenTSDB, InfluxDB, and KairosDB, and chose KairosDB to replace the bottom layer of its old monitoring system. However, problems with queries in KairosDB soon became apparent: KairosDB has no index on metrics and tags, so queries become very slow once the number of metrics and tags reaches a certain scale. The biggest motivation for Spotify to develop Heroic was therefore to solve the query problems in KairosDB. They adopted ElasticSearch as an index to optimize the query engine, while the data write path and table schemas remained completely consistent with KairosDB.
Its features are briefly summarized as follows:
- A complete data model that fully complies with the Metrics 2.0 specification.
- A data storage model consistent with that of KairosDB, with ElasticSearch used to optimize the query engine. (The lack of such an index is the biggest problem of the other TSDBs discussed here except InfluxDB, such as KairosDB, OpenTSDB, and BlueFlood, and this optimization is one of Heroic's core competencies.)
- The auto-rollup function is not supported, which is one of its defects.
If you need a TSDB to support a complete data model and want to obtain efficient index queries, then Heroic is the choice.