A Comprehensive Analysis of Open-Source Time Series Databases (1)

By Zhaofeng Zhou (Muluo)

Popular Open-Source Time Series Databases

OpenTSDB

OpenTSDB is a distributed, scalable time series database. It supports write rates of up to millions of entries per second, stores data with millisecond-level precision, and preserves data permanently without sacrificing precision. Its write performance and storage capacity come from the underlying HBase layer: HBase uses an LSM-tree storage engine and a distributed architecture to deliver high write throughput, and it stores its data on HDFS, which scales horizontally. OpenTSDB depends heavily on HBase, and many of its subtle optimizations are tailored to HBase's underlying storage structure. The latest version also adds support for BigTable and Cassandra.

Architecture

Data Model

OpenTSDB models data by metric (indicator). A data point contains the following components:

  • Timestamp: the Unix timestamp, in seconds or milliseconds, of the moment the data point was recorded.
  • Tag: one or more tags. Tags are used to describe the different dimensions of the subject. A tag consists of a tag key and a tag value. The tag key is the dimension, and the tag value is the value of the dimension.
  • Value: the value of the indicator. Currently, only values of the numerical type are supported.
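The components above can be sketched as a small record type. This is only an illustration, not OpenTSDB's internal representation: the class name `DataPoint`, the `series_key` helper, and the `host`/`pool` tag keys are hypothetical, and the metric name is included as a field because every point belongs to one metric.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataPoint:
    """Hypothetical in-memory view of one data point."""
    metric: str                 # name of the indicator, e.g. "proc.loadavg.1m"
    timestamp: int              # Unix time in seconds or milliseconds
    value: float                # only numeric values are supported
    tags: Dict[str, str] = field(default_factory=dict)  # tag key -> tag value

    def series_key(self) -> str:
        """Identify the time series: metric plus tags sorted by tag key."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.metric}{{{tag_part}}}"


point = DataPoint(
    metric="proc.loadavg.1m",
    timestamp=1234567890,
    value=0.42,
    tags={"pool": "static", "host": "web42"},
)
print(point.series_key())
```

Two points with the same metric and the same tag set produce the same series key, whatever order the tags were supplied in; only the timestamp and value differ between them.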

Storage Model

The following is a brief summary of the key optimization ideas:

  • Optimize the number of key values: In HBase's underlying storage model, each column of a row is stored as a separate key value, so reducing the number of rows and columns saves significant storage space and improves query efficiency.
  • Optimize queries: HBase server-side filters are used to optimize multi-dimensional queries, and pre-aggregation and rollup are used to optimize groupby and downsampling queries.
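
For example, metric names and tag values are stored as short unique IDs rather than as full strings:

  • The tagvalue is ‘static’, and the corresponding uniqueID is ‘001’
  • The metric is ‘proc.loadavg.1m’, and the corresponding uniqueID is ‘052’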
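The name-to-ID mapping in the last two bullets can be sketched as a two-way dictionary. This is a minimal sketch, not OpenTSDB's implementation: the class `UidTable` and its methods are hypothetical, IDs are simply assigned sequentially (so they will not match the ‘001’ and ‘052’ values above), and the 3-byte width is an assumption.

```python
class UidTable:
    """Hypothetical two-way mapping between names and fixed-width IDs."""

    def __init__(self, width: int = 3):
        self.width = width          # assumed ID width in bytes
        self._name_to_uid = {}
        self._uid_to_name = {}

    def get_or_create(self, name: str) -> bytes:
        """Return the ID for a name, assigning the next free ID if needed."""
        uid = self._name_to_uid.get(name)
        if uid is None:
            next_id = len(self._name_to_uid) + 1
            uid = next_id.to_bytes(self.width, "big")
            self._name_to_uid[name] = uid
            self._uid_to_name[uid] = name
        return uid

    def name_of(self, uid: bytes) -> str:
        """Reverse lookup, used when turning stored keys back into names."""
        return self._uid_to_name[uid]


metrics = UidTable()
uid = metrics.get_or_create("proc.loadavg.1m")
print(len("proc.loadavg.1m"), "bytes of name ->", len(uid), "bytes of ID")
```

Because every stored key repeats the metric and tag identifiers, replacing long strings with short fixed-width IDs shrinks each key value.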

DataTable

The second key table is the data table, whose rowkey has the following structure:

<metric><timestamp><tagk1><tagv1><tagk2><tagv2>...<tagkn><tagvn>
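A rowkey with this layout can be assembled by concatenating byte fields. The sketch below rests on several assumptions that the layout itself does not state: the function name `build_row_key` is hypothetical, each unique ID is taken to be 3 bytes, the timestamp is packed as a 4-byte big-endian integer rounded down to the hour (so that points of one series within the same hour share a row, in line with the goal of fewer rows), and tag pairs are ordered by tag key ID.

```python
import struct


def build_row_key(metric_uid: bytes, timestamp: int, tag_uids: dict) -> bytes:
    """Concatenate <metric><timestamp><tagk1><tagv1>...<tagkn><tagvn>.

    tag_uids maps tag key IDs to tag value IDs (both bytes).
    """
    # Assumption: align to the hour so one row covers one hour of points.
    base_time = timestamp - (timestamp % 3600)
    key = metric_uid + struct.pack(">I", base_time)
    # A fixed tag order makes the same series always yield the same key.
    for tagk_uid in sorted(tag_uids):
        key += tagk_uid + tag_uids[tagk_uid]
    return key


row_key = build_row_key(
    metric_uid=b"\x00\x00\x34",
    timestamp=1234567890,
    tag_uids={b"\x00\x00\x01": b"\x00\x00\x01"},
)
print(row_key.hex())
```

Since the metric comes first and the timestamp second, all rows of one metric are adjacent and sorted by time, which is what makes time-range scans over a metric possible.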

Optimize Queries

HBase provides only simple query operations: single-row queries and range queries. A single-row query requires the complete rowkey. A range query requires the rowkey range, and all data within that range is obtained by scanning. In general, a single-row query is very fast, while the speed of a range query depends on the size of the scanned range. Scanning tens of thousands of rows is usually fine, but scanning hundreds of millions of rows makes read latency much higher.
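The difference between the two query types can be illustrated with a toy store that keeps rowkeys sorted. This is an in-memory stand-in written for illustration, not the HBase client API; the class `SortedKeyStore` and its methods are hypothetical.

```python
import bisect


class SortedKeyStore:
    """Toy store with rows ordered by rowkey, as in a sorted key-value table."""

    def __init__(self):
        self._keys = []      # rowkeys kept in sorted order
        self._values = {}    # rowkey -> row contents

    def put(self, key: bytes, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        """Single-row query: needs the complete rowkey."""
        return self._values.get(key)

    def scan(self, start: bytes, stop: bytes):
        """Range query: returns every row with start <= rowkey < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        # The work done here grows with the number of rows in the range.
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKeyStore()
store.put(b"metricA|t1", 1.0)
store.put(b"metricA|t2", 2.0)
store.put(b"metricB|t1", 3.0)
print(store.get(b"metricA|t2"))
print(store.scan(b"metricA|", b"metricA|\xff"))
```

The scan touches every row between the two boundaries, including rows a later filter might discard, which is why narrowing the rowkey range matters more than anything applied afterwards.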

Summary

OpenTSDB's strength is its ability to write and store data, which it owes largely to the underlying HBase layer. Its weakness is querying and analyzing data. Although many query optimizations have been made, they do not apply to all query scenarios. Among the time series databases compared here, OpenTSDB is arguably the weakest at optimizing tagvalue filtering queries. For groupby and downsampling queries, support for pre-aggregation and auto-rollup is not available. However, the OpenTSDB API is the richest in functionality, which makes it a benchmark.
