Table Store Time Series Data Storage — Architecture


What Is Time Series Data?

Time Series Data Model

  • Individual or group (WHO): Describes the subject that produces the data. This subject can be a person, a monitoring metric, or an object. An individual generally has multi-dimensional attributes and can be located by a unique ID (for example, a person by ID number, or a device by device ID), or by a combination of attributes, such as cluster, machine ID, and process name to locate a process.
  • Time (WHEN): Time is the most important feature of time series data and is a key attribute that distinguishes it from other data.
  • Location (WHERE): A location is usually expressed as a two-dimensional coordinate of latitude and longitude; in fields related to scientific computing, such as meteorology, it is a three-dimensional coordinate of latitude, longitude, and altitude.
  • Status (WHAT): Describes the status of a specific individual at a certain moment. Monitoring time series data usually describes status as a numeric value, while trace data describes status as events; different scenarios use different expressions.
A monitoring-oriented time series model typically contains:

  • Metric: Used to describe the monitoring metric.
  • Tags: Used to locate the monitored object, which is described using one or more tags.
  • Timestamp: The time point when the monitoring value is collected.
  • Value: The collected monitoring value, which is usually numeric.
A more generalized time series model contains:

  • Name: Defines the type of the data.
  • Tags: Describes the metadata of the individual.
  • Location: The spatio-temporal information of the data.
  • Timestamp: The timestamp when the data is generated.
  • Values: The value or status corresponding to the data. Multiple values or statuses can be provided, which do not necessarily have to be numeric.
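The generalized model above can be sketched as a small data structure. This is an illustrative sketch only, not Table Store's actual schema; the class and field names are assumptions chosen to mirror the attributes listed above. A common convention, also assumed here, is that a timeline is identified by the name plus the sorted tag set.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class TimeSeriesPoint:
    """One data point under the generalized model described above."""
    name: str                        # type of the data (e.g. "cpu.usage")
    tags: Dict[str, str]             # metadata locating the individual
    timestamp: int                   # epoch milliseconds when the data was produced
    values: Dict[str, float]         # one or more values/statuses
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude), if any

    def timeline_key(self) -> str:
        """A timeline is identified by the name plus the sorted tag set."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.name}:{tag_part}"

point = TimeSeriesPoint(
    name="cpu.usage",
    tags={"cluster": "c1", "machine": "m42", "process": "nginx"},
    timestamp=1_700_000_000_000,
    values={"value": 73.5},
)
print(point.timeline_key())  # cpu.usage:cluster=c1,machine=m42,process=nginx
```

Note that `values` is a mapping rather than a single number, reflecting the point above that multiple values or statuses can be provided and they need not be numeric.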

Time Series Data Query, Computing, and Analysis

Time Series Data Processing Procedure

  • Data model: The standard definition of time series data; collected data must conform to this model and carry all the characteristic attributes of time series data.
  • Stream computing: Pre-aggregation, downsampling, and post-aggregation of the time series data.
  • Data storage: The storage system provides high-throughput, massive volume, and low-cost storage, and supports separation of cold/hot data, as well as efficient range query.
  • Metadata retrieval: Provides the storage and retrieval of timeline metadata in the order of tens of millions to hundreds of millions, and supports different retrieval methods (multidimensional filtering and location query).
  • Data analysis: Provides time series analysis and computing capabilities for time series data.
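To make the stream-computing step concrete, here is a minimal downsampling sketch. It is an illustration of the general technique, not Table Store's implementation: raw (timestamp, value) points are bucketed into fixed time windows and each window is reduced to an average.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def downsample(points: List[Tuple[int, float]], window_ms: int) -> Dict[int, float]:
    """Average raw (timestamp_ms, value) points into fixed windows.

    Returns a mapping of window start timestamp -> mean of the values
    that fell into that window.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_ms].append(value)  # align ts down to window start
    return {start: sum(vs) / len(vs) for start, vs in buckets.items()}

raw = [(1000, 2.0), (1500, 4.0), (2200, 6.0)]
print(downsample(raw, 1000))  # {1000: 3.0, 2000: 6.0}
```

In a real pipeline the same reduction runs continuously over an incoming stream (e.g. in a stream computing engine), and other aggregators such as min, max, or count replace the mean depending on the query.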

Open Source Time Series Databases

  • Data Storage: All the databases utilize distributed NoSQL (LSM engine) storage, including open source distributed databases, such as HBase and Cassandra, and cloud platforms such as BigTable, as well as self-developed storage engines.
  • Aggregation: Pre-aggregation relies entirely on external stream computing engines, such as Storm or Spark Streaming. Post-aggregation is an interactive query process, so it generally does not go through a stream computing engine; different time series databases implement it either as a simple single-threaded computation or as a concurrent one. Automatic downsampling is also a post-aggregation computation, but it is a streaming process rather than an interactive one. It would suit a stream computing engine well, yet it is not implemented that way in these databases.
  • Metadata storage and retrieval: The classic OpenTSDB has no dedicated metadata store and does not support metadata retrieval; metadata is obtained by scanning the row keys of the data table. KairosDB stores metadata in a Cassandra table, but retrieval is very inefficient because the table must be scanned. Heroic, a secondary development based on KairosDB, uses Elasticsearch for metadata storage and indexing, and therefore supports much better metadata retrieval. InfluxDB and Prometheus implement their own indexes, but indexing is not easy when timeline metadata reaches the order of tens of millions to hundreds of millions of entries. An earlier version of InfluxDB used an in-memory metadata index, which is more restrictive: the number of timelines is limited by memory size, and rebuilding the in-memory index requires scanning all timeline metadata, lengthening node failover time.
  • Data analysis: Except for Elasticsearch, TSDBs generally provide no analysis capabilities beyond query and post-aggregation. This is an important advantage that allows Elasticsearch to keep a foothold in the field of time series analysis.
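The metadata point above can be illustrated with a sketch of why scanning row keys is expensive. OpenTSDB's real row keys are binary-encoded UIDs (metric + base timestamp + tag pairs); the string encoding below is a simplified assumption for illustration. Without a dedicated metadata index, answering "which metrics have tag host=a?" degenerates into a scan over every row key.

```python
from typing import Dict, Iterable, Set

def row_key(metric: str, base_ts: int, tags: Dict[str, str]) -> str:
    """Simplified OpenTSDB-style row key: metric, window timestamp, sorted tags."""
    tag_part = "".join(f"{k}={v};" for k, v in sorted(tags.items()))
    return f"{metric}|{base_ts}|{tag_part}"

def metrics_with_tag(row_keys: Iterable[str], key: str, value: str) -> Set[str]:
    """Finding timelines that carry a tag requires scanning every row key."""
    needle = f"{key}={value};"
    return {rk.split("|")[0] for rk in row_keys if needle in rk}

keys = [
    row_key("cpu.usage", 0, {"host": "a"}),
    row_key("mem.used", 0, {"host": "b"}),
    row_key("cpu.usage", 3600, {"host": "a"}),
]
print(metrics_with_tag(keys, "host", "a"))  # {'cpu.usage'}
```

The scan touches every key for every metadata query, which is exactly the cost that Heroic's Elasticsearch index and InfluxDB's own indexing are designed to avoid.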

Table Store Time Series Data Storage


Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:
