Table Store Time Series Data Storage Architecture

What Is Time Series Data?

Time Series Data Model

  1. Individual or group (WHO): The subject that produces the data, which can be a person, a monitoring metric, or an object. An individual generally has multi-dimensional attributes and can be located by a unique ID, for example, a person ID for a person or a device ID for a device. An individual can also be located through a combination of attributes, for example, cluster, machine ID, and process name to locate a process.
  2. Time (WHEN): Time is the most important feature of time series data and is a key attribute that distinguishes it from other data.
  3. Location (WHERE): A location is usually described by two-dimensional coordinates (latitude and longitude). In scientific computing fields such as meteorology, three-dimensional coordinates (latitude, longitude, and altitude) are used.
  4. Status (WHAT): The status of a specific individual at a certain moment. Monitoring time series data usually describes status numerically, whereas trace data expresses status as events. The representation differs from scenario to scenario.

  1. Metric: Used for describing the monitoring metric.
  2. Tags: Used for locating the monitored object, which is described by one or more tags.
  3. Timestamp: The time point when the monitoring value is collected.
  4. Value: The collected monitoring value, which is usually numeric.
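
The metric, tags, timestamp, and value fields listed above can be sketched as a small Python type. This is only an illustration under stated assumptions: the class name `MetricPoint`, the `series_key` helper, and the `metric{k=v,...}` key format are hypothetical and are not taken from any particular database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MetricPoint:
    """One monitoring data point: metric + tags + timestamp + value."""
    metric: str             # name of the monitoring metric
    tags: Dict[str, str]    # tags that locate the monitored object
    timestamp: int          # collection time, assumed to be epoch seconds
    value: float            # collected value, usually numeric

    def series_key(self) -> str:
        """Build a stable identifier for the time series this point belongs to.

        Tags are sorted so that the same tag set always yields the same key.
        The textual format is an assumption made for this sketch.
        """
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.metric}{{{tag_part}}}"
```

Points that share a metric and tag set produce the same key, which is what lets a store group them into one time series ordered by timestamp.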

  1. Name: Defines the type of the data.
  2. Tags: Describes the metadata of the individual.
  3. Location: The location of the data.
  4. Timestamp: The time at which the data is generated.
  5. Values: The value or status corresponding to the data. Multiple values or statuses can be provided, which are not necessarily numeric.
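
As a rough sketch of the five fields above (name, tags, location, timestamp, and values), the following Python type allows an optional location and several non-numeric values per record. The names `TimeSeriesRecord` and `from_metric`, the `(latitude, longitude)` tuple, and the `"value"` field name are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

# A value or status is not necessarily numeric.
Value = Union[int, float, str, bool]


@dataclass
class TimeSeriesRecord:
    name: str                     # type of the data
    tags: Dict[str, str]          # metadata of the individual
    timestamp: int                # generation time, assumed to be epoch seconds
    values: Dict[str, Value]      # one or more named values or statuses
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)


def from_metric(metric: str, tags: Dict[str, str],
                timestamp: int, value: float) -> TimeSeriesRecord:
    """Express a single-value monitoring point in the more general model."""
    return TimeSeriesRecord(name=metric, tags=dict(tags),
                            timestamp=timestamp, values={"value": value})
```

A single-value monitoring point then becomes a special case with one entry in `values` and no location, while a trace record could carry a location and several status fields.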

Time Series Data Query, Computing, and Analysis

Time Series Data Processing Procedure

  1. Data model: The standard definition of time series data. Collected time series data must conform to the model and include all the characteristic attributes of time series data.
  2. Stream computing: Pre-aggregation, downsampling, and post-aggregation of the time series data.
  3. Data storage: The storage system provides high-throughput, large-scale, and low-cost storage, and supports hot/cold data separation and efficient range queries.
  4. Metadata retrieval: Provides storage and retrieval of time series metadata on the order of 10 million to 100 million entries, and supports different retrieval methods (multidimensional filtering and location query).
  5. Data analysis: Provides time series analysis and computing capabilities for time series data.
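
To make the downsampling step in the list above concrete, here is a minimal in-memory sketch in Python. It assumes points are `(timestamp, value)` pairs with integer epoch-second timestamps and that buckets are aligned to multiples of the interval; the function name `downsample` and its parameters are hypothetical. A real pipeline would run this kind of logic in a stream computing engine rather than on a list.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, float]  # (timestamp in epoch seconds, value)


def downsample(points: Iterable[Point], interval_s: int,
               agg: Callable[[List[float]], float] = mean) -> List[Point]:
    """Reduce resolution by aggregating all values within each time bucket.

    Each point is assigned to the bucket starting at the largest multiple
    of interval_s that is not greater than its timestamp.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval_s].append(value)
    # One output point per bucket, ordered by bucket start time.
    return [(start, agg(vals)) for start, vals in sorted(buckets.items())]
```

For example, four points at 0, 30, 60, and 90 seconds downsampled to a 60-second interval yield two points, one per minute, each holding the aggregate of its bucket.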

Open Source Time Series Databases

  1. Data storage: All of these databases use distributed NoSQL storage based on an LSM engine, including distributed databases such as HBase and Cassandra, cloud products such as BigTable, and self-developed storage engines.
  2. Aggregation: Pre-aggregation can only rely on external stream computing engines, such as Storm or Spark Streaming. Post-aggregation at query time is an interactive process, so it generally does not rely on a stream computing engine; different time series databases provide either a simple single-threaded method or a concurrent computing method. Automatic downsampling is also a form of post-aggregation, but it is a stream process rather than an interactive one. This kind of computing is well suited to a stream computing engine, but it is not implemented that way.
  3. Metadata storage and retrieval: The classic OpenTSDB does not have dedicated metadata storage and does not support metadata retrieval; metadata is queried by scanning the row keys of the data table. KairosDB uses a separate table in Cassandra for metadata storage, but retrieval efficiency is very low because the table must be scanned. Heroic, which was developed based on KairosDB, uses Elasticsearch for metadata storage and indexing and supports better metadata retrieval. InfluxDB and Prometheus implement their own indexes, but indexing is not easy, because it has to handle time series metadata on the order of 10 million to 100 million entries. An earlier version of InfluxDB implemented a memory-based metadata index, which has a number of restrictions: for example, the memory size limits the number of time series, and building the in-memory index requires scanning all time series metadata, which causes a long failover time for the node.
  4. Data analysis: Apart from query and post-aggregation capabilities, most TSDBs do not natively provide analysis capabilities. Elasticsearch is the exception, which is an important advantage that helps it keep a foothold in the time series field.
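
The difference between scanning a metadata table and using an index can be illustrated with a toy inverted index over tags. This sketch is not how any of the databases above implement indexing; the class `TagIndex` and its methods are hypothetical. Because it lives entirely in memory, it also shows the restriction mentioned for memory-based indexes: the index must be rebuilt from all series metadata after a restart, and its size is bounded by available memory.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class TagIndex:
    """Toy inverted index: (tag key, tag value) -> set of series IDs."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, series_id: str, tags: Dict[str, str]) -> None:
        """Register a series under every one of its tag pairs."""
        for pair in tags.items():
            self._postings[pair].add(series_id)

    def query(self, filters: Dict[str, str]) -> Set[str]:
        """Return series matching ALL given tag filters (multidimensional filtering)."""
        if not filters:
            return set()
        # Look up the posting set for each filter, then intersect them.
        sets = [self._postings.get(pair, set()) for pair in filters.items()]
        return set.intersection(*sets)
```

A lookup touches only the posting sets for the requested tag pairs, so its cost depends on the number of matching series rather than on the total number of series in the table.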

Table Store Time Series Data Storage

Summary

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com
