PostgreSQL Time-Series Data Case: Automatic Compression over Time

Background

One of the most important features of a time-series database is compression over time. For example, data from the last day is compressed into one point per 5 minutes, and data from the last week into one point per 30 minutes.

In PostgreSQL, the compression algorithm can be customized: simple mean, maximum, or minimum compression, for example, or compression based on the revolving door compression algorithm.

[\[To be continued\] SQL Streaming Case — Revolving Door Compression (Pre-Calculation and Post-Calculation Related Sliding Window Processing Example)](https://github.com/digoal/blog/blob/master/201712/20171205_01.md)

Implementation of the Revolving Door Data Compression Algorithm in PostgreSQL — Application of Streaming Compression in the IoT, Monitoring, and Sensor Scenarios

This article introduces a simple compression scenario, similar to an RRD database: along the time dimension, data is compressed into average, maximum, minimum, sum, record count, and other dimensions.

It also introduces advanced SQL usage such as window queries, year-on-year and period-over-period comparison UDFs (including KNN calculation), and writing data evenly distributed over time.

Design

Detail Table
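
Since no concrete schema is reproduced here, the examples in this article run against the following minimal sketch; the table and column names (`tbl`, `sid` for sensor ID, `hid` for indicator ID, `val`, `crt_time`) are assumptions reused throughout.

```sql
-- Detail table: one row per raw sample (a sketch; all names are assumptions).
create table tbl (
  id       serial8 primary key,   -- auto-generated row ID
  sid      int,                   -- sensor (device) ID
  hid      int,                   -- indicator ID within a sensor
  val      float8,                -- sampled value
  crt_time timestamp              -- sample write time
);
```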

Compression Table

1. 5-minute compression table
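
A sketch: one row per sensor, indicator, and 5-minute window.

```sql
-- 5-minute compression table: one aggregated point per
-- (sid, hid, 5-minute window).
create table tbl_5min (
  sid      int,
  hid      int,
  crt_time timestamp,   -- start of the 5-minute window
  val      float8,      -- average value within the window
  val_min  float8,      -- minimum within the window
  val_max  float8,      -- maximum within the window
  val_sum  float8,      -- sum within the window
  cnt      int8,        -- number of detail points compressed
  primary key (sid, hid, crt_time)
);
```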

2. 30-minute compression table
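
The same shape, with 30-minute windows:

```sql
-- 30-minute compression table (a sketch): identical layout, wider window.
create table tbl_30min (
  sid      int,
  hid      int,
  crt_time timestamp,   -- start of the 30-minute window
  val      float8,
  val_min  float8,
  val_max  float8,
  val_sum  float8,
  cnt      int8,
  primary key (sid, hid, crt_time)
);
```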

3. 5-minute compression statement
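
A sketch against the assumed tables above. Integer division of the epoch by 300 seconds truncates each timestamp to the start of its 5-minute window, and ON CONFLICT makes the statement safe to re-run:

```sql
insert into tbl_5min
select
  sid, hid,
  -- 5-minute window start: integer-divide the epoch by 300, multiply back
  to_timestamp((extract(epoch from crt_time)::int8 / 300) * 300)::timestamp,
  avg(val), min(val), max(val), sum(val), count(*)
from tbl
group by 1, 2, 3
on conflict (sid, hid, crt_time) do update set
  val     = excluded.val,
  val_min = excluded.val_min,
  val_max = excluded.val_max,
  val_sum = excluded.val_sum,
  cnt     = excluded.cnt;
```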

4. 30-minute compression statement
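
Identical except for the window length (1800 seconds) and target table:

```sql
insert into tbl_30min
select
  sid, hid,
  -- 30-minute window start: integer-divide the epoch by 1800, multiply back
  to_timestamp((extract(epoch from crt_time)::int8 / 1800) * 1800)::timestamp,
  avg(val), min(val), max(val), sum(val), count(*)
from tbl
group by 1, 2, 3
on conflict (sid, hid, crt_time) do update set
  val     = excluded.val,
  val_min = excluded.val_min,
  val_max = excluded.val_max,
  val_sum = excluded.val_sum,
  cnt     = excluded.cnt;
```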

DEMO

1. Write 100 million rows of detail test data, evenly distributed over 10 days.
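
A sketch: row numbers from generate_series are mapped evenly onto a 10-day range (10 days = 864,000 seconds, so one row every 0.00864 seconds); the sensor and indicator cardinalities are arbitrary assumptions.

```sql
insert into tbl (sid, hid, val, crt_time)
select
  (random() * 1000)::int,   -- 1,000 sensors (assumption)
  (random() * 10)::int,     -- 10 indicators per sensor (assumption)
  random() * 100,           -- sampled value
  -- spread the writes evenly across the last 10 days
  now()::timestamp - interval '10 day'
    + (i * (864000.0 / 100000000)) * interval '1 second'
from generate_series(1, 100000000) t(i);
```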

2. 5-minute compression scheduling. For data from the last day, the following SQL is scheduled to run every hour.
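
A sketch: the 5-minute compression statement from the Design section, restricted to the last day. The simple time predicate means windows straddling the cutoff get recomputed from partial data; a production job would track a watermark instead.

```sql
insert into tbl_5min
select
  sid, hid,
  to_timestamp((extract(epoch from crt_time)::int8 / 300) * 300)::timestamp,
  avg(val), min(val), max(val), sum(val), count(*)
from tbl
where crt_time >= now() - interval '1 day'   -- only recompress recent data
group by 1, 2, 3
on conflict (sid, hid, crt_time) do update set
  val     = excluded.val,
  val_min = excluded.val_min,
  val_max = excluded.val_max,
  val_sum = excluded.val_sum,
  cnt     = excluded.cnt;
```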

3. 30-minute compression scheduling. For data from the last week, the following SQL is scheduled to run every hour.
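
Likewise for 30-minute windows over the last week:

```sql
insert into tbl_30min
select
  sid, hid,
  to_timestamp((extract(epoch from crt_time)::int8 / 1800) * 1800)::timestamp,
  avg(val), min(val), max(val), sum(val), count(*)
from tbl
where crt_time >= now() - interval '1 week'
group by 1, 2, 3
on conflict (sid, hid, crt_time) do update set
  val     = excluded.val,
  val_min = excluded.val_min,
  val_max = excluded.val_max,
  val_sum = excluded.val_sum,
  cnt     = excluded.cnt;
```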

Summary

1. To group time by interval, use integer division followed by multiplication.

Example, truncating a timestamp to its 5-minute or 30-minute window:
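
A minimal sketch, here applied to now():

```sql
-- 5 minutes: divide the epoch by 300 seconds, then multiply back
select to_timestamp((extract(epoch from now())::int8 / 300) * 300);

-- 30 minutes: the same technique with 1800 seconds
select to_timestamp((extract(epoch from now())::int8 / 1800) * 1800);
```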

2. Generate evenly distributed time-series data. Write times can be spread evenly across an interval by using PG's interval arithmetic and generate_series, as sketched below.
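
A standalone sketch of the technique (the DEMO above uses the same idea): 1,000 write times spread evenly across one hour.

```sql
-- Map sequential row numbers evenly onto a time range.
select now() - interval '1 hour' + i * (interval '1 hour' / 1000) as crt_time
from generate_series(0, 999) t(i);
```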

3. One of the most important features of a time-series database is compression over time. For example, data from the last day is compressed into one point per 5 minutes, and data from the last week into one point per 30 minutes.

In PostgreSQL, the compression algorithm can be customized: simple mean, maximum, or minimum compression, for example, or compression based on the revolving door compression algorithm.

This article introduces a simple compression scenario, similar to an RRD database: along the time dimension, data is compressed into average, maximum, minimum, sum, record count, and other dimensions.

Add scheduling:

PostgreSQL Scheduled Task Method 2

PostgreSQL Oracle Compatibility- DBMS_JOBS — Daily Maintenance — Timing Tasks (pgagent)
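
As one concrete option (a sketch assuming the pg_cron extension, which is not one of the methods in the linked posts), the DEMO statement can be scheduled hourly like this:

```sql
-- Run the 5-minute compression at minute 0 of every hour (standard cron syntax).
select cron.schedule('compress_5min', '0 * * * *', $$
  insert into tbl_5min
  select sid, hid,
         to_timestamp((extract(epoch from crt_time)::int8 / 300) * 300)::timestamp,
         avg(val), min(val), max(val), sum(val), count(*)
  from tbl
  where crt_time >= now() - interval '1 day'
  group by 1, 2, 3
  on conflict (sid, hid, crt_time) do update set
    val = excluded.val, val_min = excluded.val_min, val_max = excluded.val_max,
    val_sum = excluded.val_sum, cnt = excluded.cnt
$$);
```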

4. After compression, each row contains the interval, maximum, minimum, average, and number of points. These values can be used to draw charts.

5. Combined with PG's window functions, it is easy to draw year-on-year and period-over-period comparison graphs. SQL examples follow:

Index and acceleration
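
A sketch: a composite index lets one (sid, hid) series be scanned in time order; the compression tables already get this from their primary keys.

```sql
-- Accelerates per-series time-ordered scans on the detail table.
create index idx_tbl_sid_hid_time on tbl (sid, hid, crt_time);
```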

Composite type for returning the period-over-period comparison value:
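
A sketch of such a type, holding the matched time point and its value (the name `tp` is an assumption):

```sql
create type tp as (
  crt_time timestamp,   -- time point actually matched
  val      float8       -- value at that point
);
```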

Function that obtains the period-over-period value: it returns the record of the specified SID and HID nearest to a given point in time, using a KNN-style calculation:
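
A sketch of such a function over the assumed `tbl_5min`: it takes the nearest compressed point on each side of the target time and keeps the closer one, a KNN-style nearest-neighbor lookup along the time axis.

```sql
create or replace function get_val(v_sid int, v_hid int, v_time timestamp)
returns tp as $$
  select (crt_time, val)::tp
  from (
    -- nearest point at or before the target time
    (select crt_time, val from tbl_5min
      where sid = v_sid and hid = v_hid and crt_time <= v_time
      order by crt_time desc limit 1)
    union all
    -- nearest point after the target time
    (select crt_time, val from tbl_5min
      where sid = v_sid and hid = v_hid and crt_time > v_time
      order by crt_time limit 1)
  ) t
  -- keep whichever side is closer to the target
  order by abs(extract(epoch from (crt_time - v_time)))
  limit 1;
$$ language sql strict;
```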

Year-on-year comparison, week-over-week comparison, and month-over-month comparison (these values can also be generated automatically to avoid calculation at each query):
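
Two sketches: a lookup-based comparison using the hypothetical `get_val` above, and a window-function variant using `lag` (with one point every 5 minutes, the value one day earlier is 288 rows back in the same series):

```sql
-- Latest value of series (sid=1, hid=2) vs. a day / a week / a month ago.
select
  val as cur_val,
  (get_val(sid, hid, crt_time - interval '1 day')).val  as day_ago,
  (get_val(sid, hid, crt_time - interval '1 week')).val as week_ago,
  (get_val(sid, hid, crt_time - interval '1 mon')).val  as mon_ago
from tbl_5min
where sid = 1 and hid = 2
order by crt_time desc
limit 1;

-- Window-function variant: compare each point with the point one day earlier.
select crt_time, val,
       lag(val, 288) over w        as val_day_ago,
       val - lag(val, 288) over w  as diff_day
from tbl_5min
where sid = 1 and hid = 2
window w as (order by crt_time)
order by crt_time;
```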

6. Combined with PG's linear regression support, predictive indicators can be drawn. The following articles describe this in detail (a minimal sketch follows the list):

PostgreSQL Multiple Linear Regression — Stock Forecast 2

Using Linear Regression Analysis in PostgreSQL to Make Predictions — Example 2 to Predict the Closing Price of a Stock in the Next Few Days

PostgreSQL Linear Regression — Stock Price Forecast 1

Using Linear Regression Analysis in PostgreSQL — Implementing Data Prediction
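
As a minimal taste of what the linked articles cover in depth, PG's built-in regression aggregates can fit a trend line and extrapolate it (the series and the one-hour horizon are arbitrary assumptions):

```sql
-- Fit val against time (as epoch seconds) for one series, then
-- extrapolate the fitted line one hour into the future.
select
  regr_slope(val, extract(epoch from crt_time))     as slope,
  regr_intercept(val, extract(epoch from crt_time)) as intercept,
  regr_slope(val, extract(epoch from crt_time))
      * extract(epoch from now() + interval '1 hour')
    + regr_intercept(val, extract(epoch from crt_time)) as predicted_val
from tbl_5min
where sid = 1 and hid = 2;
```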

7. The compression tables can inherit from the detail table to simplify development. This way, you no longer need to write UNION SQL: query the detail table and get all the data (including compressed data), as sketched below.
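
A sketch of this layout: the compression table is created as a child of the detail table, with its extra aggregate columns on top of the inherited ones. The name `tbl_5min_i` is hypothetical, an alternative to the standalone `tbl_5min` above.

```sql
create table tbl_5min_i (
  val_min float8,
  val_max float8,
  val_sum float8,
  cnt     int8
) inherits (tbl);   -- inherits id, sid, hid, val, crt_time from the detail table

-- Querying the parent now also scans the children, no UNION needed:
select * from tbl where crt_time >= now() - interval '1 day';

-- ONLY restricts the scan to the raw detail rows themselves:
select * from only tbl where crt_time >= now() - interval '1 day';
```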

Related Cases

Time-Out Streaming — No Data Exception Monitoring for Incoming Messages

Alibaba Cloud RDS PostgreSQL Varbitx Practice — Stream Tags (Async Batch Consume with Atomicity Stream/Batch Computing) — Tagging People with Any Tags (up to Trillions) in Milliseconds

PostgreSQL Streaming Statistics — “insert on conflict” Implements Streaming unique visitor (distinct), Min, Max, Avg, Sum, Count

HTAP Database PostgreSQL Scenarios and Performance Test — №32 (OLTP) High Throughput Data Input/Output (Storage, Row Scanning, No Indexes) — Async Batch Consume with Atomicity (JSON + Functional Stream Computing)

HTAP Database PostgreSQL Scenarios and Performance Tests — №31 (OLTP) High Throughput Data Input/Output (Storage, Row Scanning, No Indexes) — Async Batch Consume with Atomicity (Paralleled Testing for High Throughput Read/Write)

HTAP Database PostgreSQL Scenarios and Performance Tests — №27 (OLTP) IoT — FEED Logs, Stream Computing, and Async Batch Consume with Atomicity (CTE)

PostgreSQL-Based Streaming PipelineDB can Achieve 10 million/s Real-Time Statistics

Reference: https://www.alibabacloud.com/blog/postgresql-time-series-data-case-automatic-compression-over-time_594813
