Alibaba Cloud RDS PostgreSQL — Time Series Data Optimization

PostgreSQL is one of the most popular open-source databases. One advantage of PostgreSQL is that it can be optimized in many ways, such as data merging and data cleaning. Data merging and cleaning are necessary for several situations.

For example:

  1. When table details are updated (insert, update, delete), we need first to merge, and then obtain the newest value of each PK swiftly.
  2. When we have a large number of sensors which are continuously reporting data, we need to gather each sensor’s latest reading promptly.
  3. We can use window query for this kind of operation, but we need to quickly retrieve batch data.

In general, there are four approaches to optimizing time sequence data.

  1. We can use recursion when there are few unique values and an unknown range.
  2. We can use subquery when there are few unique values, and their range is already determined.
  3. Window query is more appropriate than the above methods when there are many unique values.
  4. Stream computing is the best for all scenarios.

This document will only compare the first three methods. Stream computing does not need to be compared because it is the most powerful method in all scenarios.

Recursion vs. Subquery vs. Window Query

In our comparison, we will use a database with 5 million unique values as our data and compare these methods in the following situations.

Recursion

Scenario I. A large variety of unique values are available (1 million unique values)

Step 1: Create a table

Step 2: Construct data

insert into test select ceil(random()*1000000), md5(random()::text), clock_timestamp() from generate_series(1,5000000);

Step 3. Create an index

create index idx_test_1 on test (id, crt_time desc);

Step 4: Recursive query efficiency

Scenario II. Few unique values are available (1,000 unique values)

Step 1: Create a table

Step 2: Construct data

insert into test select ceil(random()*1000), md5(random()::text), clock_timestamp() from generate_series(1,5000000);

Step 3: Create an index

Subquery

Scenario I. A large variety of unique values are available (1 million unique values)

Step 1: Subquery query efficiency

Subquery is inefficient if the value range of an ID is too broad.

A unique ID table needs to be maintained. Here we use generate_series as a replacement for testing.

Scenario II. Few unique values are available (1,000 unique values)

Step 1: Subquery query efficiency

Query statement changes

Window Query

Scenario I. A large variety of unique values are available (1 million unique values)

Step 1: Window query efficiency

Scenario II. Few unique values are available (1,000 unique values)

Step 1: Window query efficiency

Query statement stays unchanged

Comparison Diagram of Horizontal Efficiency

Conclusion

With the rise of the IoT, the world is generating increasing amounts of time-sequenced data, and in situations where we have to provide services based on that data, calculating the newest values in the data and those in the sliding window is crucial.

PostgreSQL is the best choice in open-source databases as it provides several solutions to the same problems. With reference to data optimization methods, we can conclude that:

  1. Recursion is suitable when there are few unique values but with an unknown range.
  2. Subquery is suitable when there are few unique values, and the range is determined.
  3. Window query is more appropriate than subquery when there are many unique values.
  4. Stream computing is the best option for all scenarios.

Visit the PostgreSQL official documentation to learn more.

Reference:

https://www.alibabacloud.com/blog/alibaba-cloud-rds-postgresql-%E2%80%93-time-series-data-optimization_593721?spm=a2c41.11633719.0.0

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store