Optimizing Time Series Querying on Alibaba Cloud RDS for PostgreSQL

By Digoal

Data merging and data cleaning are required in many scenarios. Example scenarios:

1. For recorded table change details (insert, update, and delete), you must merge details and quickly obtain the latest value of each PK based on these details.
2. When we have a large number of sensors all continuously reporting data, we need to quickly obtain each sensor’s latest reading.

We can use window query for this kind of operation, but how can we make it faster and quickly retrieve batch data?

PostgreSQL is the best open-source database. It’s optimized in ways you probably haven’t even thought of yet.

Time Sequence Data Value Optimization

Here is a quick summary of the common methods of optimizing time sequence data querying:

1. Recursion is used when there are few unique values and an unknown range.
2. Use subquery when the number of unique values is relatively small and you know the specific range of the unique values.
3. Window query is more appropriate than the above method when there are many unique values.
4. However, stream computing is even better in the same scenarios.

This document will compare the first three methods.

Method 4, stream computing does not need to be compared because it is the most powerful. It is the best method in all scenarios. Wait for the pipelineDB plugin. Alibaba Cloud RDS PG 10 will then be integrated with the pipelineDB functionality.

Recursion vs Subquery vs Window

Take 5 million pieces of data as an example and compare these methods in the following situations.

1. Recursion

Scenario I. a Large Variety of Unique Values Are Available (1 Million Unique Values)

1. Create a table

2. Construct data

3. Create an index

4. Recursive query efficiency

Scenario 2: A Small Number of Unique Values (1,000 Unique Values)

1. Create a table

2. Construct data

3. Create an index

4. Recursive query efficiency

Scenario 1: A Large Number of Unique Values (1 Million Unique Values)

1. Subquery query efficiency

Subquery is inefficient if the value range of an ID is too broad.

A unique ID table needs to be maintained. Here we use generate_series as a replacement for testing.

Scenario 2: A Small Number of Unique Values (1,000 Unique Values)

1. Subquery efficiency

Query statement changes

2. Window Query

Scenario 1: A Large Number of Unique Values (1 Million Unique Values)

1. Window query efficiency

Scenario 2: A Small Number of Unique Values (1,000 Unique Values)

1. Window query efficiency

Efficiency Comparison Table

Data volumeNumber of Unique ValuesWindow Query (ms)Subquery (ms)Recursive Query (ms)5 million1 million6,4462,8926,7065 million1,0006,17679

Summary

With the rise of the Internet of Things, the world is generating increasing amounts of time sequenced data, and in situations where we have to provide services based on that data, calculating the newest values in the data and those in the sliding window is crucial.

PostgreSQL is the best choice in open-source databases as it provides several solutions to the same problems. It leaves you free to choose the most appropriate solution for you and your individual needs.

1. Use recursion when the number of unique values is relatively small and the range of the unique values is unknown.

2. Use subquery when the number of unique values is relatively small and the range of the unique values is determined. For example, if the total range is 1 million pieces of data, but only 500,000 pieces of data are included in this batch, then the performance is optimal if you have the IDs for these 500,000 entries. Otherwise you need to scan 1 million pieces of data. Another example is that there are a total of 100 million users, but an interval includes only tens of thousands of active users.

3. Window query is more appropriate if the number of unique values is relatively large.

4. Steaming computing is better than method 3 if the number of unique values is relatively large.

Reference:https://www.alibabacloud.com/blog/optimizing-time-series-querying-on-alibaba-cloud-rds-for-postgresql_594867?spm=a2c41.12952090.0.0

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store