How PostgreSQL UDF Can Accelerate the Performance of Compound Queries of Single- and Multi-value Fields

Background

In applications, especially for businesses that use PostgreSQL multi-value columns (arrays, full-text search, and JSON), a single-value column also needs to be queried in addition to multi-value columns.

Therefore, in cases, when both types of queries coexist, the database selects a single column or multiple columns to compose the index based on the cost. However, it cannot reach the best performance but only simplifies the user’s index design. Let’s consider the following example to understand this better.

Since gid has 100 values and c1 has 10 values (the value range is less than 1,000), the user may query by gid or c1, or by combining both. When user queries by combining both fields, the current method is not efficient. This includes btree_gin.

This leads to a critical question, how to accelerate the query?

Demo

Let’s have a quick look at the following demo to understand different acceleration methods:

Step 1. Create a function to generate a random number.

Step 2. Now, create a test table that contains a single-value column and a multi-value column.

Step 3. Write 10 millions pieces of data.

Traditional Acceleration Method 1

Now, implementing the traditional acceleration method, create a compound index of single-value columns and multi-value columns.

Determine the performance of this compound query as shown below.

In case, you do not seem to have the motivation to optimize further, you will observe that this performance appears to be quite OK.

Multiple index conditions are also used. In fact, this index is a bitmap scan performed after the two indexes are combined internally.

Traditional Acceleration Method 2

The second method lets you create a single-value column index and a multi-value column index, separately.

The actual effect is worse than that of GIN compound indexes.

Acceleration Method that Improves the Performance by Over 100 Times

This article focuses on the method that is applicable for scenarios where the compound queries of single-value columns and multi-value columns are used. In fact, we use the UDF and indexing expressions functions of PostgreSQL.

UDF combines a single-value column and a multi-value column into a new multi-value column. Indexes on expressions are created for this UDF with the purpose to remove the internal BITMAP combination and use only one inverted tree. This inverted tree contains the values of the single-value and multi-value columns. Follow the steps listed below to implement this acceleration method:

Step 1. Create a UDF and combine the gid and c1 values as shown in the following example.

Step 2. Create indexes on expressions.

Step 3. An expression query is used during the query. Therefore, you need to change the SQL statement as shown below.

The query is as follows.

As a result, performance improves significantly.

Acceleration Method for a Compound Query of a Single-Value Column and Full-Text Search

Similarly, combine a single-value column with a full-text search column to accelerate performance.

The preceding example shows how performance significantly improves in this scenario as well.

Conclusion

Following points sum up the highlights of this article:

  • For a partition table, each partition defines the corresponding index. However, there can be many partitions in a case when a single-value type contains a large number of values, which may not be a good thing.
  • For partition indexes, currently, PostgreSQL does not support creating a compound index with multiple trees for a single table. One tree is built with a single-value column, and the value points to another tree. Another tree is a GIN inverted tree built with multi-value columns.
  • As mentioned in this article, use UDF to combine a single-value column and a multi-value column into a new multi-value column. This new column contains the attributes of the single-value column. This achieves the same effect as that of partition tables or partition indexes. Performance is significantly improved.

When a multi-value column itself contains the attributes of a single-value column, you do not need to create a compound index for a single-value column and a multi-value column. Instead, you only need to create a multi-value column index.

On the other hand, when a multi-value column does not contain the attributes of a single-value column, you can use UDF if you have a compound query of single-value columns and multi-value columns. This combines a single-value column and a multi-value column into a new multi-value column and creates a GIN inverted index for this new multi-value column. This significantly improves performance. In this use case, the speed is increased by over 100 times.

Build your own PostgreSQL solution on Alibaba Cloud with ApsaraDB for RDS PostgreSQL.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.