First Experience with PostgreSQL 13 Beta 1

Image for post
Image for post

By Zhuodao

Background

Starting from PostgreSQL 10, the community has been able to release a major revision every year, demonstrating their amazing vitality and creativity.

Image for post
Image for post

On May 21, 2020, the PostgreSQL Global Development Group announced that the first beta release of PostgreSQL 13 was available for download. This release contains previews of all features that will be available in the final release of PostgreSQL 13, though some details of the release could change before then. Let’s look at the new features and changes found in PostgreSQL 13.

New Features in PostgreSQL 13

In the community’s published document, we can see that the new features of PostgreSQL 13 are divided into the following sections:

  • Functionality

We will introduce these sections and carry out realistic tests to look at specific effects.

Functionality

There are many new features in PostgreSQL 13 that help improve the overall performance of PostgreSQL while making it easier to develop applications.

Deduplication for B-Tree Indexes

If table columns are not unique, many identical values may exist. The corresponding B-tree indexes may also have many duplicated index entries. In PostgreSQL 13, B-tree indexes refer to GIN indexes to link up ctids of the rows to which the same key points. This enhancement not only shrinks the index size, but it also reduces many unnecessary splits and improves the lookup speed. This enhancement is called deduplication for B-tree indexes. In addition, the feature is conducted asynchronously. This operation is performed only when a B-tree index needs to be split. This reduces the overhead of this feature.

The following section describes how to use deduplication for B-tree indexes:

Image for post
Image for post

As you can see, you must set the deduplicate_items storage parameter to “on” when you create a B-tree index to enable deduplication. In PostgreSQL 13, this storage parameter is on by default.

To test the effect of this feature, we compared PostgreSQL 12 and PostgreSQL 13.

PostgreSQL 12:

Image for post
Image for post
Image for post
Image for post

PostgreSQL 13:

Image for post
Image for post
Image for post
Image for post

Based on the images above, we can see the following aspects:

  • In PostgreSQL 13, the size of a B-tree index that has the same data type as a B-tree index in PostgreSQL 12 and has no repeated values is the same as that in PostgreSQL 12.

In addition to data-level duplication, deduplication for B-tree indexes in PostgreSQL 13 is also effective to deal with different snapshots of repeated keys in a B-tree index due to the MVCC implementation. However, deduplication cannot be used in the following cases:

  • text, varchar, and char cannot use deduplication when a nondeterministic collation is used. Case and accent differences must be preserved among equal datums.

Overall, these data types are unsupported primarily because extra conditions are required to determine whether keys of indexes are the same, in addition to values.

B-tree indexes are a default index type in PostgreSQL. In the community, several major versions of PostgreSQL have continuously optimized its occupied space and execution efficiency. A certain degree of deduplication was implemented to B-tree indexes in PostgreSQL 12.

Incremental Sorting

PostgreSQL 13 adds incremental sorting. This improvement derives from a native algorithm. When a group of data sets (X,Y) are sorted as X and Y columns, if the current data set has been sorted by the X column, specific sorting is shown in the following figure:

(1, 5)
(1, 2)
(2, 9)
(2, 1)
(2, 5)
(3, 3)
(3, 7)

At this time, you only need to group the data sets by the X column and sort Y columns in each group to obtain a result set based on sorting by X and Y columns, as shown in the following figure:

(1, 5) (1, 2)
(2, 9) (2, 1) (2, 5)
(3, 3) (3, 7)
=====================
(1, 2)
(1, 5)
(2, 1)
(2, 5)
(2, 9)
(3, 3)
(3, 7)

The advantages of this algorithm are obvious, especially for large data sets, because it can reduce the volume of data sorted each time. You can use certain policy control to make the data volume in sorting better fit the current work_mem every time. Moreover, in PostgreSQL’s waterfall model executor, we can obtain some result sets without sorting all data, which is very suitable for Top-N queries with the Limit keyword.

Of course, the optimizer in a database is much more complex than the preceding scenario. If each group is large and a small number of groups exist, the cost of incremental sorting is relatively high. If each group is small and a large number of groups exist, it takes less time and energy to use the previously sorted results based on the incremental sorting feature. To neutralize the effects of the two cases, two patterns are used in PostgreSQL 13:

  1. Fetching a relatively safe number of rows does not require checking previous sort keys for full sorting. The security here is based on some cost considerations.

In PostgreSQL, the optimizer prefers pattern 1 and uses pattern 2 heuristically.

The following part describes how to use incremental sorting and compares a query plan where this feature is enabled and a query plan where this feature is disabled.

postgres=# create table t (a int, b int, c int);
CREATE TABLE
postgres=# insert into t select mod(i,10),mod(i,10),i from generate_series(1,10000) s(i);
INSERT 0 10000
postgres=# create index on t (a);
CREATE INDEX
postgres=# analyze t;
ANALYZE
postgres=# set enable_incrementalsort = off;
SET
postgres=# explain analyze select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=231.50..231.50 rows=1 width=16) (actual time=2.814..2.815 rows=1 loops=1)
-> Sort (cost=231.50..231.75 rows=100 width=16) (actual time=2.813..2.813 rows=1 loops=1)
Sort Key: a, b, (sum(c))
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=230.00..231.00 rows=100 width=16) (actual time=2.801..2.804 rows=10 loops=1)
Group Key: a, b
Peak Memory Usage: 37 kB
-> Seq Scan on t (cost=0.00..155.00 rows=10000 width=12) (actual time=0.012..0.951 rows=10000 loops=1)
Planning Time: 0.169 ms
Execution Time: 2.858 ms
(10 rows)
postgres=# set enable_incrementalsort = on;
SET
postgres=# explain analyze select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=133.63..146.52 rows=1 width=16) (actual time=1.177..1.177 rows=1 loops=1)
-> Incremental Sort (cost=133.63..1422.16 rows=100 width=16) (actual time=1.176..1.176 rows=1 loops=1)
Sort Key: a, b, (sum(c))
Presorted Key: a, b
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
-> GroupAggregate (cost=120.65..1417.66 rows=100 width=16) (actual time=0.746..1.158 rows=2 loops=1)
Group Key: a, b
-> Incremental Sort (cost=120.65..1341.66 rows=10000 width=12) (actual time=0.329..0.944 rows=2001 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 3 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
Pre-sorted Groups: 3 Sort Method: quicksort Average Memory: 71kB Peak Memory: 71kB
-> Index Scan using t_a_idx on t (cost=0.29..412.65 rows=10000 width=12) (actual time=0.011..0.504 rows=3001 loops=1)
Planning Time: 0.164 ms
Execution Time: 1.205 ms
(15 rows)

Enhanced Partitioning

There are more improvements added to PostgreSQL’s partitioning functionality in this release, including an increased number of cases where a join directly between partitioned tables can occur, which can improve overall query execution time. Partitioned tables now support BEFORE row-level triggers, and a partitioned table can now be fully replicated through logical replication without having to publish individual partitions.

Others

  • Queries with OR clauses or IN/ANY constant lists can use extended statistics (created by using CREATE STATISTICS), which can lead to better planning and performance gains.

Administration

Parallel VACUUM

One of the most anticipated features of PostgreSQL 13 is the ability for the VACUUM command to process indexes in parallel. This feature was unavailable in earlier versions. Therefore, VACUUM processing time was very long for large tables. PostgreSQL 13 supports parallel VACUUM for indexes, but there are many restrictions.

  • Currently, this feature only applies to indexes. Each index can be allocated with a vacuum worker.

We compared PostgreSQL 13 and PostgreSQL 12 in terms of VACUUM. The results are below:

=================================PG 13 parallel vacuum===============================
postgres=# create table testva(id int,info text);
CREATE TABLE
Time: 2.334 ms
postgres=# insert into testva select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
Time: 1448.098 ms (00:01.448)
postgres=# create index idx_testva on testva(id);
CREATE INDEX
Time: 364.988 ms
postgres=# create index idx_testva_info on testva(info);
CREATE INDEX
Time: 873.416 ms
postgres=# vacuum (parallel 4) testva;
VACUUM
Time: 114.846 ms
=================================PG 12 normal vacuum===============================
postgres=# create table testva(id int,info text);
CREATE TABLE
Time: 5.817 ms
postgres=# insert into testva select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
Time: 3023.958 ms (00:03.024)
postgres=# create index idx_testva on testva(id);
CREATE INDEX
Time: 631.632 ms
postgres=# create index idx_testva_info on testva(info);
CREATE INDEX
Time: 1374.849 ms (00:01.375)
postgres=# vacuum testva;
VACUUM
Time: 216.944 ms

According to this comparison, the VACUUM speed of PostgreSQL 13 is much higher than PostgreSQL 12. However, there is still room for improving parallel VACUUM. The good news is that the community is actively discussing block-level parallel VACUUM in emails. For more information, visit the link.

Others

  • The reindexdb command has also added parallelism with the new — jobs flag, which lets you specify the number of concurrent sessions to use when you reindex a database.

These features greatly improve PostgreSQL database administration, especially the pg_rewind, which is not detailed here. In another article, we will describe how this feature improves functionality.

Security

PostgreSQL continues to improve on its security capabilities in this latest release, introducing several features to help further deploy PostgreSQL safely.

  • libpq, the connection library that powers psql and many PostgreSQL connection drivers, includes several new parameters to help secure connections. PostgreSQL 13 introduces the channel_binding connection parameters, which lets a client specify that they want to require the channel binding functionality as a part of SCRAM. Additionally, a client that is using a password-protected TLS certificate can now specify its password by using the sslpassword parameter. PostgreSQL 13 also adds support for DER encoded certificates.

Other Highlights

  • PostgreSQL 13 continues to improve operability on Windows. Users that run PostgreSQL on Windows now have the option to connect over UNIX domain sockets.

For more information about other features of PostgreSQL 13, visit the link.

Summary

Although PostgreSQL 13 did not introduce the planned TDE and zheap features, it still has many eye-catching features, including deduplication for B-tree indexes, parallel VACUUM, and disk usage by hash aggregation. Moreover, you can use a new tool called pg_verifybackup to verify the integrity of a backup and use pg_rewind to configure standby PostgreSQL instances. pg_rewind can also use the restore_command of the target instance to fetch needed write-ahead logs. If you are interested, you can download the source code, compile it, analyze the implementation of the features you are interested in. You may get better ideas.

Original Source:

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store