First Experience with PostgreSQL 13 Beta 1
By Zhuodao
Background
Starting from PostgreSQL 10, the community has managed to release a major version every year, demonstrating its amazing vitality and creativity.
On May 21, 2020, the PostgreSQL Global Development Group announced that the first beta release of PostgreSQL 13 was available for download. This release contains previews of all features that will be available in the final release of PostgreSQL 13, though some details of the release could change before then. Let’s look at the new features and changes found in PostgreSQL 13.
New Features in PostgreSQL 13
In the community’s published documentation, we can see that the new features of PostgreSQL 13 are divided into the following sections:
- Functionality
- Administration
- Security
- Other highlights
We will introduce these sections and run hands-on tests to look at their specific effects.
Functionality
There are many new features in PostgreSQL 13 that help improve the overall performance of PostgreSQL while making it easier to develop applications.
Deduplication for B-Tree Indexes
If a table column is not unique, it may contain many identical values, and the corresponding B-tree index will then contain many duplicate index entries. In PostgreSQL 13, B-tree indexes borrow an idea from GIN indexes: the ctids of all rows that point to the same key are linked together under a single index entry. This enhancement not only shrinks the index size but also avoids many unnecessary page splits and improves lookup speed. It is called deduplication for B-tree indexes. In addition, deduplication is performed lazily: it runs only when a B-tree page would otherwise have to be split, which keeps the overhead of this feature low.
The following section describes how to use deduplication for B-tree indexes:
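A minimal sketch, assuming a test table t1 with an integer column a:

postgres=# create table t1 (a int);
CREATE TABLE
postgres=# create index t1_a_idx on t1 (a) with (deduplicate_items = on);
CREATE INDEX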
As you can see, you must set the deduplicate_items storage parameter to “on” when you create a B-tree index to enable deduplication. In PostgreSQL 13, this storage parameter is on by default.
To test the effect of this feature, we compared PostgreSQL 12 and PostgreSQL 13 by building the same indexes on both versions and comparing their sizes and query plans.
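A minimal sketch of such a comparison, assuming a table t2 whose column a holds many repeated values; run the same statements on both versions and compare the reported sizes:

postgres=# create table t2 (a int);
CREATE TABLE
postgres=# insert into t2 select mod(i, 100) from generate_series(1, 1000000) s(i);
INSERT 0 1000000
postgres=# create index t2_a_idx on t2 (a);
CREATE INDEX
postgres=# select pg_size_pretty(pg_relation_size('t2_a_idx'));

On PostgreSQL 13, the index over the repeated values should come out noticeably smaller than on PostgreSQL 12, while an index over a unique column stays the same size.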
This comparison shows the following:
- For a column with no repeated values, the B-tree index in PostgreSQL 13 is the same size as in PostgreSQL 12.
- For a column with many repeated values, the B-tree index in PostgreSQL 13 is smaller than in PostgreSQL 12.
- When an index holds a large number of repeated values and the execution plan uses an index-only scan, PostgreSQL 13 reads fewer index pages than PostgreSQL 12 and is therefore more efficient.
In addition to duplicates in the data itself, deduplication for B-tree indexes in PostgreSQL 13 also handles duplicate keys that arise from the MVCC implementation, where several row versions share the same key. However, deduplication cannot be used in the following cases:
- text, varchar, and char cannot use deduplication when a nondeterministic collation is used. Case and accent differences must be preserved among equal datums.
- numeric cannot use deduplication. Numeric display scale must be preserved among equal datums.
- jsonb cannot use deduplication because the jsonb B-Tree operator class uses numeric internally.
- float4 and float8 cannot use deduplication. These types have distinct representations for -0 and 0, which are nevertheless considered equal. This difference must be preserved.
Overall, these data types are unsupported primarily because determining whether two index keys are identical requires extra conditions beyond comparing their values.
B-tree is the default index type in PostgreSQL, and the community has continuously optimized its space usage and execution efficiency across several major versions; PostgreSQL 12 already implemented a limited degree of deduplication for B-tree indexes.
Incremental Sorting
PostgreSQL 13 adds incremental sorting. The idea behind it is intuitive. When a set of rows (X, Y) must be sorted by the X and Y columns, and the rows are already sorted by the X column, the input looks like this:
(1, 5)
(1, 2)
(2, 9)
(2, 1)
(2, 5)
(3, 3)
(3, 7)
At this point, you only need to group the rows by the X column and sort by the Y column within each group to obtain a result set sorted by both X and Y columns, as shown below:
(1, 5) (1, 2)
(2, 9) (2, 1) (2, 5)
(3, 3) (3, 7)
=====================
(1, 2)
(1, 5)
(2, 1)
(2, 5)
(2, 9)
(3, 3)
(3, 7)
The advantages of this algorithm are obvious, especially for large data sets, because it reduces the volume of data sorted at a time. With a suitable policy, each sort batch can be sized to fit the current work_mem. Moreover, in PostgreSQL’s pipelined (Volcano-style) executor, some result rows can be returned without sorting all of the data, which makes the algorithm well suited to Top-N queries with the LIMIT keyword.
Of course, the optimizer in a database faces situations more complex than the scenario above. If there are only a few large groups, the cost of incremental sorting is relatively high; if there are many small groups, reusing the presorted results saves much more work. To balance the two cases, PostgreSQL 13 uses two modes:
- Fetch a relatively safe number of rows, ignoring the previous sort keys, and sort them fully. “Safe” here is based on cost considerations.
- Fetch all rows that share the same value of the previous sort keys, as one group, and sort only by the remaining keys.
In PostgreSQL, the optimizer prefers mode 1 and applies mode 2 heuristically.
The following example shows how to use incremental sorting by comparing the query plan with the feature disabled against the plan with it enabled.
postgres=# create table t (a int, b int, c int);
CREATE TABLE
postgres=# insert into t select mod(i,10),mod(i,10),i from generate_series(1,10000) s(i);
INSERT 0 10000
postgres=# create index on t (a);
CREATE INDEX
postgres=# analyze t;
ANALYZE
postgres=# set enable_incrementalsort = off;
SET
postgres=# explain analyze select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=231.50..231.50 rows=1 width=16) (actual time=2.814..2.815 rows=1 loops=1)
-> Sort (cost=231.50..231.75 rows=100 width=16) (actual time=2.813..2.813 rows=1 loops=1)
Sort Key: a, b, (sum(c))
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=230.00..231.00 rows=100 width=16) (actual time=2.801..2.804 rows=10 loops=1)
Group Key: a, b
Peak Memory Usage: 37 kB
-> Seq Scan on t (cost=0.00..155.00 rows=10000 width=12) (actual time=0.012..0.951 rows=10000 loops=1)
Planning Time: 0.169 ms
Execution Time: 2.858 ms
(10 rows)

postgres=# set enable_incrementalsort = on;
SET
postgres=# explain analyze select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=133.63..146.52 rows=1 width=16) (actual time=1.177..1.177 rows=1 loops=1)
-> Incremental Sort (cost=133.63..1422.16 rows=100 width=16) (actual time=1.176..1.176 rows=1 loops=1)
Sort Key: a, b, (sum(c))
Presorted Key: a, b
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
-> GroupAggregate (cost=120.65..1417.66 rows=100 width=16) (actual time=0.746..1.158 rows=2 loops=1)
Group Key: a, b
-> Incremental Sort (cost=120.65..1341.66 rows=10000 width=12) (actual time=0.329..0.944 rows=2001 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 3 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
Pre-sorted Groups: 3 Sort Method: quicksort Average Memory: 71kB Peak Memory: 71kB
-> Index Scan using t_a_idx on t (cost=0.29..412.65 rows=10000 width=12) (actual time=0.011..0.504 rows=3001 loops=1)
Planning Time: 0.164 ms
Execution Time: 1.205 ms
(15 rows)
Enhanced Partitioning
This release adds more improvements to PostgreSQL’s partitioning functionality, including an increased number of cases where partitioned tables can be joined to each other directly, which can improve overall query execution time. Partitioned tables now support BEFORE row-level triggers, and a partitioned table can now be fully replicated through logical replication without having to publish its individual partitions.
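For example, a partitioned table can now be added to a publication as a whole. A minimal sketch, with hypothetical table and publication names (publish_via_partition_root makes changes replicate under the root table’s identity, so the subscriber does not need matching partitions):

postgres=# create table measurement (city_id int, logdate date) partition by range (logdate);
CREATE TABLE
postgres=# create table measurement_2020 partition of measurement for values from ('2020-01-01') to ('2021-01-01');
CREATE TABLE
postgres=# create publication pub_measurement for table measurement with (publish_via_partition_root = true);
CREATE PUBLICATION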
Others
- Queries with OR clauses or IN/ANY constant lists can use extended statistics (created by using CREATE STATISTICS), which can lead to better planning and performance gains.
- PostgreSQL 13 allows hash aggregation to use disk storage (enable_hashagg_disk=on) for large aggregation result sets. Previously, hash aggregation was avoided if it was expected to use more than work_mem memory.
- There is also the addition of the .datetime() function for jsonpath queries, which will automatically convert a date-like or time-like string to the appropriate PostgreSQL date or time data type.
- It is also easier now to generate random UUIDs because the gen_random_uuid() function can be used without having to enable any extensions.
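The last two items can be tried directly from psql. A minimal sketch:

postgres=# -- convert a date-like string inside a jsonpath query
postgres=# select jsonb_path_query('"2020-05-21 12:00:00"', '$.datetime()');
postgres=# -- generate a random UUID without installing any extension
postgres=# select gen_random_uuid();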
Administration
Parallel VACUUM
One of the most anticipated features of PostgreSQL 13 is the ability of the VACUUM command to process indexes in parallel. Because this feature was unavailable in earlier versions, VACUUM could take a very long time on large tables. PostgreSQL 13 supports parallel VACUUM for indexes, but with several restrictions:
- Currently, only index processing is parallelized, and at most one vacuum worker is assigned per index.
- VACUUM FULL does not support parallel processing.
- The PARALLEL option only takes effect on tables with at least two indexes.
We compared PostgreSQL 13 and PostgreSQL 12 in terms of VACUUM. The results are below:
=================================PG 13 parallel vacuum===============================
postgres=# create table testva(id int,info text);
CREATE TABLE
Time: 2.334 ms
postgres=# insert into testva select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
Time: 1448.098 ms (00:01.448)
postgres=# create index idx_testva on testva(id);
CREATE INDEX
Time: 364.988 ms
postgres=# create index idx_testva_info on testva(info);
CREATE INDEX
Time: 873.416 ms
postgres=# vacuum (parallel 4) testva;
VACUUM
Time: 114.846 ms
=================================PG 12 normal vacuum===============================
postgres=# create table testva(id int,info text);
CREATE TABLE
Time: 5.817 ms
postgres=# insert into testva select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
Time: 3023.958 ms (00:03.024)
postgres=# create index idx_testva on testva(id);
CREATE INDEX
Time: 631.632 ms
postgres=# create index idx_testva_info on testva(info);
CREATE INDEX
Time: 1374.849 ms (00:01.375)
postgres=# vacuum testva;
VACUUM
Time: 216.944 ms
This comparison shows that VACUUM in PostgreSQL 13 is much faster than in PostgreSQL 12. However, there is still room for improving parallel VACUUM. The good news is that the community is actively discussing block-level parallel VACUUM on the mailing lists. For more information, visit the link.
Others
- The reindexdb command has also added parallelism with the new --jobs flag, which lets you specify the number of concurrent sessions to use when you reindex a database.
- PostgreSQL 13 introduces the concept of a “trusted extension,” which allows a superuser to specify extensions that a user can install in their database as long as they have the CREATE privilege (see the sketch after this list).
- This release includes more ways to monitor activities within a PostgreSQL database. PostgreSQL 13 can now track WAL usage statistics, the progress of streaming-based backups, and the progress of an ANALYZE command.
- pg_basebackup can also generate a manifest that can be used to verify the integrity of a backup by using a new tool called pg_verifybackup.
- It is also now possible to limit the amount of WAL space reserved by replication slots and to create temporary streaming replication slots for standby PostgreSQL instances.
- A new flag for pg_dump, --include-foreign-data, includes data from servers referenced by foreign data wrappers in the dump output.
- The pg_rewind command also has improvements in PostgreSQL 13. Besides automatically performing crash recovery, it can now configure standby PostgreSQL instances by using the --write-recovery-conf flag, and it can use the restore_command of the target instance to fetch needed write-ahead logs.
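Here is a minimal sketch of the trusted-extension workflow mentioned above; the role name alice and the choice of pgcrypto (one of the extensions marked trusted in PostgreSQL 13) are assumptions for illustration:

postgres=# grant create on database postgres to alice;
GRANT
postgres=# \c postgres alice
You are now connected to database "postgres" as user "alice".
postgres=> create extension pgcrypto;
CREATE EXTENSION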
These features greatly improve PostgreSQL database administration, especially pg_rewind, which is not detailed here; we will describe its improvements in another article.
Security
PostgreSQL continues to improve its security capabilities in this latest release, introducing several features to help deploy PostgreSQL more safely.
- libpq, the connection library that powers psql and many PostgreSQL connection drivers, includes several new parameters to help secure connections. PostgreSQL 13 introduces the channel_binding connection parameter, which lets a client require channel binding as part of SCRAM authentication. Additionally, a client that uses a password-protected TLS certificate can now specify the password with the sslpassword parameter. PostgreSQL 13 also adds support for DER-encoded certificates.
- The PostgreSQL foreign data wrapper (postgres_fdw) also received several enhancements to how it can secure connections, including the ability to use certificate-based authentication to connect to other PostgreSQL clusters. Additionally, unprivileged accounts can now connect to another PostgreSQL database through the postgres_fdw without using a password.
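For example, a client can insist on channel binding when connecting; the host and role names below are hypothetical:

$ psql "host=db.example.com user=app dbname=appdb channel_binding=require"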
Other Highlights
- PostgreSQL 13 continues to improve operability on Windows. Users who run PostgreSQL on Windows now have the option to connect over UNIX domain sockets.
- The PostgreSQL 13 documentation adds a glossary of terms to help people familiarize themselves with both PostgreSQL and general database concepts. This coincides with a significant rework in the display of functions and operators in tables, which helps to improve readability both on the web and in the PDF documentation.
- The pgbench utility, used for performance testing, now supports the ability to partition its “accounts” table, making it easier to benchmark workloads that contain partitions (see the sketch after this list).
- The psql tool now includes the \warn command, which is similar to the \echo command in terms of outputting data, except that \warn sends the output to stderr. And in case you need additional guidance on PostgreSQL commands, the --help flag now includes a link to https://www.postgresql.org.
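Here is a minimal sketch of initializing pgbench with a partitioned “accounts” table; the partition count and method are arbitrary:

$ pgbench -i --partitions=4 --partition-method=range postgres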
For more information about other features of PostgreSQL 13, visit the link.
Summary
Although PostgreSQL 13 did not introduce the planned TDE and zheap features, it still has many eye-catching features, including deduplication for B-tree indexes, parallel VACUUM, and disk-based hash aggregation. Moreover, you can use the new pg_verifybackup tool to verify the integrity of a backup and use pg_rewind to configure standby PostgreSQL instances; pg_rewind can also use the restore_command of the target instance to fetch needed write-ahead logs. If you are interested, you can download the source code, compile it, and analyze the implementation of the features that interest you. You may get even better ideas.