Designs of a PostgreSQL Global ID Assignment (Data Dictionary) Service

Design Requirements

Functional Requirements

  1. Input strings, and output IDs

Performance Requirements

  1. Support read access of above 1 million QPS
  2. Support bidirectional batch query read operations, and the query latency for a batch of 100 queries is 1 ms
  3. Support write access of about 10,000 QPS
  4. Support batch write operations, and the write latency for a batch of 100 operations is 10 ms

System Requirements

  1. Stable and reliable
  2. Strict data consistency
  3. No data loss
  4. IDs increment from 0, and avoid gaps (gaps should be less than 1%)
  5. Automatic disaster recovery

Design 1: With Locally Unique Group ID

In this design, we have a group ID with one sequence for each group, where the sequence and text are unique within a single group.

  1. Design a UDF that automatically generates a sequence, which has a one-to-one relationship with the group ID, and returns the value of the sequence.
  • create or replace function get_per_gp_id( Text, -- Sequence name prefix int -- The group ID serves as the sequence name suffix ) returns int8 as $$ declare begin return nextval(($1||$2)::regclass); exception when others then execute 'create sequence if not exists '||$1||$2||' start with 0 minvalue 0' ; return nextval(($1||$2)::regclass); end; $$ language plpgsql strict;
  1. Create a test table
  • create table tbl1( gid int, -- Group ID ts text, -- Text sn int8, -- Auto-increment sequence value unique(gid,ts), unique(gid,sn) );
  1. Create a UDF. When the group ID and text are entered, if the text exists, the existing sequence is returned; if the text does not exist, a unique ID is assigned and returned.
  • create or replace function ins1( int, -- Group ID text -- Text ) returns int8 as $$ declare res int8; begin -- Checks whether the text already exists in this group select sn into res from tbl1 where gid=$1 and ts=$2; if found then return res; else -- If it does not exist, an ID is generated insert into tbl1 (gid,ts,sn) values ($1, $2, get_per_gp_id('seq_', $1)) returning sn into res; return res; end if; exception when others then -- An exception may be thrown when another parallel session is generating the sequence. Query again, and return the SN. select sn into res from tbl1 where gid=$1 and ts=$2; if found then return res; else raise ; end if; end; $$ language plpgsql strict;

Design 2: Without Group ID

In this design, we will not have a group ID, and the text and sequence are globally unique.

  1. Create a sequence
  • create sequence seq_tbl2_sn start with 0 minvalue 0;
  1. Create a test table
  • create table tbl2( ts text unique, -- Text sn int8 default nextval('public.seq_tbl2_sn'::regclass) unique -- Sequence );
  1. Create a UDF. When the text is entered, if the text already exists, the sequence corresponding to the text is returned; if the text does not exist, a unique sequence value is assigned and returned.
  • create or replace function ins2( text ) returns int8 as $$ declare res int8; begin -- Check whether the text already exists select sn into res from tbl2 where ts=$1; if found then return res; else -- If it does not exist, an ID is generated insert into tbl2 (ts) values ($1) returning sn into res; return res; end if; exception when others then -- An exception may be thrown when another parallel session is generating the sequence. Query again, and return the SN. select sn into res from tbl2 where ts=$1; if found then return res; else raise ; end if; end; $$ language plpgsql strict;

Design 3: With Globally Unique Group ID

The third approach uses a globally unique group ID. If the dictionary contains 4 billion values or less, use INT4. If the dictionary exceeds 4 billion values, we need to use INT8.

  1. Create a sequence, and set the starting value to the minimum value of INT4
  • create sequence seq_tbl_dict minvalue -2147483648 start with -2147483648;
  1. Create a test table
  • create table tbl_dict( gid int2, -- Group ID ts text, -- Text sn int4 default nextval('public.seq_tbl_dict'::regclass), -- Sequence unique (gid,ts), unique (sn) );
  1. Create a UDF. When the text is entered, if the text already exists, the sequence corresponding to the text is returned; if the text does not exist, a unique sequence value is assigned and returned.
  • create or replace function get_sn(int2, text) returns int as $$ declare res int; begin -- Optimistic query select sn into res from tbl_dict where gid=$1 and ts=$2; if found then return res; end if; -- Inserts one if not found insert into tbl_dict values($1,$2,nextval('public.seq_tbl_dict'::regclass)) on conflict (gid,ts) do nothing returning sn into res; if found then return res; -- In the case of insertion conflicts, it continues the query and returns sn else select sn into res from tbl_dict where gid=$1 and ts=$2; return res; end if; end; $$ language plpgsql strict;

Batch Operation

select ins1(gid, ts) from (values (),(),.....()) as t(gid, ts);    

select ins2(ts) from (values (),(),.....()) as t(ts);
select ins1(id, 'test'||id) from generate_series(1,100) t(id);    

...........
0
(100 rows)

Time: 1.979 ms

Write Operation Stress Test

With group ID

vi test1.sql    
\set gid random(1,10)
\set ts random(1,100000000)
select ins1(:gid, md5(:ts::text));
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 56 -j 56 -T 120

transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 18082960
latency average = 0.232 ms
latency stddev = 0.517 ms
tps = 150680.114138 (including connections establishing)
tps = 150687.227354 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set gid random(1,10)
0.000 \set ts random(1,100000000)
0.230 select ins1(:gid, md5(:ts::text));

Without group ID

vi test2.sql    
\set ts random(1,100000000)
select ins2(md5(:ts::text));
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 56 -j 56 -T 120

transaction type: ./test2.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 11515008
latency average = 0.584 ms
latency stddev = 0.766 ms
tps = 95613.170828 (including connections establishing)
tps = 95618.249995 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set ts random(1,100000000)
0.582 select ins2(md5(:ts::text));

With globally unique group ID

vi test3.sql    
\set gid random(1,10)
\set ts random(1,100000000)
select get_sn(:gid, md5(:ts::text));
pgbench -M prepared -n -r -P 1 -f ./test3.sql -c 56 -j 56 -T 120

transaction type: ./test3.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 7665708
latency average = 0.877 ms
latency stddev = 0.666 ms
tps = 63868.058538 (including connections establishing)
tps = 63875.166407 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set gid random(1,10)
0.000 \set ts random(1,100000000)
0.875 select get_sn(:gid, md5(:ts::text));
postgres=# select * from tbl_dict limit 10;
gid | ts | sn
-----+----------------------------------+-------------
9 | 8021bdb598f73577a063b50bdf0cef31 | -2147483648
3 | e1988c3c7a80dcd1b1c1bdcf2ac31fe7 | -2147483646
7 | 6ee09b73df8ae9bb97a4ebd4c51bd212 | -2147483647
1 | fa8303da6ea2b6e995a1e090fb9cd9f2 | -2147483645
7 | ca1c614104f1ad3af92d8d9a2911a5b6 | -2147483643
8 | 4641dd1162f46e8be5f643facc85df94 | -2147483644
6 | 88250e10f0d27cdebbf5c5eb4a7032a3 | -2147483641
2 | 5718da726fd20d8fd12d56e9bf2d7e9e | -2147483642
1 | 687e553016fe6bd1dba3ca6126b8b5b8 | -2147483639
10 | a4707645d604dd1ad9ba96ff303cf9d9 | -2147483638
(10 rows)

Gap Percentage

Cause of gaps: irreversible use of the sequence. Even if a transaction fails, the consumed sequence value cannot be returned.

postgres=# select gid,count(*),min(sn),max(sn),((max(sn)+1)/count(*)::float8-1)*100||' %' from tbl1 group by gid;    
gid | count | min | max | ?column?
-----+---------+-----+---------+----------
1 | 1790599 | 0 | 1790598 | 0 %
2 | 1793384 | 0 | 1793383 | 0 %
3 | 1791533 | 0 | 1791532 | 0 %
4 | 1792755 | 0 | 1792754 | 0 %
5 | 1793897 | 0 | 1793896 | 0 %
6 | 1794786 | 0 | 1794785 | 0 %
7 | 1792282 | 0 | 1792281 | 0 %
8 | 1790630 | 0 | 1790629 | 0 %
9 | 1791303 | 0 | 1791302 | 0 %
10 | 1790307 | 0 | 1790306 | 0 %
(10 rows)

postgres=# select count(*),min(sn),max(sn),((max(sn)+1)/count(*)::float8-1)*100||' %' from tbl2;
count | min | max | ?column?
----------+-----+----------+------------------------
10877124 | 0 | 10877128 | 4.59680334685686e-05 %
(1 row)

Read Operation Stress Test

As long as it is full, it just returns SN. So you can just slightly modify the stress test script

vi test1.sql    
\set gid random(1,10)
\set ts random(1,10000)
select ins1(:gid, md5(:ts::text));


pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 56 -j 56 -T 120

transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 69229025
latency average = 0.097 ms
latency stddev = 0.040 ms
tps = 574906.288558 (including connections establishing)
tps = 575063.117108 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set gid random(1,10)
0.001 \set ts random(1,10000)
0.098 select ins1(:gid, md5(:ts::text));
vi test3.sql    
\set gid random(1,10)
\set ts random(1,10000)
select * from tbl1 where gid=:gid and ts=md5(:ts::text);

pgbench -M prepared -n -r -P 1 -f ./test3.sql -c 56 -j 56 -T 120

transaction type: ./test3.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 90985807
latency average = 0.074 ms
latency stddev = 0.009 ms
tps = 758067.503368 (including connections establishing)
tps = 758109.672642 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set gid random(1,10)
0.001 \set ts random(1,10000)
0.074 select * from tbl1 where gid=:gid and ts=md5(:ts::text);

Summary

Using UDF, sequence and other PostgreSQL functions, you can implement the “Global ID Assignment Service” design as required at the beginning of this article.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com