Automatic Cleaning and Scheduling of PostgreSQL Rotate Tables — Constraints and Triggers

Image for post
Image for post

Background

Time series scenarios usually have the following requirements:

1) Expired historical data must automatically clear up as fast as possible to prevent write-ahead logging (WAL). Either the drop or truncate method can be used.

2) PG supports partition tables. It is better to use partitions. However, this requires the constant creation of partitions. Currently, pg_pathman automatically creates partitions. But partitions aren't automatically deleted.

So, now the critical question is how to proceed ahead?

Method

Let’s take a look at the proposed solution to fix the issue raised in the preceding section:

1) Specify a fixed number of partition tables (similar to list partitions) in a period. For example, there’s a need for 24 tables ending with hours, 7 weekly tables, and 60 minute-based tables.

2) Trigger and clean (truncate) old data based on data insertion time.

This method fixes the written code without necessitating any further changes and creates all the necessary data tables at one time.

Example

Step 1) Create all detailed partition tables required in a period. Create a total of six tables, each for 10 minutes as shown below.

create table t1_0 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='0'));    
create table t1_1 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='1'));
create table t1_2 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='2'));
create table t1_3 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='3'));
create table t1_4 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='4'));
create table t1_5 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='5'));

Step 2) Create a constraint table or a scheduling table (because it’s not possible to trigger a truncate event for each record). Also, clean data generated more than 10 minutes ago at every 10-minute interval as shown below.

create table t_const(crt_time timestamp primary key);

Step 3) Create a trigger for the partition table, truncate data by period, and write the data into the constraint table.

create or replace function tg() returns trigger as $$  
declare
begin
insert into t_const values (to_timestamp(substring(to_char(NEW.crt_time,'yyyymmddhh24miss'), 1, 11)||'000','yyyymmddhh24miss')) on conflict (crt_time) do nothing;
return null;
end;
$$ language plpgsql strict;

create trigger tg1 after insert on t1_0 for each row execute procedure tg();
create trigger tg1 after insert on t1_1 for each row execute procedure tg();
create trigger tg1 after insert on t1_2 for each row execute procedure tg();
create trigger tg1 after insert on t1_3 for each row execute procedure tg();
create trigger tg1 after insert on t1_4 for each row execute procedure tg();
create trigger tg1 after insert on t1_5 for each row execute procedure tg();

Step 4) Execute the following commands to create a constraint table trigger to trigger the rules for cleaning detailed partitions.

create or replace function tg_truncate() returns trigger as $$  
declare
suffix int := substring(to_char(NEW.crt_time,'yyyymmddhh24miss'), 11, 1)::int;
begin
set lock_timeout = '1s';

for i in 0..5 loop
if i=suffix then
continue;
end if;

if suffix=0 and i=5 then
continue;
end if;

if i=suffix-1 then
continue;
end if;

execute 'truncate t1_'||i;

raise notice 'truncated %', 't1_'||i;
end loop;


return null;
end;
$$ language plpgsql strict;
create trigger tg1 after insert on t_const for each row execute procedure tg_truncate();

With this the automatic cleanup is complete! Do give a try yourself.

The following snippet displays the automatic cleanup.

postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:00:40');  
NOTICE: truncated t1_1
NOTICE: truncated t1_2
NOTICE: truncated t1_3
NOTICE: truncated t1_4
INSERT 0 1

If data is written to the same partition later, automatic cleanup will not be triggered. This complies with requirements. Automatic cleanup is triggered only by the first piece of written data.

postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:00:40');  
INSERT 0 1
postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:01:50');
INSERT 0 1

Summary

Data can be stored by partition. If you use a native partition table, a shared lock is added to the primary table and all its partitions during data writing, and hence the user cannot perform truncate. We recommend using the pg_pathman partition or directly write data into a partition.

Also, note that the truncate method prevents WAL log generation during data cleanup, and hence triggers are used to enable automatic data cleanup.

References

Note: The above articles are in Chinese

Original Source:

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store