Efficiently Implementing Full-table and Full-field Fuzzy Search in Milliseconds with PostgreSQL

Background

Implementation Example of Full-table and Full-field Fuzzy Search

postgres=# create table t(phonenum text, info text, c1 int, c2 text, c3 text, c4 timestamp);    
CREATE TABLE
postgres=# insert into t values ('13888888888','i am digoal, a postgresqler',123,'china','中华人民共和国,阿里巴巴,阿',now());
INSERT 0 1
postgres=# select * from t;
phonenum | info | c1 | c2 | c3 | c4
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2016-04-19 11:15:55.208658
(1 row)
如果不是,你可以这么指定collate 和 ctype  

postgres=# create database test with template template0 lc_collate 'zh_CN.utf8' lc_ctype 'zh_CN.utf8';
create extension pg_trgm;  

create or replace function record_to_text(anyelement) returns text as $$
select $1::text;
$$ language sql strict immutable;

test=# create index idx_t_1 on t using gin (record_to_text(t) gin_trgm_ops) ;
CREATE INDEX

当需要使用分页,或者结果集很大时,建议使用gist
test=# create index idx_t_2 on t using gist (record_to_text(t) gist_trgm_ops) ;
CREATE INDEX
test=# explain select * from t where record_to_text(t) ~ 'digoal';  
QUERY PLAN
-------------------------------------------------------------------
Index Scan using idx_t_2 on t (cost=0.38..8.39 rows=1 width=140)
Index Cond: (record_to_text(t.*) ~ 'digoal'::text)
(2 rows)
先插一堆数据进去  
postgres=# insert into t select * from t;
INSERT 0 4194304
test=# select count(*) from t;
count
---------
4194304
(1 row)

然后插几条不一样的

insert into t values ('13888889999','i am dege, a postgresqler',123,'china','德歌 德哥 刘德华 彭德怀',now());
insert into t values ('13888889999','i am dege, a postgresqler',123,'china','德歌 德哥 刘德华 彭德怀',now());

vacuum analyze t;
test=# explain (analyze,verbose,timing,costs,buffers) select * from t where record_to_text(t) ~ 'dege';  
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Scan using idx_t_2 on public.t (cost=0.41..2.43 rows=1 width=101) (actual time=0.236..0.254 rows=2 loops=1)
Output: phonenum, info, c1, c2, c3, c4
Index Cond: (record_to_text(t.*) ~ 'dege'::text)
Buffers: shared hit=5
Planning time: 0.349 ms
Execution time: 0.301 ms
(6 rows)
test=# /*+ BitmapScan(t idx_t_1) */ explain (analyze,verbose,timing,costs,buffers) select * from t where record_to_text(t) ~ '刘德华' limit 10;  
LOG: available indexes for BitmapScan(t): idx_t_1
LOG: pg_hint_plan:
used hint:
BitmapScan(t idx_t_1)
not used hint:
duplication hint:
error hint:

LOG: pg_hint_plan:
used hint:
not used hint:
BitmapScan(t idx_t_1)
duplication hint:
error hint:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=441.00..442.26 rows=1 width=101) (actual time=0.239..0.255 rows=2 loops=1)
Output: phonenum, info, c1, c2, c3, c4
Buffers: shared hit=4
-> Bitmap Heap Scan on public.t (cost=441.00..442.26 rows=1 width=101) (actual time=0.238..0.252 rows=2 loops=1)
Output: phonenum, info, c1, c2, c3, c4
Recheck Cond: (record_to_text(t.*) ~ '刘德华'::text)
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on idx_t_1 (cost=0.00..441.00 rows=1 width=0) (actual time=0.086..0.086 rows=2 loops=1)
Index Cond: (record_to_text(t.*) ~ '刘德华'::text)
Buffers: shared hit=3
Planning time: 0.494 ms
Execution time: 0.313 ms
(13 rows)


test=# /*+ BitmapScan(t idx_t_1) */ select * from t where record_to_text(t) ~ '刘德华' limit 10;
LOG: available indexes for BitmapScan(t): idx_t_1
LOG: pg_hint_plan:
used hint:
BitmapScan(t idx_t_1)
not used hint:
duplication hint:
error hint:

LOG: pg_hint_plan:
used hint:
not used hint:
BitmapScan(t idx_t_1)
duplication hint:
error hint:

phonenum | info | c1 | c2 | c3 | c4
-------------+---------------------------+-----+-------+-------------------------+----------------------------
13888889999 | i am dege, a postgresqler | 123 | china | 德歌 德哥 刘德华 彭德怀 | 2017-01-06 17:04:42.19215
13888889999 | i am dege, a postgresqler | 123 | china | 德歌 德哥 刘德华 彭德怀 | 2017-01-06 17:04:42.514895
(2 rows)

Time: 1.225 ms

Statement Timeout

postgres=# set statement_timeout = '1s';  
SET

or

test=# /*+ Set(statement_timeout 1s) */ select * from t where record_to_text(t) ~ 'd' limit 10;
LOG: pg_hint_plan:
used hint:
Set(statement_timeout 1s)
not used hint:
duplication hint:
error hint:

LOG: pg_hint_plan:
used hint:
Set(statement_timeout 1s)
not used hint:
duplication hint:
error hint:

phonenum | info | c1 | c2 | c3 | c4
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2017-01-06 16:51:16.840941
(10 rows)

Use of Hint

  • When result sets are returned with cursors: Use GiST
  • When less than 3 characters are entered: Use GiST
  • When a few rows are evaluated: Use GIN

Other Optimization

Read-only Instances

Original Source

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com