PostgreSQL Row-Level Full-Text Search

By Digoal.

In some applications, you may need to search all the fields in a table, and other fields may still require a precise query, while others in contrast may require a fuzzy query or a full-text search.

Now consider this scenario: the selection of drop-down boxes on some front-end pages. The selection of these boxes can actually be a headache for application developers because writing SQL statements can turn out to be quite a big hassle sometimes.

Now consider the code below:

postgres=# create table t(phonenum text, info text, c1 int, c2 text, c3 text, c4 timestamp);  
CREATE TABLE
postgres=# insert into t values ('13888888888','i am digoal, a postgresqler',123,'china','PRC, Alibaba, ah',now());
INSERT 0 1
postgres=# select * from t;
phonenum | info | c1 | c2 | c3 | c4
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba, Ah | 2016-04-19 11:15:55.208658
(1 row)

Now, based on this code, the following is a query the t table, shown above. If any field matches digoal, this following record is returned:

select * from t where phonenum='digoal' or info ~ 'digoal' or c1='digoal' or ......;

Of course, though, each field requires a condition, such as the precise matching condition, and the full-text search condition. However, with that said, Row-level full-text search can greatly simplify this query.

Below, we will go over how you can create a row-level text index to make things a bit easier.

Create a Row-Level Text Index

Consider Jieba as an example. The source code can be found here, and the pg_scws based on SCWS.

All of the above support custom dictionaries. The Installation process for these dictionaries is omitted here. The usage of these dictionaries is as follows:

postgres=# select t::text from t;  
t
-----------------------------------------------------------------------------------------------------------------
(13888888888,"i am digoal, a postgresqler",123,china,PRC, Alibaba, Ah,"2016-04-19 11:15:55.208658")
(1 row)

postgres=# select to_tsvector('jiebacfg',t::text) from t;
to_tsvector
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
' ':6,8,11,13,33 '04':30 '11':34 '123':17 '13888888888':2 '15':36 '19':32 '2016':28 '55.208658':38 'china':19 'digoal':9 'postgresqler':14 'PRC':21 'Alibaba':23
(1 row)

Use t::text to convert rows into a large text.

postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & china') from t;  
?column?
----------
t
(1 row)

postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & post') from t;
?column?
----------
f
(1 row)

To create a row-level text index, an immutable function index is required.

postgres=# create or replace function f1(regconfig,text) returns tsvector as 
$$

select to_tsvector($1,$2);

$$
language sql immutable strict;
CREATE FUNCTION

postgres=# create or replace function f1(text) returns tsvector as
$$

select to_tsvector($1);
$$
language sql immutable strict;
CREATE FUNCTION

postgres=# alter function record_out(record) immutable;
ALTER FUNCTION
postgres=# alter function textin(cstring) immutable;
ALTER FUNCTION
postgres=# create index idx_t_1 on t using gin (f1('jiebacfg'::regconfig,t::text)) ;
CREATE INDEX

Verification:

postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & post') ;  
phonenum | info | c1 | c2 | c3 | c4
----------+------+----+----+----+----
(0 rows)
postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & china') ;
phonenum | info | c1 | c2 | c3 | c4
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba | 2016-04-19 11:15:55.208658
(1 row)

postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;
phonenum | info | c1 | c2 | c3 | c4
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba, Ah | 2016-04-19 11:15:55.208658
(1 row)

postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..1.52 rows=1 width=140)
Filter: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))
(2 rows)

If the number of records is large, the indexes will be used. When the number of records is small, we can use Hint or Switch to force the index:

postgres=# set enable_seqscan=off;  
SET
postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=12.25..16.77 rows=1 width=140)
Recheck Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))
-> Bitmap Index Scan on idx_t_1 (cost=0.00..12.25 rows=1 width=0)
Index Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))
(4 rows)

Original Source

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.