Creating a Real-Time User Profile Recommendation System with PostgreSQL (1)

By Digoal

User profiles are already widely applied in marketing. A popular solution is to tag users and then discover target users based on combinations of tags.

A profile system may use wide tables and distributed systems. The role of the wide table is to store tags, with each column representing one tag. In fact, this design is neither the best nor the only one. This document introduces and discusses the efficiency of alternative design concepts based on PostgreSQL.

Simulation Scenario

Let’s assume that there is a to-business (2B) real-time user recommendation system in which each APPID represents one business. Service data is composed of APPIDs (business IDs), USERIDs (end-user IDs), and TAGs. The service does not need any data exchange across APPIDs; it only provides user recommendations within a single APPID.

Queries are restricted to a selected APPID and TAG combination, and USERIDs that meet the search criteria are found and pushed to the business. There are about 1 billion data points in total, a single APPID has at most about 100 million users, and the total number of TAGs is designed to be 10,000.

Query requirements:

  • Include, exclude, OR, and AND combinations of TAGs.

I’ll introduce four possible solutions and discuss the advantages and disadvantages of each.

Solution 1: Wide Table Solution

The width of a table is usually restricted. Taking PostgreSQL as an example, a single record is not allowed to span pages (a variable-length field wider than a page is stored in TOAST storage, and only a pointer is kept in the page), which restricts the table width.

For example, an 8 KB data block can store a record with nearly 2,000 columns.

If a column is designed for each TAG, a wide table with 10,000 columns is needed.

Similar restrictions exist in other databases. A wide table with 10,000 columns cannot meet the requirements unless the database kernel is modified.

APPID+USERID can be used as the PK and stored into multiple tables so that an unlimited number of TAGs can be stored. For example, a single table with 1,000 columns can store 10,000 TAGs using only 10 tables.

create table t_tags_1(appid int, userid int8, tag1 boolean, tag2 boolean, ...... tag1000 boolean);  
create table t_tags_10(appid int, userid int8, tag9001 boolean, tag9002 boolean, ...... tag10000 boolean);

To improve efficiency, indexes should be created for each tag field, resulting in 10,000 indexes.

If the TAG combination spans multiple tables, it will need a JOIN operation.
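To make the cost concrete, here is a hedged sketch of such a cross-table query against the illustrative t_tags_1 / t_tags_10 tables defined above (the specific tags tested are made up for the example):

```sql
-- Hypothetical query: users in APPID 1 whose tag combination
-- spans two of the ten tag tables.
select t1.userid
from t_tags_1 t1
join t_tags_10 t10
  on (t10.appid = t1.appid and t10.userid = t1.userid)
where t1.appid = 1
  and t1.tag1
  and t1.tag2
  and (t1.tag4 or t1.tag5)
  and not t10.tag9001;
```

Each boolean column can have its own index, but combining many single-column indexes (or falling back to scans) for complex predicates is exactly where this design becomes expensive.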


Advantages

No special optimization is required, and almost all databases support this solution.


Disadvantages

Performance may suffer, especially when a query combines multiple conditions, such as (tag1 and tag2 and (tag4 or tag5) or not tag6).

Solution 2: Array Solution

An array is used to replace TAG columns. This requires the database to support arrays and possess high-efficiency retrieval capability. PostgreSQL fits this requirement perfectly.

Data structure


A single array has a maximum length of 1 GB (supporting nearly 260 million TAGs)

Partition by APPID, and shard randomly within each partition

Query syntax

  • Include all the TAGs specified in array2, i.e. array1 contains every element of array2 (supports index search): array1 @> array2
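The other query requirements can be covered by PostgreSQL's standard array operators. A sketch against the t_arr table defined below (operator behavior is standard PostgreSQL; the tag values are illustrative):

```sql
-- Include ALL of tags 1, 2, 3 (containment; can use the GIN index):
select userid from t_arr where appid = 1 and tags @> array[1,2,3]::int2[];

-- Include ANY of tags 4, 5 (overlap; can use the GIN index):
select userid from t_arr where appid = 1 and tags && array[4,5]::int2[];

-- Include tags 1 and 2 but NOT tag 6 (the negated condition is evaluated
-- as a filter, not an index search):
select userid from t_arr
where appid = 1
  and tags @> array[1,2]::int2[]
  and not tags && array[6]::int2[];
```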


create table t_arr(appid int, userid int8, tags int2[]) with(parallel_workers=128);
create index idx_t_array_tags on t_arr using gin (tags) with (fastupdate=on, gin_pending_list_limit= 1024000000);
create index idx_t_arr_uid on t_arr(userid);

A pending list of 819,200 KB can cache about 10,000 array records of 80 KB each; this can be adjusted. Each USERID carries up to 10,000 TAGs (the limit).

test.sql:

insert into t_arr select 1, 2000000000*random(), (select array_agg((10000*random())::int2) from generate_series(1,10000));

nohup pgbench -M prepared -n -r -f ./test.sql -P 1 -c 50 -j 50 -t 2000000 > ./arr.log 2>&1 &


Advantages

This solution is capable of storing more than enough TAGs, into the hundreds of millions (even 10,000 TAGs is considered a lot in the industry).

It supports index searches on arrays (GIN indexes), although negative ("not include") conditions cannot use the index.


Disadvantages

The data size is still large, because a record may include 10,000 TAGs of about 80 KB.

100 million records is about 8 TB, and indexes need about another 8 TB.

Not all databases support arrays.

Solution 3: Bit Solution 1

BITs are used to store TAGs, where 1 and 0 indicate whether a TAG is present or absent, respectively.

Data structure

APPID, USERID, TAG bit stream

A single BIT field supports a BIT stream with a maximum length of 1 GB (supporting 8.5 billion TAGs)

Each BIT represents a TAG

Partition by APPID, and shard randomly within each partition

Query syntax

  • Include all the TAGs specified by bit2 (the BITs corresponding to the required TAGs are set to 1, and the rest to 0): bitand(bit1, bit2) = bit2
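The same pattern covers the other query requirements. A hedged sketch of the three predicate forms against the t_bit table defined below, where :bit2 stands for a query mask (1s at the tested tag positions) and :zbit for an all-zero mask of the same length (for example built with the randbit()/zerobit() helpers shown later); bitand is the bitwise AND used throughout this article, which stock PostgreSQL spells with the & operator for equal-length bit strings:

```sql
-- Include ALL tags in the mask:
select userid from t_bit where appid = 1 and bitand(tags, :bit2) = :bit2;

-- Include AT LEAST ONE tag in the mask (the AND result is not all zeroes):
select userid from t_bit where appid = 1 and bitand(tags, :bit2) <> :zbit;

-- Include NONE of the tags in the mask:
select userid from t_bit where appid = 1 and bitand(tags, :bit2) = :zbit;
```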


create table t_bit(appid int, userid int8, tags varbit);
create index idx_t_bit_uid on t_bit(userid);

Each USERID corresponds to 10,000 random bit values

date;for ((i=1;i<=50;i++)); do psql -c "insert into t_bit select 1, 2000000000*random(), \
(select (string_agg(mod((2*random())::int,2)::text,''))::varbit from generate_series(1,10000)) tags \
from generate_series(1,2000000)" ; done; date

Insert speed (batch): 245,000 inserts per second, 326 MB/s; total data size is 127 GB.

TAG update and delete speeds

create or replace function randbit(int) returns varbit as $$
  select (string_agg(mod((2*random())::int,2)::text,''))::varbit from generate_series(1,$1);
$$ language sql strict volatile;

create or replace function zerobit(int) returns varbit as $$
  select (string_agg('0',''))::varbit from generate_series(1,$1);
$$ language sql strict immutable;

update t_bit set tags=randbit(10000) where userid=:id;

Updates or deletes run at about 10,000 records per second, with a response time of about 4 ms.

Query speed

set max_parallel_workers_per_gather = 27;

do language plpgsql $$
declare
  sql  text;
  bit1 varbit := randbit(10000);
  bit2 varbit := randbit(10000);
  bit3 varbit := randbit(10000);
  zbit varbit := zerobit(10000);
begin
  sql := 'select * from t_bit where bitand(tags,'''||bit1::text||''')='''||bit1::text||''' and bitand(tags,'''||bit2::text||''')>bit''0'' and bitand(tags,'''||bit3::text||''')='''||zbit::text||'''';
  raise notice '%', sql;
  -- execute sql;
end;
$$;

Query response time: about 17 seconds with a parallelism of 27.


Advantages

It can handle more than enough TAGs, up to 8.5 billion (even 10,000 TAGs is considered a lot in the industry).

With 10,000 TAGs, each record occupies 10,000 bits, about 1.25 KB.

100 million records take up only about 120 GB, with no index needed.


Disadvantages

It does not use indexes, so query performance can only be improved through parallel computing.

PostgreSQL 9.6 supports CPU parallel computing, which allows responses within 20 seconds even with 100 million users. However, it may consume significant CPU resources, so the degree of parallelism is limited.

Solution 4: Bit Solution 2

So, is there a solution that both saves resources and offers the efficiency we need?

The answer is yes.

Because queries typically retrieve matching USERIDs based on combined TAG conditions, a reverse design may make queries more effective: the TAG becomes the dimension, and the USERIDs under each TAG become the value.

We need to maintain users under each tag, so there may be a significant amount of data that needs to be updated. A design that combines incremental merging and read time merging should be considered.

The data flow is short and fast, proceeding as follows:

Data > detail list > incremental aggregation > appid, tagid, userid_bits
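A minimal sketch of the storage behind this pipeline (table and column names are illustrative assumptions, not from the original design): a dictionary table mapping each USERID to a bit position, a detail table receiving raw writes, and an aggregated table holding one USERID bitmap per APPID and TAG:

```sql
-- USERID dictionary: assigns each user a stable bit position.
create table t_userid_dict (
  appid  int,
  userid int8,
  pos    serial8,          -- bit position of this user in the bitmaps
  primary key (appid, userid)
);

-- Detail list: raw tag changes land here first.
create table t_tag_detail (
  appid    int,
  userid   int8,
  tagid    int,
  op       "char",         -- '+' add tag, '-' remove tag
  crt_time timestamp default now()
);

-- Incrementally aggregated result: one bitmap of users per APPID + TAG.
create table t_tag_userid_bits (
  appid       int,
  tagid       int,
  userid_bits varbit,      -- bit N = 1 means the user at position N has this tag
  primary key (appid, tagid)
);
```

A background job periodically merges t_tag_detail into t_tag_userid_bits, which is the incremental-aggregation step in the flow above.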

Two parts of data are merged while reading: one is the aggregated result of tag computations, and the other is the not-yet-merged detail list.

Of course, if the merging delay can be controlled to within 1 minute and the service can tolerate that delay, there is no need to merge at query time; results are retrieved directly, which is much faster.


  • Users that include all the specified TAGs (positions set to 1 in the result are the matching users): userids (bitand) userids
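Under the illustrative schema sketched earlier, tag-combination queries reduce to bitwise aggregation over a handful of rows. PostgreSQL provides bit_and and bit_or as built-in aggregates for bit strings; the table name below is the hypothetical one from the sketch:

```sql
-- Users in APPID 1 that have ALL of tags 1, 2 and 3:
select bit_and(userid_bits) as hits
from t_tag_userid_bits
where appid = 1 and tagid in (1, 2, 3);

-- Users in APPID 1 that have AT LEAST ONE of tags 4 and 5:
select bit_or(userid_bits) as hits
from t_tag_userid_bits
where appid = 1 and tagid in (4, 5);
```

Positions set to 1 in the result are then translated back to USERIDs through the dictionary table; the bitmaps must be padded to a common length for the aggregates to apply.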


Advantages

Query efficiency is very high with this query-oriented design, because the data storage dimension has changed.


Disadvantages

Because bits are used to represent USERIDs, the mapping between bit positions and USERIDs must be maintained.

A USERID dictionary table needs to be maintained, and incremental merging is required to reduce the data update frequency.

There will be a delay, which can generally be controlled to within 1 minute; if the service can tolerate that delay, this solution performs very well.

Usually, service USERIDs expire periodically (for example, inactive USERIDs may gradually expire over time), so the USERID dictionary requires periodic maintenance, and the USERID bit information must be updated accordingly.

The structure is shown in the following figure:

