Creating a Real-Time User Profile Recommendation System with PostgreSQL (2)

By Digoal

User profiles are already widely applied in marketing. A popular solution is to tag users and then discover target users through tag combinations.

A profile system may use wide tables and distributed systems. The wide table stores the tags, with each column representing one tag. However, this design is not necessarily the best or the only one. This document introduces an alternative design concept based on PostgreSQL and discusses its efficiency.

Implementation Details for Bit Solution 2

ApsaraDB RDS for PostgreSQL’s New Bit Operation Functions

This document will refer to some useful new functions of ApsaraDB RDS for PostgreSQL.

For the built-in BIT operation functions in the database, please refer to the source code

Table Structure Design

Store users in bits

userid int8: there may be more than 4 billion users, so a 4-byte integer is not enough.

rowid int: a single APPID may not have more than 2 billion users. rowid auto-increments from 0, matching the bit subscripts.

appid int: there will be no more than about 2 billion applications (APPIDs).

In the dictionary table, rowid determines the MAP sequence, which is returned using a window query.
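The dictionary table described above could look like the following sketch (table and column names are illustrative, not necessarily the author's original DDL):

```sql
-- User dictionary: maps each (appid, userid) to a seamless per-appid rowid,
-- which is also the user's bit position in the tag bitmaps.
CREATE TABLE t_userid_dic (
  appid  int  NOT NULL,   -- application ID
  rowid  int  NOT NULL,   -- seamless sequence starting at 0 within each appid
  userid int8 NOT NULL,   -- the application's user ID
  UNIQUE (appid, rowid),
  UNIQUE (appid, userid)
);

-- The MAP sequence is the dictionary ordered by rowid, e.g. via a window query:
SELECT row_number() OVER (ORDER BY rowid) - 1 AS pos, userid
FROM t_userid_dic
WHERE appid = 1;
```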

The insert function for the user dictionary table, which generates seamless, consecutive ROWIDs.

If NULL is returned during the call above, it indicates that the insertion failed. This may be caused by a unique violation, so you can retry on the application end.
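A minimal sketch of such an insert function, assuming the illustrative t_userid_dic layout (appid, rowid, userid with unique constraints on (appid, rowid) and (appid, userid)); the function name and retry contract are assumptions, not the author's original code. Concurrent callers may compute the same rowid; the unique constraint then raises a violation, which is mapped to a NULL return so the application can retry:

```sql
CREATE OR REPLACE FUNCTION f_userid_dic(i_appid int, i_userid int8) RETURNS int AS $$
DECLARE
  v_rowid int;
BEGIN
  -- next seamless rowid = current max + 1 (0 for the first user of this appid)
  INSERT INTO t_userid_dic (appid, rowid, userid)
  SELECT i_appid, coalesce(max(rowid) + 1, 0), i_userid
  FROM t_userid_dic
  WHERE appid = i_appid
  RETURNING rowid INTO v_rowid;
  RETURN v_rowid;
EXCEPTION WHEN unique_violation THEN
  -- a concurrent session won the race for this rowid, or the userid already exists
  RETURN NULL;
END;
$$ LANGUAGE plpgsql STRICT;
```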

Stress test the function above to determine whether seamless insertion is achieved. The RAISE NOTICE output can be disabled during stress testing.


This completely satisfies the requirements on insert speed and seamlessness.

Generate 100 million test users (APPID=1) for subsequent tests.
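For test data, the allocation function can be bypassed and the dictionary bulk-loaded directly (a sketch against the assumed t_userid_dic layout; rowid and userid are simply set equal):

```sql
INSERT INTO t_userid_dic (appid, rowid, userid)
SELECT 1, i, i
FROM generate_series(0, 99999999) AS i;  -- 100 million users for APPID=1
```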

Update the table in real time

To improve write performance, data is written into this table in real time and incrementally merged into the TAG table in the background.

Generate 15 million pieces of test data (APPID=1; USERIDs are random within a range of 2 billion; the tagid range for additions is 1–10,000 and for deletions 1–1,000).
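A sketch of the real-time write table and of the test-data generation, under an assumed schema (one row per add/delete action, with a dic flag set once the userid has entered the dictionary; names and layout are illustrative):

```sql
CREATE TABLE t_user_tags (
  id       serial8 PRIMARY KEY,
  appid    int     NOT NULL,
  userid   int8    NOT NULL,
  tagid    int     NOT NULL,
  op       text    NOT NULL CHECK (op IN ('add', 'del')),
  dic      boolean NOT NULL DEFAULT false,   -- true once userid is in t_userid_dic
  crt_time timestamp NOT NULL DEFAULT now()
);

-- 15 million actions: random userids from a 2-billion range,
-- additions over tagid 1-10000, deletions over tagid 1-1000
INSERT INTO t_user_tags (appid, userid, tagid, op)
SELECT 1,
       (random() * 2000000000)::int8,
       CASE WHEN r < 0.9 THEN 1 + (random() * 9999)::int
            ELSE            1 + (random() * 999)::int END,
       CASE WHEN r < 0.9 THEN 'add' ELSE 'del' END
FROM (SELECT random() AS r FROM generate_series(1, 15000000)) s;
```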

The tag + userids bitmap table is critical, as it will be queried frequently. Data is incrementally merged into it from t_user_tags.

Generate 10,000 pieces of TAG test data for subsequent tests, where each TAG contains the bits of 100 million users.
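The bitmap table and its test data could be sketched as follows (the repeat() pattern is a placeholder; ApsaraDB RDS's bit helper functions could generate more realistic random bitmaps):

```sql
CREATE TABLE t_tags (
  appid   int NOT NULL,
  tagid   int NOT NULL,
  userids varbit,             -- bit at position rowid = 1: that user has the tag
  PRIMARY KEY (appid, tagid)
);

-- 10,000 TAGs, each carrying a 100-million-bit user map (~12.5 MB per row)
INSERT INTO t_tags (appid, tagid, userids)
SELECT 1, i, repeat('01', 50000000)::varbit
FROM generate_series(1, 10000) AS i;
```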

Test Query Performance for Combinations of TAGs

This indicator shows the performance of delineating and returning a user group for a query on a combination of TAGs.

The testing is simple: Include all, include none, and include any.

1. Users that include the tags

The result is the users whose bit is 1 in every selected TAG.

The following is the test SQL
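A sketch of the "include all" query under the assumed schema: AND the selected bitmaps together, then translate the positions of the 1 bits back through the dictionary. Note that bit_and requires all bitmaps to have the same length, so shorter maps must be padded first; the RDS bit_posite function could replace the generate_series scan:

```sql
SELECT d.userid
FROM (SELECT bit_and(userids) AS m
      FROM t_tags
      WHERE appid = 1 AND tagid IN (1, 2, 3)) t,
     generate_series(0, length(t.m) - 1) AS pos,  -- scan every bit position
     t_userid_dic d
WHERE get_bit(t.m, pos) = 1        -- user carries all three tags
  AND d.appid = 1
  AND d.rowid = pos;               -- translate position back to userid
```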

Performance data

2. Users that do not include the tags

The result is the users whose bit is 0 in every selected TAG.

The following is the test SQL

Performance data

3. Include any of the TAGs

The result is the users whose bit is 1 in at least one selected TAG.

The following is the test SQL
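A sketch of the "include any" query under the assumed schema: OR the selected bitmaps and return the users whose bit is 1. Testing for 0 instead yields the "include none" case of section 2. As with bit_and, bit_or requires equal-length bitmaps:

```sql
SELECT d.userid
FROM (SELECT bit_or(userids) AS m
      FROM t_tags
      WHERE appid = 1 AND tagid IN (1, 2, 3)) t,
     generate_series(0, length(t.m) - 1) AS pos,
     t_userid_dic d
WHERE get_bit(t.m, pos) = 1        -- = 0 would select users with none of the tags
  AND d.appid = 1
  AND d.rowid = pos;
```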

Performance data

Advanced Functions

  • Combined with bit_posite, users can be retrieved forward or backward (for example, retrieve the last 10,000 users; this is useful when there are 1 million results but the promotion only needs 10,000 users, especially if they must be 10,000 new users).
  • Combined with get_bit, a segment of BITs can be intercepted, and the result can then be easily obtained.
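For illustration, get_bit reads a single position of a bit string, while substring intercepts a contiguous segment (positions are 0-based for get_bit and 1-based for substring):

```sql
SELECT get_bit(B'10110', 2);               -- reads position 2, returns 1
SELECT substring(B'10110' from 2 for 3);   -- returns B'011'
```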

Test Performance When Adding Data

Refers to the performance when adding data to the t_user_tags table.

The test is as follows
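The write test can be as simple as single-row inserts of random actions, for example one statement per pgbench transaction (a sketch under the assumed t_user_tags layout):

```sql
INSERT INTO t_user_tags (appid, userid, tagid, op)
VALUES (1, (random() * 2000000000)::int8, 1 + (random() * 9999)::int, 'add');
```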

Performance data (the QPS of a one-step operation is about 122,000, including adding or deleting TAGs)

Update actions can be divided into two parts, i.e., adding and deleting. Do not merge them into the same record.

Test Performance When Merging Data

Merging data involves three steps

  1. Update the user dictionary table t_userid_dic;
  2. Obtain and delete t_user_tags records in batch;
  3. Merge label data into t_tags.

The above three actions can be completed in one transaction.

Considering that the userids field in a t_tags table contains 100 million bits, about 12.5 MB, updating a single record may take a long time, so the operation should be done in parallel mode with each TAG computed in parallel.

Dictionary update is wrapped into the following function

Retrieve data from the t_user_tags table and update the data dictionary, marking the data to allow merging.
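A sketch of such a function under the assumed schema (names are illustrative, not the author's original code): it takes a batch of unprocessed t_user_tags rows, inserts any new userids into the dictionary with seamless rowids, and flags the rows as mergeable. A per-appid advisory lock keeps it safe even if it is run in parallel:

```sql
CREATE OR REPLACE FUNCTION f_dic_update(i_appid int, i_limit int) RETURNS int AS $$
DECLARE
  v_cnt int := 0;
BEGIN
  -- serialize rowid allocation per appid
  PERFORM pg_advisory_xact_lock(i_appid);

  WITH batch AS (
    SELECT id, userid
    FROM t_user_tags
    WHERE appid = i_appid AND NOT dic
    ORDER BY id
    LIMIT i_limit
  ), news AS (
    -- only userids not yet in the dictionary, so rowids stay seamless
    SELECT DISTINCT b.userid
    FROM batch b
    WHERE NOT EXISTS (SELECT 1 FROM t_userid_dic d
                      WHERE d.appid = i_appid AND d.userid = b.userid)
  ), ins AS (
    INSERT INTO t_userid_dic (appid, rowid, userid)
    SELECT i_appid,
           (SELECT coalesce(max(rowid), -1)
            FROM t_userid_dic WHERE appid = i_appid)
             + row_number() OVER (ORDER BY userid),
           userid
    FROM news
  )
  UPDATE t_user_tags t
  SET dic = true            -- mark the batch as mergeable
  FROM batch b
  WHERE t.id = b.id;

  GET DIAGNOSTICS v_cnt = ROW_COUNT;
  RETURN v_cnt;             -- number of rows marked mergeable
END;
$$ LANGUAGE plpgsql STRICT;
```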

There is no need to execute this operation in parallel; serial execution in an infinite loop in a background process is acceptable.

Since batch operations generate a large number of advisory locks (AD locks), max_locks_per_transaction should be increased and the related database parameters adjusted accordingly.
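For example, in postgresql.conf (the value is illustrative; this parameter requires a restart to take effect):

```
max_locks_per_transaction = 1024    # default is 64
```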


Execute a few times

This means that the system processes about 20,000 records per second.

Verify the accuracy of the dictionary update

Although the operation isn't necessarily run in parallel, its safety during parallel execution must be ensured, so next we will go over how to do so.

Verify that the parallel results are safe and reliable

Data merging is wrapped into the following function

Preface: if you need to process dictionary updates and data merges with one function, make sure to run it at the REPEATABLE READ isolation level to guarantee that the processed dictionary data is consistent with the merged data.

We updated the dictionary above, so next we can merge data from t_user_tags into t_tags.

Considering that updating t_tags may be slow, increase the degree of parallelism as much as possible so that multiple TAGs can be processed at once.

— Do not apply an APPID mode and an APPID+tag mode to the same APPID in parallel processes.
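A sketch of the merge under the assumed schema (names are illustrative): each call merges the pending, dictionary-flagged actions of one appid + tagid into its bitmap, guarded by an advisory lock so that parallel workers never touch the same TAG:

```sql
CREATE OR REPLACE FUNCTION f_merge_tag(i_appid int, i_tagid int) RETURNS int AS $$
DECLARE
  v_bits varbit;
  v_pos  int;
  v_cnt  int := 0;
  r      record;
BEGIN
  -- one worker per (appid, tagid)
  PERFORM pg_advisory_xact_lock(i_appid, i_tagid);

  SELECT userids INTO v_bits
  FROM t_tags WHERE appid = i_appid AND tagid = i_tagid
  FOR UPDATE;
  IF NOT FOUND THEN
    v_bits := B'';
  END IF;

  FOR r IN
    DELETE FROM t_user_tags
    WHERE appid = i_appid AND tagid = i_tagid AND dic
    RETURNING userid, op
  LOOP
    -- translate userid to its bit position via the dictionary
    SELECT rowid INTO v_pos FROM t_userid_dic
    WHERE appid = i_appid AND userid = r.userid;
    -- pad the bitmap so the position exists
    IF length(v_bits) <= v_pos THEN
      v_bits := v_bits || repeat('0', v_pos + 1 - length(v_bits))::varbit;
    END IF;
    v_bits := set_bit(v_bits, v_pos, CASE WHEN r.op = 'add' THEN 1 ELSE 0 END);
    v_cnt := v_cnt + 1;
  END LOOP;

  INSERT INTO t_tags (appid, tagid, userids)
  VALUES (i_appid, i_tagid, v_bits)
  ON CONFLICT (appid, tagid) DO UPDATE SET userids = EXCLUDED.userids;

  RETURN v_cnt;   -- number of actions merged
END;
$$ LANGUAGE plpgsql STRICT;
```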

Speed test

Merges about 150,000 records per second.

This verification method requires the merge result to be consistent with the source data being merged.


If a result is returned, it means that there was an error with the merge.

Query which TAGs belong to a user

Find the rowid corresponding to the userid, then check the bit at that rowid in a TAG's bitmap to determine whether the user has the tag.

Example; pay attention to the alignment (or improve the get_bit function so that it tolerates positions beyond the end of the bitmap).
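Under the assumed schema, the query could be sketched as follows; the length() guard handles the alignment concern by skipping bitmaps too short to contain the user's position (get_bit would otherwise raise an error):

```sql
SELECT t.tagid
FROM t_userid_dic d
JOIN t_tags t ON t.appid = d.appid
WHERE d.appid = 1
  AND d.userid = 123456                    -- the user being profiled
  AND d.rowid < length(t.userids)          -- alignment guard
  AND get_bit(t.userids, d.rowid) = 1;     -- the bit at the user's position
```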

Querying which TAGs belong to a single user is a heavy operation; if there are many TAGs and users, parallel processing is recommended.

Configuring parallelism parameters

Returning user data after processing each tag in parallel takes about 0.76 ms.

If the result is returned through a cursor, the first users can be obtained quickly.


A large APPID is sharded according to the USER segment

APPID + segment shard

If an APPID includes 10,000 TAGs, 100 million users occupy only about 120 GB.

Usually, redistribution is only needed when skews appear. PostgreSQL uses postgres_fdw to provide native support for data sharding. The batch of TAGs for a single APPID must be located at one node.

Dictionary Translation

Dictionary translation obtains dictionary values from the bit position. Assume that the dictionary ID is imei+id (id is a seamless auto-incrementing ID). How can we obtain the corresponding imei from the bit position?
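Translation is then just an indexed lookup by position. A sketch, assuming the dictionary row carries the business key (imei here is illustrative; in the tables above it would be userid):

```sql
SELECT d.rowid, d.userid                 -- d.imei in an imei-keyed dictionary
FROM t_userid_dic d
WHERE d.appid = 1
  AND d.rowid = ANY (ARRAY[0, 5, 9]);    -- bit positions obtained from a TAG query
```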

You may also use the cursor to improve the instant response speed.

This SQL is fast. It uses index scanning, and only takes 380 milliseconds to query 1 million records from 100 million.


Follow me to keep abreast of the latest technology news, industry insights, and developer trends.
