PostgreSQL Practices — Application of Graph Search in Content Communities

By Digoal

Background

Generally, a content community website may need to record such data: articles, users, and tags.

The relation between the three should also be recorded, including that the tag belongs to the article, the user has read the article, the user has added the article to Favorites, the user has followed a user, and the user is the author of an article.

The ultimate goal is to achieve transparent inquiries. For example: What articles are those who have read this article also reading? Who may have similar interests as me?

However, in a community website, there are typically tens of millions of articles, and close to 10 million users.

How Can This Requirement Be Implemented?

In fact, this requirement can be easily implemented with arrays and smlar in PostgreSQL. We will start the design and stress testing below.

Arrays are used to store forward and reverse relations, tags, and so on.

Smlar is used to query similar arrays (find users with similar interests).

Design

Metadata

1. User table

2. Tag table

3. Article table

Relational Data

1. Forward relation

1.1. Who has read the article?

1.2. Who has added the article to Favorites?

2. Reverse relation

2.1. What articles has the user read? And What tags are included in these articles?

2.2. What articles has the user added to Favorites? And what tags are included in these articles?

Query

1. Other articles that other users who have read this article are reading (except the current article and articles I have read).

The logic is as follows, written as the UDF:

The UDFs are as follows. Indexes can be used for all, and point queries are performed after aggregation. The performance is very good:

2. To determine which users who share my interests in terms of read articles, the GIN index performs very well.

3. Users who share my interests in terms of the tags in read articles

The specific content is similar to that of 2, which is omitted here.

4. Users who share my interests in terms of favorited articles

The specific content is similar to that of 2, which is omitted here.

5. Users who share my interests in terms of the tags in favorited articles

The specific content is similar to that of 2, which is omitted here.

UDFs for Generating Forward and Reverse Relations

Use UDFs to reduce interactions and complete the following types of business logic operations. UDFs can be compiled using plpgsql, which is very simple. The specific content is introduced in this article: https://www.postgresql.org/docs/10/static/plpgsql.html

1. When new articles are created, tags are automatically generated and the tag table is updated or appended.

2. When reading the article, the forward-inverse relation is modified.

The tags information of the article is obtained from arts

3. When favoriting the article, the forward-inverse relation is modified.

The tags information of the article is obtained from arts

Add an Index

Optional index

Fill in Test Data

1. Generate 10 million users

2. Generate 100,000 tags

3. Generate 50 million articles

4. Generate a forward relation, with each article being read by 500 users and favorited by 50 users on average.

5. Generate a reverse relation (in theory, the reverse relation and the forward relation should correspond one-to-one. For the convenience of testing, I will not perform this operation here. The test result is the same.)

Each user reads 1,000 articles on average, involving 500 tags. 100 articles are added to Favorites, involving 50 tags.

Performance Tests

1. Other articles that other users who have read this article are reading (except the current article and articles I have read).

Other users have read about 500,000 other articles. It takes 200 milliseconds to obtain and sort this result.

2. Users who share my interests in terms of read articles

Time: 2.4 milliseconds.

Pre-Computing and Performance Optimization

The above contents (recommending articles and finding users with similar interests) refer to the performance of real-time query, but in fact, these operations can be pre-computed (because the increment of articles is not too large, and the users reading these articles do not change too much). For example, articles are refreshed once a day, so when users with similar interests, and similar articles are recommended to the user, if pre-computing is performed, the results only need to directly queried, and the performance is improved to 0.0N millisecond-level response. For new articles that have not been pre-computed, real-time queries (and a corresponding update to the pre-computing table) can be conducted, which can also respond in milliseconds.

Pre-computing can also operate in another mode. When someone inquires about this article, you can decide whether to re-inquire and update the table according to the last pre-computed time. (That is, the mode of real-time computing + cache + cache timeout.)

The logic is as follows:

Summary

Thirty percent development and seventy percent operation. Content websites are similar to social software, and the operation is the priority. The key link in the operation is a “circle”, which can gather popularity. The formation of a circle often depends on recommendations, and the source of recommendations is behaviors. The content and users to recommend to the target depend on behaviors. This is the principle that birds of a feather flock together.

With PostgreSQL arrays and smlar, it is very easy to implement efficient classification query and recommendations.

1. Arrays are used to store forward and reverse relations, tags, and so on.

2. Smlar is used to query similar arrays (to find users with similar interests).

It is very convenient and efficient in social operation and content operation scenarios.

They can also be used to easily recommend hot users and hot articles, which has been tested in other cases. See the end of this article for details.

References

https://www.postgresql.org/docs/10/static/plpgsql.html

https://www.postgresql.org/docs/10/static/intarray.html

https://github.com/bitnine-oss/agensgraph

Original Source

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store