PostgreSQL Graph Search Practices — 10 Billion-Scale Graph with Millisecond Response

By Digoal

Background

Graph search is one of many features of PostgreSQL (including StreamCompute, full-text search, schema search, K-V storage, image search, fingerprint search, spatial data, time series data, recommendations, and so on).

Using the CTE syntax, you can easily implement the graph search (N-depth search, shortest path, point, edge property, and so on).

In the graph search, the depth of the tier, whether it is a cycle, and the path, can all be expressed.

Image for post
Image for post
Image for post
Image for post

Example

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

On this basis, the demo is as follows:

  1. Implementing the requirements, such as N-depth search, edge property view, and shortest path search.
  2. Removing cycle points, controlling the depth, and displaying paths.
  3. Generating drawing data.

Create 5 Billion Entries of Test Data

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

1. Create a table with the following structure. Points and edges can be described.

2. Generate test data.

3. About 340 GB of data.

Removing Cyclic Points, Controlling the Depth, and Displaying Paths

1. When a point occurs repeatedly in the path, a cycle occurs.

2. For each recursion, add 1 to the depth.

3. Use an array to store path. Single-column arrays, or multiple-column arrays (row arrays). For multi-column array paths, please see: https://www.postgresql.org/docs/10/static/queries-with.html

The SQL is as follows:

N-depth Search

For n-depth search, enter sg.depth <= N as above SQL.

For example, search for the 3-depth data with ROOT=31208.

Shortest Path

Remove the search depth, and add WHERE condition (filter C2) and LIMIT 1 to the statements querying the recursive table.

The SQL is as follows:

For example, search for the shortest path from 1 to 100.

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

How to Generate Drawing Data

To speed up the response, use the cursor to return.

The response time is fast, reaching the millisecond-level.

Control the Number of Returned Records for Each Tier

The deeper the tier is, the more records are returned. In fact, in graph search, it is not necessary to return all records of each tier (for example, only the top N records with high correlation, or the top N records with weight greater than a specific value are returned), thus controlling the number of records of each tier.

For example, search for 3-depth data with root = 31208, and limit the number of records returned per tier to 100.

The response speed is 3.2 milliseconds. (The reason is very simple. Each tier hits an index, combined with the cluster feature of PG (stored by c1), so the number of blocks can be reduced and performance can be improved further)

For stress testing, TPS is 12,000, and the average response time is 5.2 milliseconds.

Examples of Graph Search UDF Encapsulation

Encapsulating lengthy SQL statements into UDF can simplify the interfaces called by applications.

1. UDF1, to Return Records

Example:

2. UDF2, to Return Cursors

Example:

3. UDF3, to Return the Shortest Path between Two Points

Return the shortest path between two points and limits the depth, beyond which it is considered inaccessible.

For example, search for the shortest path from 1 to 100.

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

Example

4. UDF4, to Return the Shortest Path between Two Points, and Allow Setting the Weight Condition as a Filter Condition and Setting the Search Limit of Each Tier.

Return the shortest path between two points, and limit the depth and the search limit of each tier. If the depth is exceeded, it is considered inaccessible.

Example

Summary

Using the CTE syntax of PostgreSQL, it is very convenient to implement the requirements of graph search, including N-depth search, shortest path search, path, point and edge property (the edge property is stored in JSON to facilitate the architecture design, the display, the control and display of the tier depth, the control of the number of returned results for each tier, and the control of the return sequence and weight of each tier).

In terms of performance, for the PG 10 ON ECS environment, 5 billion point-edge network, N-depth search, and shortest path search, the response times are all in milliseconds (for the 3-depth search with a limit of 100 records per tier, the response time is only 2.1 milliseconds, and the TPS reaches 12,000).

The preceding query can be encapsulated into the plpgsql UDF interface of PostgreSQL to facilitate business calls (only some input conditions need to be exposed).

Original Source

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store