PostgreSQL Graph Search Practices — 10 Billion-Scale Graph with Millisecond Response

By Digoal

Background

Graph search is one of many features of PostgreSQL (including StreamCompute, full-text search, schema search, K-V storage, image search, fingerprint search, spatial data, time series data, recommendations, and so on).

Using the CTE syntax, you can easily implement the graph search (N-depth search, shortest path, point, edge property, and so on).

In the graph search, the depth of the tier, whether it is a cycle, and the path, can all be expressed.

Image for post
Image for post

Example

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

On this basis, the demo is as follows:

  1. Implementing the requirements, such as N-depth search, edge property view, and shortest path search.
  2. Removing cycle points, controlling the depth, and displaying paths.
  3. Generating drawing data.

Create 5 Billion Entries of Test Data

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

1. Create a table with the following structure. Points and edges can be described.

create table a(      
c1 int, -- point 1
c2 int, -- point 2
prop jsonb, -- The properties of the edges corresponding to points 1 and 2 are stored in JSON, including weights, relations and so on.
primary key (c1,c2) -- primary key
);

create index idx_a_2 on a(c1, COALESCE(((prop ->> 'weight'::text))::float8, 0));

2. Generate test data.

vi test.sql      

\set id random(1,10000000)
insert into a select :id, ((width_bucket(:id,1,10000000,2000)-1)*50000 + (random()*50000)::int) from generate_series(1,1000) on conflict (c1,c2) do nothing;
pgbench -M prepared -n -r -P 5 -f ./test.sql -c 50 -j 50 -t 100000

3. About 340 GB of data.

Removing Cyclic Points, Controlling the Depth, and Displaying Paths

1. When a point occurs repeatedly in the path, a cycle occurs.

2. For each recursion, add 1 to the depth.

3. Use an array to store path. Single-column arrays, or multiple-column arrays (row arrays). For multi-column array paths, please see: https://www.postgresql.org/docs/10/static/queries-with.html

The SQL is as follows:

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.c1] as path -- initial path
FROM a AS g
WHERE
c1 = ? -- ROOT node =?
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= ? -- search depth =?
)
SELECT * FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor

N-depth Search

For n-depth search, enter sg.depth <= N as above SQL.

For example, search for the 3-depth data with ROOT=31208.

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.c1] as path -- initial path
FROM a AS g
WHERE
c1 = 31208 -- ROOT node =?
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= 3 -- search depth =?
)
SELECT * FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor

Shortest Path

Remove the search depth, and add WHERE condition (filter C2) and LIMIT 1 to the statements querying the recursive table.

The SQL is as follows:

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.c1] as path -- initial path
FROM a AS g
WHERE
c1 = ? -- ROOT node =?
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
-- AND sg.depth <= ? -- search depth =? It can also be retained to prevent the search from being too deep and thus affecting the performance. For example, the search will not return after a depth of 10
)
SELECT * FROM search_graph
where c2 = ? -- End of the shortest path
limit 1; -- query a recursive table. You can add LIMIT output or use a cursor

For example, search for the shortest path from 1 to 100.

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.c1] as path -- initial path
FROM a AS g
WHERE
c1 = 1 -- ROOT node =?
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
-- AND sg.depth <= ? -- search depth =? It can also be retained to prevent the search from being too deep and thus affecting the performance. For example, the search will not return after a depth of 10
)
SELECT * FROM search_graph
where c2 = 100 -- end of the shortest path
limit 1; -- query a recursive table. You can add LIMIT output or use a cursor

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

How to Generate Drawing Data

To speed up the response, use the cursor to return.

begin;      
declare cur1 cursor for $query;
FETCH 1000 from cur1;
....
close cur1;
end;

The response time is fast, reaching the millisecond-level.

Control the Number of Returned Records for Each Tier

The deeper the tier is, the more records are returned. In fact, in graph search, it is not necessary to return all records of each tier (for example, only the top N records with high correlation, or the top N records with weight greater than a specific value are returned), thus controlling the number of records of each tier.

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
select c1,c2,prop,depth,path,cycle from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 depth, -- initial depth =1
ARRAY[g.c1] path -- initial path
FROM a AS g
WHERE
c1 = ? -- ROOT node =?
-- AND coalesce((prop->>'weight')::float8,0) >= ? -- correlation weight
-- ORDER BY coalesce((prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit ? -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,depth,path,cycle from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 depth, -- depth + 1
path || g.c1 path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= ? -- search depth =?
-- AND coalesce((prop->>'weight')::float8,0) >= ? -- correlation weight
-- ORDER BY coalesce((prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit ? -- How many records are limited at each tier?
) t
)
SELECT * FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor

For example, search for 3-depth data with root = 31208, and limit the number of records returned per tier to 100.

WITH RECURSIVE search_graph(      
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
select c1,c2,prop,depth,path from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 depth, -- initial depth =1
ARRAY[g.c1] path -- initial path
FROM a AS g
WHERE
c1 = 31208 -- ROOT node =?
-- AND coalesce((prop->>'weight')::float8,0) >= ? -- correlation weight
-- ORDER BY coalesce((prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit ? -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,depth,path from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 depth, -- depth + 1
path || g.c1 path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= 3 -- search depth =?
-- AND coalesce((prop->>'weight')::float8,0) >= ? -- correlation weight
-- ORDER BY coalesce((prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit 100 -- the number of records limited per tier
) t
)
SELECT * FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor
c1 | c2 | prop | depth | path | cycle
---------+----------+------+-------+------------------------+-------
31208 | 300008 | | 1 | {31208} | f
31208 | 300040 | | 1 | {31208} | f
31208 | 300046 | | 1 | {31208} | f
31208 | 300050 | | 1 | {31208} | f
31208 | 300061 | | 1 | {31208} | f
31208 | 300082 | | 1 | {31208} | f
31208 | 300093 | | 1 | {31208} | f
.................
3032152 | 30347906 | | 3 | {31208,300008,3032152} | f
3032152 | 30300272 | | 3 | {31208,300008,3032152} | f
3032152 | 30316175 | | 3 | {31208,300008,3032152} | f
3032152 | 30309844 | | 3 | {31208,300008,3032152} | f
3032152 | 30336508 | | 3 | {31208,300008,3032152} | f
3032152 | 30322201 | | 3 | {31208,300008,3032152} | f
3032152 | 30312579 | | 3 | {31208,300008,3032152} | f
(300 rows)

Time: 3.245 ms

The response speed is 3.2 milliseconds. (The reason is very simple. Each tier hits an index, combined with the cluster feature of PG (stored by c1), so the number of blocks can be reduced and performance can be improved further)

QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on search_graph (cost=25460.78.. 25482.78 rows=1100 width=77) (actual time=0.044.. 2.009 rows=300 loops=1)
Output: search_graph.c1, search_graph.c2, search_graph.prop, search_graph.depth, search_graph.path, search_graph.cycle
Buffers: shared hit=522
CTE search_graph
-> Recursive Union (cost=0.58.. 25460.78 rows=1100 width=77) (actual time=0.042.. 1.755 rows=300 loops=1)
Buffers: shared hit=522
-> Limit (cost=0.58.. 402.52 rows=100 width=77) (actual time=0.040.. 0.183 rows=100 loops=1)
Output: g.c1, g.c2, g.prop, 1, (ARRAY[g.c1]), false
Buffers: shared hit=97
-> Index Scan using a_pkey on public.a g (cost=0.58.. 393024.56 rows=97782 width=77) (actual time=0.038.. 0.166 rows=100 loops=1)
Output: g.c1, g.c2, g.prop, 1, ARRAY[g.c1], false
Index Cond: (g.c1 = 31208)
Buffers: shared hit=97
-> Limit (cost=2249.76.. 2502.53 rows=100 width=77) (actual time=0.372.. 0.473 rows=67 loops=3)
Output: g_1.c1, g_1.c2, g_1.prop, ((sg.depth + 1)), ((sg.path || g_1.c1)), ((g_1.c1 = ANY (sg.path)))
Buffers: shared hit=425
-> Nested Loop (cost=2249.76.. 41872589.09 rows=16564685 width=77) (actual time=0.372.. 0.462 rows=67 loops=3)
Output: g_1.c1, g_1.c2, g_1.prop, (sg.depth + 1), (sg.path || g_1.c1), (g_1.c1 = ANY (sg.path))
Buffers: shared hit=425
-> WorkTable Scan on search_graph sg (cost=0.00.. 22.50 rows=167 width=40) (actual time=0.001.. 0.011 rows=35 loops=3)
Output: sg.c1, sg.c2, sg.prop, sg.depth, sg.path, sg.cycle
Filter: ((NOT sg.cycle) AND (sg.depth <= 3))
-> Bitmap Heap Scan on public.a g_1 (cost=2249.76.. 248006.21 rows=99190 width=40) (actual time=0.010.. 0.010 rows=2 loops=104)
Output: g_1.c1, g_1.c2, g_1.prop
Recheck Cond: (g_1.c1 = sg.c2)
Heap Blocks: exact=3
Buffers: shared hit=425
-> Bitmap Index Scan on a_pkey (cost=0.00.. 2224.96 rows=99190 width=0) (actual time=0.009.. 0.009 rows=19 loops=104)
Index Cond: (g_1.c1 = sg.c2)
Buffers: shared hit=422
Planning time: 0.436 ms
Execution time: 2.128 ms
(32 rows)

Time: 3.301 ms

For stress testing, TPS is 12,000, and the average response time is 5.2 milliseconds.

transaction type: ./test.sql    
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 1463760
latency average = 5.239 ms
latency stddev = 1.171 ms
tps = 12196.876360 (including connections establishing)
tps = 12201.718092 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
5.295 WITH RECURSIVE search_graph(

Examples of Graph Search UDF Encapsulation

Encapsulating lengthy SQL statements into UDF can simplify the interfaces called by applications.

1. UDF1, to Return Records

create or replace function graph_search1(    
IN i_root int, -- The node that the search is based on
IN i_depth int default 99999, -- the tier to search (the depth limit)
IN i_limit int8 default 2000000000, -- limit the number of records returned for each tier
IN i_weight float8 default 0, -- limit the weight
OUT o_path int[], -- output: path, an array of IDs
OUT o_point1 int, -- output: point 1 ID
OUT o_point2 int, -- output: point 2 ID
OUT o_link_prop jsonb, -- output: the connection property between the current two points
OUT o_depth int -- output: current depth (tier)
) returns setof record as
$$

declare
sql text;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- current edge property
depth, -- current depth, starting from 1
path -- path, stored using an array
) AS (
select c1,c2,prop,depth,path from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 depth, -- initial depth=1
ARRAY[g.c1] path -- initial path
FROM a AS g
WHERE
c1 = %s -- ROOT node =?
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,depth,path from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 depth, -- depth + 1
path || g.c1 path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= %s -- search depth =?
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
)
SELECT path||c2 as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, depth as o_depth
FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_root, i_weight, i_limit, i_depth, i_weight, i_limit
);

return query execute sql;

end;
$$
language plpgsql strict;

Example:

postgres=# select * from graph_search1(31208, 3, 100, 0);    
o_path | o_point1 | o_point2 | o_link_prop | o_depth
---------------------------------+----------+----------+-------------+---------
{31208,344710} | 31208 | 344710 | | 1
{31208,319951} | 31208 | 319951 | | 1
{31208,340938} | 31208 | 340938 | | 1
{31208,325272} | 31208 | 325272 | | 1
{31208,346519} | 31208 | 346519 | | 1
{31208,314594} | 31208 | 314594 | | 1
{31208,307217} | 31208 | 307217 | | 1
{31208,348009} | 31208 | 348009 | | 1
{31208,300046} | 31208 | 300046 | | 1
{31208,344359} | 31208 | 344359 | | 1
{31208,318790} | 31208 | 318790 | | 1
{31208,321034} | 31208 | 321034 | | 1
{31208,301609} | 31208 | 301609 | | 1
{31208,344339} | 31208 | 344339 | | 1
{31208,314087} | 31208 | 314087 | | 1
......
{31208,319951,3199647,31991191} | 3199647 | 31991191 | | 3
{31208,319951,3199647,31954904} | 3199647 | 31954904 | | 3
{31208,319951,3199647,31986691} | 3199647 | 31986691 | | 3
{31208,319951,3199647,31986448} | 3199647 | 31986448 | | 3
{31208,319951,3199647,31993624} | 3199647 | 31993624 | | 3
{31208,319951,3199647,31997771} | 3199647 | 31997771 | | 3
{31208,319951,3199647,31982764} | 3199647 | 31982764 | | 3
{31208,319951,3199647,31993420} | 3199647 | 31993420 | | 3
{31208,319951,3199647,31962666} | 3199647 | 31962666 | | 3
{31208,319951,3199647,31957536} | 3199647 | 31957536 | | 3
(300 rows)

2. UDF2, to Return Cursors

create or replace function graph_search2(    
IN i_root int, -- The node that the search is based on
IN i_res name, -- cursor name
IN i_depth int default 99999, -- the tier to search (the depth limit)
IN i_limit int8 default 2000000000, -- limit the number of records returned for each tier
IN i_weight float8 default 0 -- limit the weight
) returns refcursor as
$$

declare
sql text;
res refcursor := i_res;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- current edge property
depth, -- current depth, starting from 1
path -- path, stored using an array
) AS (
select c1,c2,prop,depth,path from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 depth, -- initial depth=1
ARRAY[g.c1] path -- initial path
FROM a AS g
WHERE
c1 = %s -- ROOT node =?
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,depth,path from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 depth, -- depth + 1
path || g.c1 path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= %s -- search depth =?
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
)
SELECT path||c2 as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, depth as o_depth
FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_root, i_weight, i_limit, i_depth, i_weight, i_limit
);

open res for execute sql;
return res;

end;
$$
language plpgsql strict;

Example:

postgres=# begin;    
BEGIN
postgres=# select * from graph_search2(31208, 'cur1', 3, 100, 0);
graph_search2
---------------
cur1
(1 row)

postgres=# fetch 10 from cur1;
o_path | o_point1 | o_point2 | o_link_prop | o_depth
----------------+----------+----------+-------------+---------
{31208,344710} | 31208 | 344710 | | 1
{31208,319951} | 31208 | 319951 | | 1
{31208,340938} | 31208 | 340938 | | 1
{31208,325272} | 31208 | 325272 | | 1
{31208,346519} | 31208 | 346519 | | 1
{31208,314594} | 31208 | 314594 | | 1
{31208,307217} | 31208 | 307217 | | 1
{31208,348009} | 31208 | 348009 | | 1
{31208,300046} | 31208 | 300046 | | 1
{31208,344359} | 31208 | 344359 | | 1
(10 rows)

postgres=# close cur1;
CLOSE CURSOR
postgres=# end;
COMMIT

3. UDF3, to Return the Shortest Path between Two Points

Return the shortest path between two points and limits the depth, beyond which it is considered inaccessible.

For example, search for the shortest path from 1 to 100.

create or replace function graph_search3(    
IN i_p1 int, -- The node that the search is based on
IN i_p2 int, -- cursor name
IN i_depth int default 99999, -- the tier to search (the depth limit)
OUT o_path int[], -- output: path, an array of IDs
OUT o_point1 int, -- output: point 1 ID
OUT o_point2 int, -- output: point 2 ID
OUT o_link_prop jsonb, -- output: the connection property between the current two points
OUT o_depth int -- output: current depth (tier)
) returns record as
$$

declare
sql text;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 depth, -- initial depth =1
ARRAY[g.c1] path -- initial path
FROM a AS g
WHERE
c1 = %s -- ROOT node =? -- (the start of the shortest path)
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path -- add a new point to the path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= %s -- search depth =?

)
SELECT
c1 as o_point1,
c2 as o_point2,
path as o_path,
prop as o_link_prop,
depth as o_depth
FROM search_graph
where c2 = %s -- the end of the shortest path
limit 1 -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_p1, i_depth, i_p2);

execute sql into o_point1,o_point2,o_path,o_link_prop,o_depth;
return;
end;
$$
language plpgsql strict;

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

Example

postgres=# select * from graph_search3(31208, 31957536, 3);  
o_path | o_point1 | o_point2 | o_link_prop | o_depth
------------------------+----------+----------+-------------+---------
{31208,319951,3199647} | 3199647 | 31957536 | | 3
(1 row)

4. UDF4, to Return the Shortest Path between Two Points, and Allow Setting the Weight Condition as a Filter Condition and Setting the Search Limit of Each Tier.

Return the shortest path between two points, and limit the depth and the search limit of each tier. If the depth is exceeded, it is considered inaccessible.

create or replace function graph_search4(    
IN i_p1 int, -- The node that the search is based on
IN i_p2 int, -- cursor name
IN i_depth int default 99999, -- the tier to search (the depth limit)
IN i_limit int8 default 2000000000, -- limit the number of records returned for each tier
IN i_weight float8 default 0, -- limit the weight
OUT o_path int[], -- output: path, an array of IDs
OUT o_point1 int, -- output: point 1 ID
OUT o_point2 int, -- output: point 2 ID
OUT o_link_prop jsonb, -- output: the connection property between the current two points
OUT o_depth int -- output: current depth (tier)
) returns record as
$$

declare
sql text;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
cycle -- a cycle or not
) AS (
select c1,c2,prop,depth,path,cycle from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
1 as depth, -- initial depth =1
ARRAY[g.c1] path, -- initial path
false as cycle -- a cycle or not (initially no)
FROM a AS g
WHERE
c1 = %s -- ROOT node =? -- (the start of the shortest path)
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
-- ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,depth,path,cycle from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.depth + 1 as depth, -- depth + 1
path || g.c1 as path, -- add a new point to the path
(g.c1 = ANY(sg.path)) as cycle -- a cycle or not, determining whether the new point is already in the previous path
FROM a AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c1 <> ALL(sg.path)) -- prevent from cycling
AND sg.depth <= %s -- search depth =?
AND coalesce((g.prop->>'weight')::float8,0) >= %s -- correlation weight
-- ORDER BY coalesce((g.prop->>'weight')::float8,0) desc -- ORDER BY can be used, for example, to return the n records with the highest weight.
limit %s -- How many records are limited at each tier?
) t
)
SELECT
c1 as o_point1,
c2 as o_point2,
path as o_path,
prop as o_link_prop,
depth as o_depth
FROM search_graph
where c2 = %s -- the end of the shortest path
limit 1 -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_p1, i_weight, i_limit, i_depth, i_weight, i_limit, i_p2);

execute sql into o_point1,o_point2,o_path,o_link_prop,o_depth;
return;
end;
$$
language plpgsql strict;

Example

postgres=# select * from graph_search4(31208, 31957536, 3, 1000, 0);  
o_path | o_point1 | o_point2 | o_link_prop | o_depth
------------------------+----------+----------+-------------+---------
{31208,319951,3199647} | 3199647 | 31957536 | | 3
(1 row)

Time: 7.900 ms

Summary

Using the CTE syntax of PostgreSQL, it is very convenient to implement the requirements of graph search, including N-depth search, shortest path search, path, point and edge property (the edge property is stored in JSON to facilitate the architecture design, the display, the control and display of the tier depth, the control of the number of returned results for each tier, and the control of the return sequence and weight of each tier).

In terms of performance, for the PG 10 ON ECS environment, 5 billion point-edge network, N-depth search, and shortest path search, the response times are all in milliseconds (for the 3-depth search with a limit of 100 records per tier, the response time is only 2.1 milliseconds, and the TPS reaches 12,000).

The preceding query can be encapsulated into the plpgsql UDF interface of PostgreSQL to facilitate business calls (only some input conditions need to be exposed).

Original Source

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store