PostgreSQL Graph Search Practices — 10 Billion-Scale Graph with Millisecond Response

17 min readAug 8, 2019

By Digoal

Background

Graph search is one of many features of PostgreSQL (including StreamCompute, full-text search, schema search, K-V storage, image search, fingerprint search, spatial data, time series data, recommendations, and so on).

Using the CTE syntax, you can easily implement the graph search (N-depth search, shortest path, point, edge property, and so on).

In the graph search, the depth of the tier, whether it is a cycle, and the path, can all be expressed.

Example

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

On this basis, the demo is as follows:

Implementing the requirements, such as N-depth search, edge property view, and shortest path search.
Removing cycle points, controlling the depth, and displaying paths.
Generating drawing data.

Create 5 Billion Entries of Test Data

10 million users are created, every 50,000 users are considered as an associated group, and each user is associated with 500 users on average, forming a large-scale network of 5 billion users.

1. Create a table with the following structure. Points and edges can be described.

create table a(      
  c1 int,                -- point 1      
  c2 int,                -- point 2      
  prop jsonb,            -- The properties of the edges corresponding to points 1 and 2 are stored in JSON, including weights, relations and so on.      
  primary key (c1,c2)    -- primary key      
);      
      
create index idx_a_2 on a(c1, COALESCE(((prop ->> 'weight'::text))::float8, 0));

2. Generate test data.

vi test.sql      
      
\set id random(1,10000000)      
insert into a select :id, ((width_bucket(:id,1,10000000,2000)-1)*50000 + (random()*50000)::int) from generate_series(1,1000) on conflict (c1,c2) do nothing;      
pgbench -M prepared -n -r -P 5 -f ./test.sql -c 50 -j 50 -t 100000

3. About 340 GB of data.

Removing Cyclic Points, Controlling the Depth, and Displaying Paths

1. When a point occurs repeatedly in the path, a cycle occurs.

2. For each recursion, add 1 to the depth.

3. Use an array to store path. Single-column arrays, or multiple-column arrays (row arrays). For multi-column array paths, please see: https://www.postgresql.org/docs/10/static/queries-with.html

The SQL is as follows:

WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 as depth,        -- initial depth =1      
          ARRAY[g.c1] as path  -- initial path         
        FROM a AS g       
        WHERE       
          c1 = ?         -- ROOT node =?      
      UNION ALL      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path   -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
          AND sg.depth <= ?    -- search depth =?         
)      
SELECT * FROM search_graph;    -- query a recursive table. You can add LIMIT output or use a cursor

N-depth Search

For n-depth search, enter sg.depth <= N as above SQL.

For example, search for the 3-depth data with ROOT=31208.

WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 as depth,        -- initial depth =1      
          ARRAY[g.c1] as path  -- initial path         
        FROM a AS g       
        WHERE       
          c1 = 31208         -- ROOT node =?      
      UNION ALL      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path   -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
          AND sg.depth <= 3    -- search depth =?         
)      
SELECT * FROM search_graph;    -- query a recursive table. You can add LIMIT output or use a cursor

Shortest Path

Remove the search depth, and add WHERE condition (filter C2) and LIMIT 1 to the statements querying the recursive table.

The SQL is as follows:

WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 as depth,        -- initial depth =1      
          ARRAY[g.c1] as path  -- initial path         
        FROM a AS g       
        WHERE       
          c1 = ?         -- ROOT node =?      
      UNION ALL      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path   -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
--          AND sg.depth <= ?    -- search depth =?   It can also be retained to prevent the search from being too deep and thus affecting the performance. For example, the search will not return after a depth of 10        
)      
SELECT * FROM search_graph      
  where c2 = ?   -- End of the shortest path      
  limit 1;       -- query a recursive table. You can add LIMIT output or use a cursor

For example, search for the shortest path from 1 to 100.

WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 as depth,        -- initial depth =1      
          ARRAY[g.c1] as path  -- initial path         
        FROM a AS g       
        WHERE       
          c1 = 1         -- ROOT node =?      
      UNION ALL      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path   -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
--          AND sg.depth <= ?    -- search depth =? It can also be retained to prevent the search from being too deep and thus affecting the performance. For example, the search will not return after a depth of 10        
)      
SELECT * FROM search_graph      
  where c2 = 100   -- end of the shortest path      
  limit 1;       -- query a recursive table. You can add LIMIT output or use a cursor

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

How to Generate Drawing Data

To speed up the response, use the cursor to return.

begin;      
declare cur1 cursor for $query;      
FETCH 1000 from cur1;      
....      
close cur1;      
end;

The response time is fast, reaching the millisecond-level.

Control the Number of Returned Records for Each Tier

The deeper the tier is, the more records are returned. In fact, in graph search, it is not necessary to return all records of each tier (for example, only the top N records with high correlation, or the top N records with weight greater than a specific value are returned), thus controlling the number of records of each tier.

WITH RECURSIVE search_graph(      
  c1,     -- point 1      
  c2,     -- point 2      
  prop,   -- edge property      
  depth,  -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        select c1,c2,prop,depth,path,cycle from (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 depth,        -- initial depth =1      
          ARRAY[g.c1] path   -- initial path      
        FROM a AS g       
        WHERE       
          c1 = ?             -- ROOT node =?      
          -- AND coalesce((prop->>'weight')::float8,0) >= ?        -- correlation weight      
          -- ORDER BY coalesce((prop->>'weight')::float8,0) desc   -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit ?            -- How many records are limited at each tier?      
        ) t      
      UNION ALL      
        select c1,c2,prop,depth,path,cycle from (      
        SELECT     -- recursive clause       
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 depth,    -- depth + 1      
          path || g.c1 path    -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
          AND sg.depth <= ?    -- search depth =?        
          -- AND coalesce((prop->>'weight')::float8,0) >= ?   -- correlation weight      
          -- ORDER BY coalesce((prop->>'weight')::float8,0) desc   -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit ?            -- How many records are limited at each tier?                 
        ) t      
)      
SELECT * FROM search_graph;    -- query a recursive table. You can add LIMIT output or use a cursor

For example, search for 3-depth data with root = 31208, and limit the number of records returned per tier to 100.

WITH RECURSIVE search_graph(      
  c1,     -- point 1      
  c2,     -- point 2      
  prop,   -- edge property      
  depth,  -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        select c1,c2,prop,depth,path from (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 depth,        -- initial depth =1      
          ARRAY[g.c1] path   -- initial path      
        FROM a AS g       
        WHERE       
          c1 = 31208            -- ROOT node =?      
          -- AND coalesce((prop->>'weight')::float8,0) >= ?        -- correlation weight      
          -- ORDER BY coalesce((prop->>'weight')::float8,0) desc   -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit ?            -- How many records are limited at each tier?      
        ) t      
      UNION ALL      
        select c1,c2,prop,depth,path from (      
        SELECT     -- recursive clause       
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 depth,    -- depth + 1      
          path || g.c1 path    -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
          AND sg.depth <= 3    -- search depth =?        
          -- AND coalesce((prop->>'weight')::float8,0) >= ?   -- correlation weight      
          -- ORDER BY coalesce((prop->>'weight')::float8,0) desc   -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit 100            -- the number of records limited per tier                 
        ) t      
)      
SELECT * FROM search_graph;    -- query a recursive table. You can add LIMIT output or use a cursor     
   c1    |    c2    | prop | depth |          path          | cycle       
---------+----------+------+-------+------------------------+-------      
   31208 |   300008 |      |     1 | {31208}                | f      
   31208 |   300040 |      |     1 | {31208}                | f      
   31208 |   300046 |      |     1 | {31208}                | f      
   31208 |   300050 |      |     1 | {31208}                | f      
   31208 |   300061 |      |     1 | {31208}                | f      
   31208 |   300082 |      |     1 | {31208}                | f      
   31208 |   300093 |      |     1 | {31208}                | f      
.................  
 3032152 | 30347906 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30300272 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30316175 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30309844 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30336508 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30322201 |      |     3 | {31208,300008,3032152} | f      
 3032152 | 30312579 |      |     3 | {31208,300008,3032152} | f      
(300 rows)      
      
Time: 3.245 ms

The response speed is 3.2 milliseconds. (The reason is very simple. Each tier hits an index, combined with the cluster feature of PG (stored by c1), so the number of blocks can be reduced and performance can be improved further)

QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------    
 CTE Scan on search_graph  (cost=25460.78.. 25482.78 rows=1100 width=77) (actual time=0.044.. 2.009 rows=300 loops=1)    
   Output: search_graph.c1, search_graph.c2, search_graph.prop, search_graph.depth, search_graph.path, search_graph.cycle    
   Buffers: shared hit=522    
   CTE search_graph    
     ->  Recursive Union  (cost=0.58.. 25460.78 rows=1100 width=77) (actual time=0.042.. 1.755 rows=300 loops=1)    
           Buffers: shared hit=522    
           ->  Limit  (cost=0.58.. 402.52 rows=100 width=77) (actual time=0.040.. 0.183 rows=100 loops=1)    
                 Output: g.c1, g.c2, g.prop, 1, (ARRAY[g.c1]), false    
                 Buffers: shared hit=97    
                 ->  Index Scan using a_pkey on public.a g  (cost=0.58.. 393024.56 rows=97782 width=77) (actual time=0.038.. 0.166 rows=100 loops=1)    
                       Output: g.c1, g.c2, g.prop, 1, ARRAY[g.c1], false    
                       Index Cond: (g.c1 = 31208)    
                       Buffers: shared hit=97    
           ->  Limit  (cost=2249.76.. 2502.53 rows=100 width=77) (actual time=0.372.. 0.473 rows=67 loops=3)    
                 Output: g_1.c1, g_1.c2, g_1.prop, ((sg.depth + 1)), ((sg.path || g_1.c1)), ((g_1.c1 = ANY (sg.path)))    
                 Buffers: shared hit=425    
                 ->  Nested Loop  (cost=2249.76.. 41872589.09 rows=16564685 width=77) (actual time=0.372.. 0.462 rows=67 loops=3)    
                       Output: g_1.c1, g_1.c2, g_1.prop, (sg.depth + 1), (sg.path || g_1.c1), (g_1.c1 = ANY (sg.path))    
                       Buffers: shared hit=425    
                       ->  WorkTable Scan on search_graph sg  (cost=0.00.. 22.50 rows=167 width=40) (actual time=0.001.. 0.011 rows=35 loops=3)    
                             Output: sg.c1, sg.c2, sg.prop, sg.depth, sg.path, sg.cycle    
                             Filter: ((NOT sg.cycle) AND (sg.depth <= 3))    
                       ->  Bitmap Heap Scan on public.a g_1  (cost=2249.76.. 248006.21 rows=99190 width=40) (actual time=0.010.. 0.010 rows=2 loops=104)    
                             Output: g_1.c1, g_1.c2, g_1.prop    
                             Recheck Cond: (g_1.c1 = sg.c2)    
                             Heap Blocks: exact=3    
                             Buffers: shared hit=425    
                             ->  Bitmap Index Scan on a_pkey  (cost=0.00.. 2224.96 rows=99190 width=0) (actual time=0.009.. 0.009 rows=19 loops=104)    
                                   Index Cond: (g_1.c1 = sg.c2)    
                                   Buffers: shared hit=422    
 Planning time: 0.436 ms    
 Execution time: 2.128 ms    
(32 rows)    
    
Time: 3.301 ms

For stress testing, TPS is 12,000, and the average response time is 5.2 milliseconds.

transaction type: ./test.sql    
scaling factor: 1    
query mode: prepared    
number of clients: 64    
number of threads: 64    
duration: 120 s    
number of transactions actually processed: 1463760    
latency average = 5.239 ms    
latency stddev = 1.171 ms    
tps = 12196.876360 (including connections establishing)    
tps = 12201.718092 (excluding connections establishing)    
script statistics:    
 - statement latencies in milliseconds:    
         5.295  WITH RECURSIVE search_graph(

Examples of Graph Search UDF Encapsulation

Encapsulating lengthy SQL statements into UDF can simplify the interfaces called by applications.

1. UDF1, to Return Records

create or replace function graph_search1(    
  IN i_root int,                       -- The node that the search is based on      
  IN i_depth int  default 99999,       -- the tier to search (the depth limit)    
  IN i_limit int8 default 2000000000,  -- limit the number of records returned for each tier    
  IN i_weight float8 default 0,        -- limit the weight    
  OUT o_path int[],                    -- output: path, an array of IDs    
  OUT o_point1 int,                    -- output: point 1 ID    
  OUT o_point2 int,                    -- output: point 2 ID    
  OUT o_link_prop jsonb,               -- output: the connection property between the current two points    
  OUT o_depth int                      -- output: current depth (tier)    
) returns setof record as 
$$
    
declare    
  sql text;    
begin    
sql := format($_$    
WITH RECURSIVE search_graph(      
  c1,     -- point 1      
  c2,     -- point 2      
  prop,   -- current edge property       
  depth,  -- current depth, starting from 1       
  path  -- path, stored using an array       
) AS (      
        select c1,c2,prop,depth,path from (      
        SELECT                               -- ROOT node query      
          g.c1,                              -- point 1      
          g.c2,                              -- point 2      
          g.prop,                            -- edge property      
          1 depth,                           -- initial depth=1      
          ARRAY[g.c1] path                  -- initial path      
        FROM a AS g       
        WHERE       
          c1 = %s                                                    -- ROOT node =?      
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?      
        ) t      
      UNION ALL      
        select c1,c2,prop,depth,path from (      
        SELECT                               -- recursive clause       
          g.c1,                              -- point 1      
          g.c2,                              -- point 2      
          g.prop,                            -- edge property      
          sg.depth + 1 depth,                -- depth + 1      
          path || g.c1 path                 -- add a new point to the path      
        FROM a AS g, search_graph AS sg      -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2                       -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))                      -- prevent from cycling      
          AND sg.depth <= %s                 -- search depth =?        
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?                 
        ) t      
)      
SELECT path||c2 as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, depth as o_depth    
FROM search_graph;                           -- query a recursive table. You can add LIMIT output or use a cursor     
$_$, i_root, i_weight, i_limit, i_depth, i_weight, i_limit    
);    
    
return query execute sql;    
    
end;    $$
 language plpgsql strict;

Example:

postgres=# select * from graph_search1(31208, 3, 100, 0);    
             o_path              | o_point1 | o_point2 | o_link_prop | o_depth     
---------------------------------+----------+----------+-------------+---------    
 {31208,344710}                  |    31208 |   344710 |             |       1    
 {31208,319951}                  |    31208 |   319951 |             |       1    
 {31208,340938}                  |    31208 |   340938 |             |       1    
 {31208,325272}                  |    31208 |   325272 |             |       1    
 {31208,346519}                  |    31208 |   346519 |             |       1    
 {31208,314594}                  |    31208 |   314594 |             |       1    
 {31208,307217}                  |    31208 |   307217 |             |       1    
 {31208,348009}                  |    31208 |   348009 |             |       1    
 {31208,300046}                  |    31208 |   300046 |             |       1    
 {31208,344359}                  |    31208 |   344359 |             |       1    
 {31208,318790}                  |    31208 |   318790 |             |       1    
 {31208,321034}                  |    31208 |   321034 |             |       1    
 {31208,301609}                  |    31208 |   301609 |             |       1    
 {31208,344339}                  |    31208 |   344339 |             |       1    
 {31208,314087}                  |    31208 |   314087 |             |       1    
......    
 {31208,319951,3199647,31991191} |  3199647 | 31991191 |             |       3    
 {31208,319951,3199647,31954904} |  3199647 | 31954904 |             |       3    
 {31208,319951,3199647,31986691} |  3199647 | 31986691 |             |       3    
 {31208,319951,3199647,31986448} |  3199647 | 31986448 |             |       3    
 {31208,319951,3199647,31993624} |  3199647 | 31993624 |             |       3    
 {31208,319951,3199647,31997771} |  3199647 | 31997771 |             |       3    
 {31208,319951,3199647,31982764} |  3199647 | 31982764 |             |       3    
 {31208,319951,3199647,31993420} |  3199647 | 31993420 |             |       3    
 {31208,319951,3199647,31962666} |  3199647 | 31962666 |             |       3    
 {31208,319951,3199647,31957536} |  3199647 | 31957536 |             |       3    
(300 rows)

2. UDF2, to Return Cursors

create or replace function graph_search2(    
  IN i_root int,                       -- The node that the search is based on      
  IN i_res name,                       -- cursor name    
  IN i_depth int  default 99999,       -- the tier to search (the depth limit)    
  IN i_limit int8 default 2000000000,  -- limit the number of records returned for each tier    
  IN i_weight float8 default 0         -- limit the weight    
) returns refcursor as 
$$
    
declare    
  sql text;    
  res refcursor := i_res;    
begin    
sql := format($_$    
WITH RECURSIVE search_graph(      
  c1,     -- point 1      
  c2,     -- point 2      
  prop,   -- current edge property       
  depth,  -- current depth, starting from 1       
  path  -- path, stored using an array       
) AS (      
        select c1,c2,prop,depth,path from (      
        SELECT                               -- ROOT node query      
          g.c1,                              -- point 1      
          g.c2,                              -- point 2      
          g.prop,                            -- edge property      
          1 depth,                           -- initial depth=1      
          ARRAY[g.c1] path                  -- initial path      
        FROM a AS g       
        WHERE       
          c1 = %s                                                    -- ROOT node =?      
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?      
        ) t      
      UNION ALL      
        select c1,c2,prop,depth,path from (      
        SELECT                               -- recursive clause       
          g.c1,                              -- point 1      
          g.c2,                              -- point 2      
          g.prop,                            -- edge property      
          sg.depth + 1 depth,                -- depth + 1      
          path || g.c1 path                 -- add a new point to the path      
        FROM a AS g, search_graph AS sg      -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2                       -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))                      -- prevent from cycling      
          AND sg.depth <= %s                 -- search depth =?        
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?                 
        ) t      
)      
SELECT path||c2 as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, depth as o_depth    
FROM search_graph;                           -- query a recursive table. You can add LIMIT output or use a cursor     
$_$, i_root, i_weight, i_limit, i_depth, i_weight, i_limit    
);    
    
open res for execute sql;    
return res;    
    
end;    $$
 language plpgsql strict;

Example:

postgres=# begin;    
BEGIN    
postgres=# select * from graph_search2(31208, 'cur1', 3, 100, 0);    
 graph_search2     
---------------    
 cur1    
(1 row)    
    
postgres=# fetch 10 from cur1;    
     o_path     | o_point1 | o_point2 | o_link_prop | o_depth     
----------------+----------+----------+-------------+---------    
 {31208,344710} |    31208 |   344710 |             |       1    
 {31208,319951} |    31208 |   319951 |             |       1    
 {31208,340938} |    31208 |   340938 |             |       1    
 {31208,325272} |    31208 |   325272 |             |       1    
 {31208,346519} |    31208 |   346519 |             |       1    
 {31208,314594} |    31208 |   314594 |             |       1    
 {31208,307217} |    31208 |   307217 |             |       1    
 {31208,348009} |    31208 |   348009 |             |       1    
 {31208,300046} |    31208 |   300046 |             |       1    
 {31208,344359} |    31208 |   344359 |             |       1    
(10 rows)    
    
postgres=# close cur1;    
CLOSE CURSOR    
postgres=# end;    
COMMIT

3. UDF3, to Return the Shortest Path between Two Points

Return the shortest path between two points and limits the depth, beyond which it is considered inaccessible.

For example, search for the shortest path from 1 to 100.

create or replace function graph_search3(    
  IN i_p1 int,                       -- The node that the search is based on      
  IN i_p2 int,                       -- cursor name      
  IN i_depth int  default 99999,     -- the tier to search (the depth limit)      
  OUT o_path int[],                    -- output: path, an array of IDs    
  OUT o_point1 int,                    -- output: point 1 ID    
  OUT o_point2 int,                    -- output: point 2 ID    
  OUT o_link_prop jsonb,               -- output: the connection property between the current two points    
  OUT o_depth int                      -- output: current depth (tier)    
) returns record as 
$$
    
declare    
  sql text;    
begin    
sql := format($_$    
WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
) AS (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 depth,        -- initial depth =1      
          ARRAY[g.c1] path   -- initial path      
        FROM a AS g       
        WHERE       
          c1 = %s         -- ROOT node =?      -- (the start of the shortest path)      
      UNION ALL      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path    -- add a new point to the path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2                      -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))        -- prevent from cycling      
          AND sg.depth <= %s    -- search depth =?      
  
)      
SELECT   
  c1 as o_point1,  
  c2 as o_point2,   
  path as o_path,  
  prop as o_link_prop,  
  depth as o_depth  
FROM search_graph      
  where c2 = %s   -- the end of the shortest path      
  limit 1         -- query a recursive table. You can add LIMIT output or use a cursor      
$_$, i_p1, i_depth, i_p2);  
  
execute sql into o_point1,o_point2,o_path,o_link_prop,o_depth;  
return;  
end;  $$
 language plpgsql strict;

To control the depth, for example, if you can not find the record within a depth of 5, you can simply add the search depth conditions.

Example

postgres=# select * from graph_search3(31208, 31957536, 3);  
         o_path         | o_point1 | o_point2 | o_link_prop | o_depth   
------------------------+----------+----------+-------------+---------  
 {31208,319951,3199647} |  3199647 | 31957536 |             |       3  
(1 row)

4. UDF4, to Return the Shortest Path between Two Points, and Allow Setting the Weight Condition as a Filter Condition and Setting the Search Limit of Each Tier.

Return the shortest path between two points, and limit the depth and the search limit of each tier. If the depth is exceeded, it is considered inaccessible.

create or replace function graph_search4(    
  IN i_p1 int,                       -- The node that the search is based on      
  IN i_p2 int,                       -- cursor name      
  IN i_depth int  default 99999,     -- the tier to search (the depth limit)      
  IN i_limit int8 default 2000000000,  -- limit the number of records returned for each tier    
  IN i_weight float8 default 0,        -- limit the weight    
  OUT o_path int[],                    -- output: path, an array of IDs    
  OUT o_point1 int,                    -- output: point 1 ID    
  OUT o_point2 int,                    -- output: point 2 ID    
  OUT o_link_prop jsonb,               -- output: the connection property between the current two points    
  OUT o_depth int                      -- output: current depth (tier)    
) returns record as 
$$
    
declare    
  sql text;    
begin    
sql := format($_$    
WITH RECURSIVE search_graph(      
  c1,   -- point 1      
  c2,   -- point 2      
  prop, -- edge property      
  depth, -- depth, starting from 1      
  path  -- path, stored using an array      
  cycle  -- a cycle or not      
) AS (      
        select c1,c2,prop,depth,path,cycle from (      
        SELECT    -- ROOT node query      
          g.c1,   -- point 1      
          g.c2,   -- point 2      
          g.prop,   -- edge property      
          1 as depth,        -- initial depth =1      
          ARRAY[g.c1] path,   -- initial path      
          false as cycle         -- a cycle or not (initially no)      
        FROM a AS g       
        WHERE       
          c1 = %s         -- ROOT node =?      -- (the start of the shortest path)      
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          -- ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?   
        ) t      
      UNION ALL      
        select c1,c2,prop,depth,path,cycle from (      
        SELECT     -- recursive clause      
          g.c1,    -- point 1      
          g.c2,    -- point 2      
          g.prop,          -- edge property      
          sg.depth + 1 as depth,    -- depth + 1      
          path || g.c1 as path,    -- add a new point to the path      
          (g.c1 = ANY(sg.path)) as cycle    -- a cycle or not, determining whether the new point is already in the previous path      
        FROM a AS g, search_graph AS sg   -- circular INNER JOIN      
        WHERE       
          g.c1 = sg.c2         -- recursive JOIN condition      
          AND (g.c1 <> ALL(sg.path))       -- prevent from cycling      
          AND sg.depth <= %s    -- search depth =?      
          AND coalesce((g.prop->>'weight')::float8,0) >= %s          -- correlation weight      
          -- ORDER BY coalesce((g.prop->>'weight')::float8,0) desc      -- ORDER BY can be used, for example, to return the n records with the highest weight.      
          limit %s                           -- How many records are limited at each tier?         
        ) t      
)      
SELECT   
  c1 as o_point1,  
  c2 as o_point2,   
  path as o_path,  
  prop as o_link_prop,  
  depth as o_depth  
FROM search_graph      
  where c2 = %s   -- the end of the shortest path      
  limit 1         -- query a recursive table. You can add LIMIT output or use a cursor      
$_$, i_p1, i_weight, i_limit, i_depth, i_weight, i_limit, i_p2);  
  
execute sql into o_point1,o_point2,o_path,o_link_prop,o_depth;  
return;  
end;  $$
 language plpgsql strict;

Example

postgres=# select * from graph_search4(31208, 31957536, 3, 1000, 0);  
         o_path         | o_point1 | o_point2 | o_link_prop | o_depth   
------------------------+----------+----------+-------------+---------  
 {31208,319951,3199647} |  3199647 | 31957536 |             |       3  
(1 row)  
  
Time: 7.900 ms

Summary

Using the CTE syntax of PostgreSQL, it is very convenient to implement the requirements of graph search, including N-depth search, shortest path search, path, point and edge property (the edge property is stored in JSON to facilitate the architecture design, the display, the control and display of the tier depth, the control of the number of returned results for each tier, and the control of the return sequence and weight of each tier).

In terms of performance, for the PG 10 ON ECS environment, 5 billion point-edge network, N-depth search, and shortest path search, the response times are all in milliseconds (for the 3-depth search with a limit of 100 records per tier, the response time is only 2.1 milliseconds, and the TPS reaches 12,000).

The preceding query can be encapsulated into the plpgsql UDF interface of PostgreSQL to facilitate business calls (only some input conditions need to be exposed).

PostgreSQL Graph Search Practices — 10 Billion-Scale Graph with Millisecond Response

Background

Example

Create 5 Billion Entries of Test Data

Removing Cyclic Points, Controlling the Depth, and Displaying Paths

N-depth Search

Shortest Path

How to Generate Drawing Data

Control the Number of Returned Records for Each Tier

Examples of Graph Search UDF Encapsulation

1. UDF1, to Return Records

2. UDF2, to Return Cursors

3. UDF3, to Return the Shortest Path between Two Points

4. UDF4, to Return the Shortest Path between Two Points, and Allow Setting the Weight Condition as a Filter Condition and Setting the Search Limit of Each Tier.

Summary

Original Source

PostgreSQL Graph Search Practices - 10 Billion-Scale Graph with Millisecond Response

digoal July 4, 2019 102 Graph search is one of many features of PostgreSQL (including StreamCompute, full-text search…

Written by Alibaba Cloud

No responses yet