PostgreSQL Family Tree Application Practices — Graph Relation Storage and Search

By Digoal

Background

In a recent episode of the Super Brain (The Brain) show, a contestant from Google showed a system built when working for Google and displayed relationships among figures during the Three Kingdoms period. This is a typical graph application scenario.

Image for post

PostgreSQL is perfect for these scenarios due to its various SQL interfaces and excellent performance.

Similar to the preceding scenario, this document from one of the community member explains how to store family relationships and quickly extract N-tier information. This is similar to the article on social user relationships and risk control enterprise relationships that we wrote about before.

Design Table Structure

1. Create a personal information table that describes personal details such as year and month of birth, address, and city. Because not all personal information can be listed, JSON can be used to show information that is not mentioned. This is very useful.

create table tbl_p_detail  -- Personal information  
(
id int primary key, -- Family member ID
info jsonb, -- Family member description
crt_time timestamp -- Creation time
);

2. Create a relationship metadata table that stores relationships such as father, mother, husband, wife, son, daughter, adopted daughter, stepfather, godfather, and goddaughter.

create table tbl_er_desc  -- Relationship description  
(
id int2 primary key, -- Relationship ID
info text -- Description
);

3. Create a relationship table. In this example, we use two-way redundant storage to ensure the query accuracy (or simplify query statements). For example, store information about a father and a son as two entries.

create table tbl_er     -- What is the family role of ID1 to ID2      
(
c1 int references tbl_p_detail(id),
c2 int references tbl_p_detail(id),
prop int2[], -- Multiple relationships may exist. Therefore, we use array storage. This is an edge. Of course, we can also use JSON to store edges. For more information, refer to another article that I have written.
crt_time timestamp,
check (c1<>c2),
unique (c1,c2)
-- FOREIGN KEY (EACH ELEMENT OF prop) REFERENCES tbl_er_desc(id) -- Array foreign key, which is supported for PostgreSQL 11. The performance is good
);

4. Create indexes to implement acceleration

create index idx_tbl_er_c1 on tbl_er(c1);  
create index idx_tbl_er_c2 on tbl_er(c2);

5. Write some data into the tables

Image for post
insert into tbl_p_detail select generate_series(1,10000);  
insert into tbl_er values (1,2,array[10],now()); -- For example, 1 is the father of 2
insert into tbl_er values (2,1,array[9],now()); -- For example, 2 is the daughter of 1

insert into tbl_er values (1,3,array[10],now()); -- For example, 1 is the father of 3
insert into tbl_er values (3,1,array[9],now()); -- For example, 3 is the daughter of 1

insert into tbl_er values (5,2,array[11],now()); -- For example, 5 is the mother of 2
insert into tbl_er values (2,5,array[9],now()); -- For example, 2 is the daughter of 5

insert into tbl_er values (5,3,array[11],now()); -- For example, 5 is the mother of 3
insert into tbl_er values (3,5,array[9],now()); -- For example, 3 is the daughter of 5


insert into tbl_er values (4,1,array[10],now()); -- For example, 4 is the father of 1
insert into tbl_er values (1,4,array[8],now()); -- For example, 1 is the son of 4

insert into tbl_er values (6,5,array[10],now()); -- For example, 6 is the father of 5
insert into tbl_er values (5,6,array[9],now()); -- For example, 5 is the daughter of 6

insert into tbl_er values (7,1,array[11],now()); -- For example, 7 is the mother of 1
insert into tbl_er values (1,7,array[8],now()); -- For example, 1 is the son of 7

insert into tbl_er values (8,5,array[11],now()); -- For example, 8 is the mother of 5
insert into tbl_er values (5,8,array[9],now()); -- For example, 5 is the daughter of 8

Define Search Functions

1. Search for a family member’s N-tier relationship data and specify the limit on the records per tier. (Generally the volume of family data is not very large. Therefore we can simply limit the tier. It does not even matter if all results for each tier are returned.)

create or replace function graph_search1(      
IN i_root int, -- The node that the search is based on
IN i_depth int default 99999, -- the tier to search (the depth limit)
IN i_limit int8 default 2000000000, -- limit the number of records returned for each tier
OUT o_path int[], -- output: path, an array of IDs
OUT o_point1 int, -- output: point 1 ID
OUT o_point2 int, -- output: point 2 ID
OUT o_link_prop int2[], -- output: the connection property between the two current points
OUT o_link_prop_all text, -- output: the connection property from the starting node to the current node
OUT o_depth int -- output: current depth (tier)
) returns setof record as
$$

declare
sql text;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- current edge property
all_prop, -- properties of all edges
depth, -- current depth, starting from 1
path -- path, stored as an array
) AS (
select c1,c2,prop,all_prop,depth,path from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
g.prop::text as all_prop, -- properties of all edges
1 depth, -- initial depth=1
ARRAY[g.c1, g.c2] path -- initial path
FROM tbl_er AS g
WHERE
c1 = %s -- ROOT node=?
limit %s -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,all_prop,depth,path from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.all_prop || g.prop::text as all_prop, -- properties of all edges
sg.depth + 1 depth, -- depth +1
sg.path || g.c2 path -- Add a new point to the path
FROM tbl_er AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c2 <> ALL(sg.path)) -- Prevent loop, determine whether it is a loop and judge if the new point is already in the previous path
AND sg.depth <= %s -- search depth =?
limit %s -- How many records are limited at each tier?
) t
)
SELECT path as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, all_prop as o_link_prop_all, depth as o_depth
FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_root, i_limit, i_depth, i_limit
);

return query execute sql;

end;
$$
language plpgsql strict;

Example

postgres=# select * from graph_search1(1);  
o_path | o_point1 | o_point2 | o_link_prop | o_link_prop_all | o_depth
-----------+----------+----------+-------------+-----------------+---------
{1,2} | 1 | 2 | {10} | {10} | 1
{1,3} | 1 | 3 | {10} | {10} | 1
{1,4} | 1 | 4 | {8} | {8} | 1
{1,7} | 1 | 7 | {8} | {8} | 1
{1,2,5} | 2 | 5 | {9} | {10}{9} | 2
{1,3,5} | 3 | 5 | {9} | {10}{9} | 2
{1,2,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
{1,2,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,2,5,3} | 5 | 3 | {11} | {10}{9}{11} | 3
{1,3,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
{1,3,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,3,5,2} | 5 | 2 | {11} | {10}{9}{11} | 3
(12 rows)

Time: 1.120 ms

2. Define a similar search function and return cursors. (Because number of relationships on a family pedigree is not very large, a cursor is not necessary. However, it is recommended to use cursors to return results for a social pedigree chart, which usually has a large volume of data.)

create or replace function graph_search2(      
IN i_root int, -- The node that the search is based on
IN i_res name, -- cursor name
IN i_depth int default 99999, -- the tier to search (the depth limit)
IN i_limit int8 default 2000000000 -- Limit the number of records returned for each tier
) returns refcursor as
$$

declare
sql text;
res refcursor := i_res;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- current edge property
all_prop, -- properties of all edges
depth, -- current depth, starting from 1
path -- path, stored using an array
) AS (
select c1,c2,prop,all_prop,depth,path from (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
g.prop::text as all_prop, -- properties of all edges
1 depth, -- initial depth=1
ARRAY[g.c1, g.c2] path -- initial path
FROM tbl_er AS g
WHERE
c1 = %s -- ROOT node=?
limit %s -- How many records are limited at each tier?
) t
UNION ALL
select c1,c2,prop,all_prop,depth,path from (
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
g.prop, -- edge property
sg.all_prop || g.prop::text as all_prop, -- properties of all edges
sg.depth + 1 depth, -- depth + 1
sg.path || g.c2 path -- Add a new point to the path
FROM tbl_er AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c2 <> ALL(sg.path)) -- Prevent loop, determine whether it is a loop and judge if the new point is already in the previous path
AND sg.depth <= %s -- search depth =?
limit %s -- How many records are limited at each tier?
) t
)
SELECT path as o_path, c1 as o_point1, c2 as o_point2, prop as o_link_prop, all_prop as o_link_prop_all, depth as o_depth
FROM search_graph; -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_root, i_limit, i_depth, i_limit
);

open res for execute sql;
return res;

end;
$$
language plpgsql strict;

Example

postgres=# begin;  
BEGIN
Time: 0.096 ms
postgres=# select * from graph_search2(1,'a');
graph_search2
---------------
a
(1 row)

Time: 1.110 ms
postgres=# fetch 10 from a;
o_path | o_point1 | o_point2 | o_link_prop | o_link_prop_all | o_depth
-----------+----------+----------+-------------+-----------------+---------
{1,2} | 1 | 2 | {10} | {10} | 1
{1,3} | 1 | 3 | {10} | {10} | 1
{1,4} | 1 | 4 | {8} | {8} | 1
{1,7} | 1 | 7 | {8} | {8} | 1
{1,2,5} | 2 | 5 | {9} | {10}{9} | 2
{1,3,5} | 3 | 5 | {9} | {10}{9} | 2
{1,2,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
{1,2,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,2,5,3} | 5 | 3 | {11} | {10}{9}{11} | 3
{1,3,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
(10 rows)

Time: 0.256 ms

postgres=# fetch 10 from a;
o_path | o_point1 | o_point2 | o_link_prop | o_link_prop_all | o_depth
-----------+----------+----------+-------------+-----------------+---------
{1,3,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,3,5,2} | 5 | 2 | {11} | {10}{9}{11} | 3
(2 rows)

Time: 0.103 ms

3. Define the shortest path between two points. For example, search for the relationship between family members A and B When the relationship exceeds the N tier, no results are returned to avoid long-time searches. (We can also define a timeout statement to exit from the search when the execution exceeds N seconds.)

create or replace function graph_search3(      
IN i_p1 int, -- point 1
IN i_p2 int, -- point 2
IN i_depth int default 99999, -- the tier to search (the depth limit)
OUT o_path int[], -- output: path, an array of IDs
OUT o_link_prop text, -- output: the connection property between the two current points
OUT o_depth int -- output: current depth (tier)
) returns record as
$$

declare
sql text;
begin
sql := format($_$
WITH RECURSIVE search_graph(
c1, -- point 1
c2, -- point 2
prop, -- edge property
depth, -- depth, starting from 1
path -- path, stored using an array
) AS (
SELECT -- ROOT node query
g.c1, -- point 1
g.c2, -- point 2
g.prop::text, -- edge property
1 depth, -- initial depth =1
ARRAY[g.c1, g.c2] path -- initial path
FROM tbl_er AS g
WHERE
c1 = %s -- ROOT node =? -- (the start of the shortest path)
UNION ALL
SELECT -- recursive clause
g.c1, -- point 1
g.c2, -- point 2
sg.prop::text || g.prop::text, -- edge property
sg.depth + 1 as depth, -- depth + 1
sg.path || g.c2 path -- Add a new point to the path
FROM tbl_er AS g, search_graph AS sg -- circular INNER JOIN
WHERE
g.c1 = sg.c2 -- recursive JOIN condition
AND (g.c2 <> ALL(sg.path)) -- Prevent loop, determine whether it is a loop and judge if the new point is already in the previous path
AND sg.depth <= %s -- search depth =?

)
SELECT
path as o_path,
prop as o_link_prop,
depth as o_depth
FROM search_graph
where c2 = %s -- the end of the shortest path
limit 1 -- query a recursive table. You can add LIMIT output or use a cursor
$_$, i_p1, i_depth, i_p2);

execute sql into o_path,o_link_prop,o_depth;
return;
end;
$$
language plpgsql strict;

Example

postgres=# select * from graph_search3(1,2);  
o_path | o_link_prop | o_depth
--------+-------------+---------
{1,2} | {10} | 1
(1 row)

Time: 0.907 ms
postgres=# select * from graph_search3(1,5);
o_path | o_link_prop | o_depth
---------+-------------+---------
{1,2,5} | {10}{9} | 2
(1 row)

Time: 0.854 ms
postgres=# select * from graph_search3(1,8);
o_path | o_link_prop | o_depth
-----------+-------------+---------
{1,2,5,8} | {10}{9}{9} | 3
(1 row)

Add some relationships and search again

insert into tbl_er values (2,3,array[12],now());  -- For example, 2 is the elder sister of 3  
insert into tbl_er values (3,2,array[13],now()); -- For example, 3 is the younger sister of 2
postgres=# select * from graph_search1(1);
o_path | o_point1 | o_point2 | o_link_prop | o_link_prop_all | o_depth
-------------+----------+----------+-------------+-----------------+---------
{1,2} | 1 | 2 | {10} | {10} | 1
{1,3} | 1 | 3 | {10} | {10} | 1
{1,4} | 1 | 4 | {8} | {8} | 1
{1,7} | 1 | 7 | {8} | {8} | 1
{1,2,3} | 2 | 3 | {12} | {10}{12} | 2
{1,2,5} | 2 | 5 | {9} | {10}{9} | 2
{1,3,2} | 3 | 2 | {13} | {10}{13} | 2
{1,3,5} | 3 | 5 | {9} | {10}{9} | 2
{1,2,3,5} | 3 | 5 | {9} | {10}{12}{9} | 3
{1,2,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
{1,2,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,2,5,3} | 5 | 3 | {11} | {10}{9}{11} | 3
{1,3,2,5} | 2 | 5 | {9} | {10}{13}{9} | 3
{1,3,5,8} | 5 | 8 | {9} | {10}{9}{9} | 3
{1,3,5,6} | 5 | 6 | {9} | {10}{9}{9} | 3
{1,3,5,2} | 5 | 2 | {11} | {10}{9}{11} | 3
{1,2,3,5,8} | 5 | 8 | {9} | {10}{12}{9}{9} | 4
{1,2,3,5,6} | 5 | 6 | {9} | {10}{12}{9}{9} | 4
{1,3,2,5,8} | 5 | 8 | {9} | {10}{13}{9}{9} | 4
{1,3,2,5,6} | 5 | 6 | {9} | {10}{13}{9}{9} | 4
(20 rows)

postgres=# select * from graph_search1(3);
o_path | o_point1 | o_point2 | o_link_prop | o_link_prop_all | o_depth
-------------+----------+----------+-------------+-----------------+---------
{3,1} | 3 | 1 | {9} | {9} | 1
{3,5} | 3 | 5 | {9} | {9} | 1
{3,2} | 3 | 2 | {13} | {13} | 1
{3,1,7} | 1 | 7 | {8} | {9}{8} | 2
{3,1,4} | 1 | 4 | {8} | {9}{8} | 2
{3,1,2} | 1 | 2 | {10} | {9}{10} | 2
{3,5,8} | 5 | 8 | {9} | {9}{9} | 2
{3,5,6} | 5 | 6 | {9} | {9}{9} | 2
{3,5,2} | 5 | 2 | {11} | {9}{11} | 2
{3,2,5} | 2 | 5 | {9} | {13}{9} | 2
{3,2,1} | 2 | 1 | {9} | {13}{9} | 2
{3,1,2,5} | 2 | 5 | {9} | {9}{10}{9} | 3
{3,5,2,1} | 2 | 1 | {9} | {9}{11}{9} | 3
{3,2,5,8} | 5 | 8 | {9} | {13}{9}{9} | 3
{3,2,5,6} | 5 | 6 | {9} | {13}{9}{9} | 3
{3,2,1,7} | 1 | 7 | {8} | {13}{9}{8} | 3
{3,2,1,4} | 1 | 4 | {8} | {13}{9}{8} | 3
{3,1,2,5,8} | 5 | 8 | {9} | {9}{10}{9}{9} | 4
{3,1,2,5,6} | 5 | 6 | {9} | {9}{10}{9}{9} | 4
{3,5,2,1,7} | 1 | 7 | {8} | {9}{11}{9}{8} | 4
{3,5,2,1,4} | 1 | 4 | {8} | {9}{11}{9}{8} | 4
(21 rows)

Original Source

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store