--

By Digoal

# Background

In terms of elective selection for college students, one elective can be selected by many students and one student may select multiple electives at the same time. Students and electives are in a many-to-many relationship. This situation brings about two problems:

1. How to find the relevant electives of a specific elective and sort these electives by relevancy ( namely, other electives also selected by a student who selects this specific elective)?
2. How to find the relevant electives of a specific elective and sort these electives by relevancy ( namely, use recursion to find other electives also selected by a student who selects this specific elective and electives selected by the students who have selected the “other electives”)?

# Demo

Meeting the first requirement is very easy:

Consider 100 thousand students and 1000 electives. Each student selects 5 electives on average.

Let’s see how to find which electives are relevant and sort these electives by relevancy (what other electives are also selected by students who have selected this elective). Sort the electives by the number of selections in descending order.

1. Create a table to store elective selection information:

create table xuanke(
std_id int8, -- Student ID
cos_id int -- Course
);

2. Insert 500 thousand records:

insert into xuanke select random()*99999, random()*999 from generate_series(1,500000);

3. Use arrays to store electives selected by individual students:

create table xuanke_cos_id (
std_id int8 primary key,
cos_id int[]
);

insert into xuanke_cos_id select std_id, array_agg(cos_id) from xuanke group by 1;

4. Use arrays to store information about students selecting each elective:

create table xuanke_std_id (
cos_id int primary key,
std_id int8[]
);

insert into xuanke_std_id select cos_id, array_agg(std_id) from xuanke group by 1;

5. Obtain all electives selected by students based on one elective they selected, aggregate these electives and display them by relevance:

create or replace function get_cos_id2(int8[]) returns text[] as
\$\$

select array_agg(unnest||':'||cnt order by cnt desc) from
(select unnest(cos_id) as unnest, count (*) as cnt
from xuanke_cos_id where std_id = any (\$1) group by 1
) t;
\$\$
language sql strict;

6. Obtain results:

select cos_id, get_cos_id2(std_id) from xuanke_std_id;

Example results

251 | {251:495,348:9,708:8,372:7,816:7,431:6,184:6,600:6,114:6,649:6, .....

453 | {453:499,519:7,750:7,816:7,375:7,109:7,705:7,650:7,908:7, .....

The preceding query can be performed in milliseconds when parallel computing is used.

# Related Case

To implement the second requirement, a recursive method is required.

WITH RECURSIVE search_graph(
std_id, -- point 1
cos_id, -- point 2
depth, -- depth, starting from 1
path, -- path, stored as an array
cycle -- cycle or not cycle
) AS (
select std_id,cos_id,depth,path,cycle from (
select
std_id,
cos_id,
1 depth,
array[row(std_id,cos_id)] path,
false as cycle
from xuanke
where cos_id=?
) t
UNION ALL
select std_id,cos_id,depth,path,cycle from (
select
g.std_id,
g.cos_id,
sg.depth+1 depth,
sg.path||array[row(g.std_id,g.cos_id)] path,
(row(g.std_id,g.cos_id) = ANY(path)) as cycle
from xuanke as g, search_graph AS sg
where
g.std_id = sg.std_id
AND NOT cycle
-- and sg.depth <= ?
) t
)
SELECT * FROM search_graph;

This involves a very large volume of data and therefore it takes a very long time to return results. In fact, a very deep level does not make much difference.