# Background

Typically when talking about product combinations, you would think of products that are closely related in terms of form and function. For instance, in China, stuffed buns, soybean milk, and tea eggs are common breakfast items. So it wouldn’t come as a surprise if these products are promoted together in a supermarket.

# Limitation of the Statistical Method Described in This Article

The statistical method covered in this article can only be used to calculate the best combination of directly related items (data included in the same order).

# Scenario Simulation

Assume that there is a total of 100,000 item IDs and simulate a batch of users’ order or shopping cart records with each order or shopping cart record containing 5–10 items. Create a total of around 11 million of these records.

`postgres=# create unlogged table buy (pay_id int8, item_id int[]);  CREATE TABLE`

# Construct Data

Create a function that inserts data into the buy table (array of 5–10 items)

`create or replace function f() returns void as \$\$    declare    begin      for i in 5..15 loop        insert into buy (item_id) select array_agg((100000*random())::int8) from generate_series(1,i);      end loop;    end;    \$\$ language plpgsql strict;`
`vi test.sql    select f();        pgbench -M prepared -n -r -P 1 -f ./test.sql -c 100 -j 100 -t 10000      transaction type: ./test.sql  scaling factor: 1  query mode: prepared  number of clients: 100  number of threads: 100  number of transactions per client: 10000  number of transactions actually processed: 1000000/1000000  latency average = 1.155 ms  latency stddev = 1.814 ms  tps = 85204.625725 (including connections establishing)  tps = 85411.351807 (excluding connections establishing)  script statistics:   - statement latencies in milliseconds:           1.158  select f();`
`postgres=# select count(*) from buy;    count     ----------   11000000  (1 row)    postgres=# select * from buy limit 10;   pay_id |                           item_id                              --------+--------------------------------------------------------------          | {6537,76804,33612,75580,8021}          | {72437,66015,2939,56128,7056}          | {40983,79581,15954,21039,6702,90279}          | {93626,8337,13416,69371,4366,75868}          | {84611,56893,25201,74038,59337,62045,59178}          | {97422,48801,69714,77056,17059,79714,21598}          | {42997,50834,57214,52866,83656,76342,5639,93416}          | {53543,24369,31552,28654,38516,63657,86564,11483}          | {58873,23162,23369,55091,32046,29907,31895,65658,5487}          | {39916,6641,85068,55870,27679,91770,46150,12290,48662,71350}  (10 rows)`

# GIN Indexes

`postgres=# create index idx_buy_item on buy using gin(item_id);`

# Split Functions

The purpose of splitting is to split the array in an order into several sets. For example, an order containing five items can be split into 10 (4+3+2+1) sets of 2 items.

`{6537,76804,33612,75580,8021}`
`{6537,76804}    {6537,33612}    {6537,75580}    {6537,8021}    {76804,33612}    {76804,75580}    {76804,8021}    {33612,75580}    {33612,8021}    {75580,8021}`
`WITH RECURSIVE   t(i) AS (    SELECT * FROM unnest('{A,B,C}'::char[])  ),   cte AS (       SELECT i AS combo, i, 1 AS ct        FROM t      UNION ALL        SELECT cte.combo || t.i, t.i, ct + 1        FROM cte, t        WHERE ct <= 3 -- Combine 4 (3+1) times         AND position(t.i in cte.combo) = 0 -- Newly added characters are not included in existing characters  )   SELECT ARRAY(SELECT combo FROM cte ORDER BY ct, combo) AS result;                          result                         ---------------------------------------------------   {A,B,C,AB,AC,BA,BC,CA,CB,ABC,ACB,BAC,BCA,CAB,CBA}  (1 row)`

# Function 1 That Returns a Specified Number of Sets

Assume that the array doesn’t include replicate elements

`create or replace function array_regroup(    i_arr int[], -- Enter an array    i_elems int -- Scramble into sets of fixed length  ) returns setof int[] as \$\$  declare    v_arr_len int := array_length(i_arr, 1); -- The length of the array that has been entered  begin    -- Protection    if i_elems > v_arr_len then      raise notice 'you cann''t return group len % more then %', i_elems, v_arr_len;      return;    elsif i_elems = v_arr_len then      return next i_arr;      return;    elsif i_elems = 1 then      return query select array(select i) from unnest(i_arr) t(i);      return;    end if;      return query    WITH RECURSIVE     t(i) AS (        select array(select i) from unnest(i_arr) t(i)    ),     cte AS (       SELECT i AS combo, i, 1 AS ct        FROM t      UNION ALL        SELECT array(select i from (select unnest(array_cat(cte.combo, t.i)) order by 1) t(i)), t.i, ct + 1        FROM cte, t        WHERE cte.ct <= i_elems-1 -- Combine any times         AND (not cte.combo @> t.i) -- Newly added values are not included in existing value sets    )     SELECT combo FROM cte where array_length(combo,1)=i_elems group by combo;       return;  end;  \$\$ language plpgsql strict;  postgres=# select array_regroup(array[1,2,3],2);   array_regroup   ---------------   {2,3}   {1,2}   {1,3}  (3 rows)`

# Function 2 That Returns All Sets

`create or replace function array_regroup(    i_arr int[], -- Enter an array  ) returns setof int[] as \$\$  declare    v_arr_len int := array_length(i_arr, 1); -- The length of the array that has been entered  begin      return query    WITH RECURSIVE     t(i) AS (        select array(select i) from unnest(i_arr) t(i)    ),     cte AS (       SELECT i AS combo, i, 1 AS ct        FROM t      UNION ALL        SELECT array(select i from (select unnest(array_cat(cte.combo, t.i)) order by 1) t(i)), t.i, ct + 1        FROM cte, t        WHERE cte.ct <= v_arr_len-1 -- Combine any times         AND (not cte.combo @> t.i) -- Newly added values are not included in existing value sets    )     SELECT combo FROM cte group by combo;       return;  end;  \$\$ language plpgsql strict;  postgres=# select array_regroup(array[1,2,3]);   array_regroup   ---------------   {2}   {2,3}   {1,2}   {1}   {1,2,3}   {3}   {1,3}  (7 rows)`

# Function 3 That Returns a Specified Number of Sets and That Only Displays Sets Containing Certain Elements (For Example, Arrays That Contain the Bread ID)

`create or replace function array_regroup(    i_arr int[], -- Enter an array    i_elems int -- Scramble into sets of fixed length    i_arr_contain int[] -- Arrays that contain specified item IDs  ) returns setof int[] as \$\$  declare    v_arr_len int := array_length(i_arr, 1); -- The length of the array that has been entered  begin    -- Protection    if i_elems > v_arr_len then      raise notice 'you cann''t return group len % more then %', i_elems, v_arr_len;      return;    elsif i_elems = v_arr_len then      return next i_arr;      return;    elsif i_elems = 1 then      return query select array(select i) from unnest(i_arr) t(i);      return;    end if;      return query    WITH RECURSIVE     t(i) AS (        select array(select i) from unnest(i_arr) t(i)    ),     cte AS (       SELECT i AS combo, i, 1 AS ct        FROM t      UNION ALL        SELECT array(select i from (select unnest(array_cat(cte.combo, t.i)) order by 1) t(i)), t.i, ct + 1        FROM cte, t        WHERE cte.ct <= i_elems-1 -- Combine any times         AND (not cte.combo @> t.i) -- Newly added values are not included in existing value sets         AND (cte.combo @> i_arr_contain)    )     SELECT combo FROM cte where array_length(combo,1)=i_elems group by combo;       return;  end;  \$\$ language plpgsql strict;  postgres=# select array_regroup(array[1,2,3,4,5],2,array[1]);   array_regroup   ---------------   {1,2}   {1,3}   {1,4}   {1,5}  (4 rows)    Time: 1.150 ms`

# Find the One Best Combination Item for a Single Item

For example, find the best combination item for bread.

`postgres=# select item_id from buy where item_id @> array[6537];  ......   {60573,17248,6537,77857,43349,66208,13656}   {97564,50031,79924,24255,6537,21174,39117}   {24026,78667,99115,87856,64782,8344,73169,41478,63091,29609,6537,71982,75382}   {53094,97465,26156,54181,6537}  (1101 rows)  Time: 5.791 ms    postgres=# explain select item_id from buy where item_id @> array[6537];                                     QUERY PLAN                                      ---------------------------------------------------------------------------------   Bitmap Heap Scan on buy  (cost=457.45..51909.51 rows=55000 width=60)     Recheck Cond: (item_id @> '{6537}'::integer[])     ->  Bitmap Index Scan on idx_buy_item  (cost=0.00..443.70 rows=55000 width=0)           Index Cond: (item_id @> '{6537}'::integer[])  (4 rows)`
`postgres=# select count(*), array_regroup(item_id,2,array[6537]) from buy where item_id @> array[6537] group by 2 order by 1 desc;   count | array_regroup   -------+---------------       3 | {6537,55286}       3 | {6537,48661}       3 | {6537,78337}       3 | {6537,72623}       3 | {6537,81442}       3 | {6537,66414}       3 | {6537,35346}       3 | {6537,79565}       3 | {3949,6537}  ......    Time: 286.859 ms`

# Find the Two Best Combination Items for a Single Item

For example, find the two best combination items for bread.

`postgres=# select count(*), array_regroup(item_id,3,array[6537]) from buy where item_id @> array[6537] group by 2 order by 1 desc;   count |   array_regroup      -------+--------------------       1 | {32,999,6537}       1 | {6537,49957,91533}       1 | {6537,49957,88377}       1 | {6537,49957,57887}       1 | {6537,49957,55192}       1 | {6537,49952,95266}       1 | {6537,49952,56916}       1 | {6537,49945,60492}       1 | {6537,49940,92888}  ......    Time: 1055.414 ms`

# Find the One Best Combination Item throughout the Network

This may take a long time.

`select count(*), array_regroup(item_id,2) from buy group by 2 order by 1 desc limit 10;`

# Find the N Best Combination Items throughout the Network

This may take a long time.

`select count(*), array_regroup(item_id, n) from buy group by 2 order by 1 desc limit 10;`

# Summary

1. This case doesn’t require highly technical methods, and only splits arrays by recommendation level to count the number of occurrences.

# References

https://github.com/DataSystemsLab/recdb-postgresql

# Original Source

https://www.alibabacloud.com/blog/using-hybriddbpostgresql-to-find-the-best-product-combination-for-marketing_594994?spm=a2c41.13103612.0.0

--

--

## More from Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com

## Get the Medium app

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com