Public Event Sentiment Analysis in PostgreSQL

Background

Sometimes a business may want to know what will happen following a specific event. For example, will publishing house price regulations lead to large investment in the stock market?

Let’s take a look at how PostgreSQL databases can meet this type of requirement.

Simulate a Realistic Scenario

1. Construct the structure of a table holding events

create table event(  
id serial8 primary key, -- Auto-increment sequence that determines the sequential order of events
class text, -- Event type
info text, -- Event description
crt_time timestamp -- Time when an event occurs
);

2. Insert 10 million event records, which are of 5,000 different event types

insert into event (class, info , crt_time) select (5000*random())::int::text, 'test', clock_timestamp() from generate_series(1,10000000);

3. Create indexes by event type

create index idx_event_class on event (class);

Requirements

Query a specific number of events following a certain event

Simply use the following function to meet the preceding requirement

create or replace function f(  
sql text, -- Query the IDs of the target events to be analyzed
v_class text, -- Type of a target event
v_n int8, -- How many subsequent events are to be checked
v_limit int -- How many times events are analyzed. If all output is required, enter a large value
) returns setof event as
$$

declare
v_id int8;
begin
for v_id in execute sql loop
return query select * from event where id>=v_id order by id limit v_n+1;
v_limit := v_limit - 1;
if v_limit<=0 then
return;
end if;
end loop;
end;
$$
language plpgsql strict;

Query example

Query two events following an event of type 1 and generate analysis results ten times.

postgres=# select * from f(
$$
select id from event where class='1'
$$
, '1', 2, 10);
id | class | info | crt_time
-------+-------+------+----------------------------
1592 | 1 | test | 2017-03-31 15:07:23.77348
1593 | 3032 | test | 2017-03-31 15:07:23.773483
1594 | 3409 | test | 2017-03-31 15:07:23.773486
2784 | 1 | test | 2017-03-31 15:07:23.777265
2785 | 2819 | test | 2017-03-31 15:07:23.777268
2786 | 87 | test | 2017-03-31 15:07:23.777271
12176 | 1 | test | 2017-03-31 15:07:23.807489
12177 | 2586 | test | 2017-03-31 15:07:23.807491
12178 | 4101 | test | 2017-03-31 15:07:23.807494
19398 | 1 | test | 2017-03-31 15:07:23.83072
19399 | 1179 | test | 2017-03-31 15:07:23.830723
19400 | 4237 | test | 2017-03-31 15:07:23.830726
19571 | 1 | test | 2017-03-31 15:07:23.831296
19572 | 4368 | test | 2017-03-31 15:07:23.831299
19573 | 2313 | test | 2017-03-31 15:07:23.831302
24708 | 1 | test | 2017-03-31 15:07:23.847794
24709 | 1327 | test | 2017-03-31 15:07:23.847797
24710 | 4584 | test | 2017-03-31 15:07:23.847799
29756 | 1 | test | 2017-03-31 15:07:23.864234
29757 | 4386 | test | 2017-03-31 15:07:23.864237
29758 | 3044 | test | 2017-03-31 15:07:23.864239
30224 | 1 | test | 2017-03-31 15:07:23.865765
30225 | 4704 | test | 2017-03-31 15:07:23.865768
30226 | 332 | test | 2017-03-31 15:07:23.865771
32004 | 1 | test | 2017-03-31 15:07:23.871554
32005 | 219 | test | 2017-03-31 15:07:23.871557
32006 | 3548 | test | 2017-03-31 15:07:23.871559
36472 | 1 | test | 2017-03-31 15:07:23.886097
36473 | 640 | test | 2017-03-31 15:07:23.8861
36474 | 1139 | test | 2017-03-31 15:07:23.886103
(30 rows)

With the preceding results, we can know what events will happen after the occurrence of event 1.

To use the preceding results for further analysis, use hotspot analysis similar to keywords, use the MADlib machine learning database in PostgreSQL or use plR.

You always have a method to find the correlation between events.

Correlation Analysis of Non-Independent Events

Because the previous analysis targets independent events, which bear no correlation themselves, it is required to use functions to generate results.

Users’ mouse clicks on Taobao (for example, what other items a user clicks after clicking a specific item) is an example of non-independent events.

For non-independent events, the structure design includes associated fields like USERID, so that we can easily establish association.

Trajectory analysis can be performed on non-independent events by using recursive calls in PostgreSQL.

Summary

Recursive syntax in PostgreSQL and PL/pgSQL can perfectly meet the requirements of analyzing events before and after a specific event in public opinion analysis scenarios.

Then hotspot analysis, cluster analysis, MADlib in PostgreSQL or plR can be used to analyze events to find event correlation.

Original Source

https://www.alibabacloud.com/blog/public-event-sentiment-analysis-in-postgresql_594990?spm=a2c41.13103861.0.0

Written by

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store