Internet of Vehicles — Window Querying with Alibaba Cloud RDS for PostgreSQL

Alibaba Cloud
6 min readMar 16, 2018

Background

Internet of Vehicles (IoV) is one of the hottest topic of research in the field of Internet of Things (IoT). A typical scenario for Internet of Vehicles applications is collecting vehicle’s travel tracks, but tracks of vehicles are usually not reported in real time. Several track records may be accumulated or reported at intervals.

A typical data structure is as follows:

(car_id, pos geometry, crt_time timestamp)

Heavy traffic and traffic signals often occur during the vehicle traveling process. The reported track records may be as follows:

1, position 1, '2017-01-01 12:00:00'  
1, position 1, '2017-01-01 12:00:05 PM'
1, position 1, '2017-01-01 12:00:10 PM'
1, position 1, '2017-01-01 12:00:15 PM'
1, position 1, '2017-01-01 12:00:20 PM'
1, position 2, '2017-01-01 12:00:30 PM'

That is, multiple records in the same position may be uploaded due to heavy traffic or traffic signals.

Therefore, there is a requirement to clean unnecessary wait records from the database. We keep at most two records for a given point, indicating arrival and departure.

This operation can be executed via window function.

Surely, in terms of providing the best efficiency, it is more reasonable to clean tracks from a terminal. Only two records will be kept for the start point of a position.

Implementation Example

1.Design a table structure

create table car_trace (cid int, pos point, crt_time timestamp);

2.Generate 10 million pieces of test data and assume there are 1,000 vehicles (in order to make it easy to repeat the data and test the effect, 25 points are used at the position).

insert into car_trace select random()*999, point((random()*5)::int, (random()*5)::int), clock_timestamp() from generate_series(1,10000000);

3.Create an index

create index idx_car on car_trace (cid, crt_time);

4.Query a data layout

select * from car_trace where cid=1 order by crt_time limit 1000;    
1 | (3,1) | 2017-07-22 21:30:09.84984
1 | (1,4) | 2017-07-22 21:30:09.850297
1 | (1,4) | 2017-07-22 21:30:09.852586
1 | (1,4) | 2017-07-22 21:30:09.854155
1 | (1,4) | 2017-07-22 21:30:09.854425
1 | (3,1) | 2017-07-22 21:30:09.854493

As shown on the list, several pieces of data are repetitive.

5.Filter the records at a single position via the window. At most two records for arrival and departure from the position will be kept.

Two window functions are used here: lead and lag. Lag indicates the previous record of the current record and Leadindicates the next record of the current record.

The method of determining arrival and departure points is as follows:

•The current position is not equal to the previous position, indicating that the record is the current position’s arrival point.

•The current position is not equal to the next position, indicating that the record is the current position’s leaving point.

•The previous position is empty, indicating that the current record is the first record.

•The next position is empty, indicating that the current record is the last record.

select * from   
(
select
*,
lag(pos) over (partition by cid order by crt_time) as lag,
lead(pos) over (partition by cid order by crt_time) as lead
from car_trace
where cid=1
and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735'
) t
where pos <> lag
or pos <> lead
or lag is null
or lead is null;

cid | pos | crt_time | lag | lead
-----+-------+----------------------------+-------+-------
1 | (2,1) | 2017-07-22 21:30:09.83994 | | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)
1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)
1 | (4,4) | 2017-07-22 21:30:09.84179 | (5,2) | (5,2)
1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)
1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)
1 | (0,5) | 2017-07-22 21:30:09.84536 | (1,5) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)
1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.84984 | (3,3) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)
1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)
1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)
1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |
(20 rows)

When track cleaning is not performed, the obtained results are as follows:

select   
*,
lag(pos) over (partition by cid order by crt_time) as lag,
lead(pos) over (partition by cid order by crt_time) as lead
from car_trace
where cid=1
and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735';

cid | pos | crt_time | lag | lead
-----+-------+----------------------------+-------+-------
1 | (2,1) | 2017-07-22 21:30:09.83994 | | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)
1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)
1 | (4,4) | 2017-07-22 21:30:09.84179 | (5,2) | (5,2)
1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)
1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)
1 | (0,5) | 2017-07-22 21:30:09.84536 | (1,5) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)
1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.84984 | (3,3) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.852586 | (1,4) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.854155 | (1,4) | (1,4)
1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)
1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)
1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)
1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)
1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)
1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |
(22 rows)

Lag and lead are used to clear records in the stay process.

Database Optimization for IoV

In a typical scenario, a lot of vehicle IDs are involved in the business. Data gathered for different vehicles is usually written into a database. If no optimization is made, after entering the database, the data of different vehicles may be staggered. That is, the data of different vehicles may be stored in one data block.

A lot of data blocks will be scanned (scanning IO amplification) when the track of a single vehicle is queried. There are two optimization methods to speed up the querying process.

1.Write into the database after the business end gathers grouping and sorting.

For example, after receiving the data submitted by the vehicle terminal, the program groups vehicle IDs, sorts them by time, and writes them into the database (using insert into tbl values (),(),…();). In this way, the data of the same vehicle will fall into the same data block as much as is possible.

2.Use a partition to reorganize the data in the database.

For example, we can store data based on the vehicle ID, every vehicle, or vehicle HASH partition.

The two methods both relate to reorganizing the data based on query requirements to achieve the purpose of decreasing scanning IO. This method is similar to PostgreSQL Securities Industry Database Requirement Analysis and Application (article in Chinese).

Reference:

https://www.alibabacloud.com/blog/Internet-of-Vehicles-%E2%80%93-Window-Querying-with-Alibaba-Cloud-RDS-for-PostgreSQL_p521546?spm=a2c41.11297455.0.0

--

--

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com