PostgreSQL Time-Series Database Plug-in TimescaleDB Deployment Practices

Alibaba Cloud
14 min read · May 23, 2019


Background

In the real world, much of the data generated by businesses is time-series in nature: it is written sequentially along the time dimension, and a large share of the queries against it are statistics over time intervals.

Examples include business feed data, time-series data generated by the Internet of Things (such as weather sensors and vehicle trajectories), and real-time data from the financial industry.

PostgreSQL UDFs and BRIN (block range indexes) are ideal for processing time-series data. For details, see the following two examples.

Implementation of On-demand Slicing in PostgreSQL — plpgsql Schemaless Implementation of the Automatic Slicing Feature of the TimescaleDB Plug-in

PostgreSQL Time-series Best Practices — Design a Stock Exchange System Database — Alibaba Cloud RDS for PostgreSQL Best Practices

In fact, the PostgreSQL ecosystem has produced a time-series plug-in named TimescaleDB that is dedicated to processing time-series data. Timescale's improvements include changes to the SQL optimizer (it supports "merge append", which makes aggregation over time shards very efficient), a rotation interface, and automatic sharding.

Investors are also interested in TimescaleDB: it has already raised USD 50 million in funding, which indirectly suggests that time-series databases will be very popular with users in the future.

The Advantages of TimescaleDB

First, TimescaleDB shards data automatically and transparently to users: write performance does not deteriorate even when the data volume becomes very large. (This mainly matters for disks with lower IOPS; on disks with better IOPS, PostgreSQL still performs well after writing a large amount of data.)

Secondly, Timescale improves the SQL optimizer by adding a "merge append" execution node. When a GROUP BY is performed over small time shards, it does not need to hash or group across the entire timestamp range; instead, the calculation is performed shard by shard, which is very efficient.

Finally, Timescale adds a number of APIs that make it very efficient for users to write and query time-series data, and very easy to maintain it.

These APIs are as follows: http://docs.timescale.com/v0.8/api

Deploy TimescaleDB

Take CentOS 7.x x64 as an example.

1. Install PostgreSQL

Please see the PostgreSQL on Linux Best Deployment Manual

export USE_NAMED_POSIX_SEMAPHORES=1  
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10 --with-segsize=8 --with-wal-segsize=256
LIBS=-lpthread CFLAGS="-O3" make world -j 64
LIBS=-lpthread CFLAGS="-O3" make install-world

2. Install cmake3

The cmake3 package is provided by the EPEL repository; enable EPEL first.

yum install -y cmake3

ln -s /usr/bin/cmake3 /usr/bin/cmake

3. Compile TimescaleDB

git clone https://github.com/timescale/timescaledb/
cd timescaledb
git checkout release-0.8.0

Alternatively, download the release tarball:

wget https://github.com/timescale/timescaledb/archive/0.8.0.tar.gz



export PATH=/home/digoal/pgsql10/bin:$PATH
export LD_LIBRARY_PATH=/home/digoal/pgsql10/lib:$LD_LIBRARY_PATH

# Bootstrap the build system
./bootstrap

cd ./build && make

make install


[ 2%] Built target sqlupdatefile
[ 4%] Built target sqlfile
[100%] Built target timescaledb
Install the project...
-- Install configuration: "Release"
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb.control
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.1--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.1.0--0.2.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.2.0--0.3.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.3.0--0.4.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.0--0.4.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.1--0.4.2.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.2--0.5.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.5.0--0.6.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.0--0.6.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.0--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/lib/timescaledb.so

4. Configure postgresql.conf to load the TimescaleDB library automatically when the database starts

vi $PGDATA/postgresql.conf  
shared_preload_libraries = 'timescaledb'

pg_ctl restart -m fast

5. Create the extension in each database that needs TimescaleDB

psql  
psql (10.1)
Type "help" for help.

postgres=# create extension timescaledb ;
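To confirm that the plug-in is available, you can list the installed extensions (a quick check; the version shown will vary):

-- Verify the extension is installed
select extname, extversion from pg_extension where extname = 'timescaledb';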

6. Parameters related to TimescaleDB

timescaledb.constraint_aware_append     
timescaledb.disable_optimizations
timescaledb.optimize_non_hypertables
timescaledb.restoring

postgres=# show timescaledb.constraint_aware_append ;
timescaledb.constraint_aware_append
-------------------------------------
on
(1 row)

postgres=# show timescaledb.disable_optimizations ;
timescaledb.disable_optimizations
-----------------------------------
off
(1 row)

postgres=# show timescaledb.optimize_non_hypertables ;
timescaledb.optimize_non_hypertables
--------------------------------------
off
(1 row)

postgres=# show timescaledb.restoring ;
timescaledb.restoring
-----------------------
off
(1 row)

TimescaleDB usage example 1 — multidimensional analysis of New York taxi data

The first example follows the TimescaleDB tutorial on New York City taxi data: http://docs.timescale.com/v0.8/tutorials/tutorial-hello-nyc

The data is real, taken from New York City taxi cabs: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

1. Download sample data

wget https://timescaledata.blob.core.windows.net/datasets/nyc_data.tar.gz

2. Extract

tar -zxvf nyc_data.tar.gz

3. Create the tables. This step uses the create_hypertable API to convert ordinary tables into time-series storage tables.

psql -f nyc_data.sql

Some of the truncated nyc_data.sql content is as follows:

cat nyc_data.sql  

-- Ride data: duration, fare, distance, pickup and dropoff longitude/latitude, time, passenger count, and so on.

CREATE TABLE "rides"(
vendor_id TEXT,
pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
passenger_count NUMERIC,
trip_distance NUMERIC,
pickup_longitude NUMERIC,
pickup_latitude NUMERIC,
rate_code INTEGER,
dropoff_longitude NUMERIC,
dropoff_latitude NUMERIC,
payment_type INTEGER,
fare_amount NUMERIC,
extra NUMERIC,
mta_tax NUMERIC,
tip_amount NUMERIC,
tolls_amount NUMERIC,
improvement_surcharge NUMERIC,
total_amount NUMERIC
);

This statement converts the "rides" table into a time-series storage table:

SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
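Here the arguments are the table name, the time partition column, a space partition column ('payment_type') hashed into 2 partitions, and a flag that suppresses the default indexes. A minimal time-only variant (a sketch, assuming a hypothetical rides_simple table with the same pickup_datetime column) would be:

-- Minimal form: partition by time only, keeping the default indexes
SELECT create_hypertable('rides_simple', 'pickup_datetime');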

Create indexes:

CREATE INDEX ON rides (vendor_id, pickup_datetime desc);  
CREATE INDEX ON rides (pickup_datetime desc, vendor_id);
CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX ON rides (passenger_count, pickup_datetime desc);

4. Import test data

psql -c "\COPY rides FROM nyc_data_rides.csv CSV"  
COPY 10906858

5. Run some test SQL against the "rides" table that has been converted into a time-series storage table; its performance is better than that of an ordinary PostgreSQL table.

What is the average fare per day for rides with two or more passengers?

-- Average fare amount of rides with 2+ passengers by day  

SELECT date_trunc('day', pickup_datetime) as day, avg(fare_amount)
FROM rides
WHERE passenger_count > 1 AND pickup_datetime < '2016-01-08'
GROUP BY day ORDER BY day;

day | avg
--------------------+---------------------
2016-01-01 00:00:00 | 13.3990821679715529
2016-01-02 00:00:00 | 13.0224687415181399
2016-01-03 00:00:00 | 13.5382068607068607
2016-01-04 00:00:00 | 12.9618895561740149
2016-01-05 00:00:00 | 12.6614611935518309
2016-01-06 00:00:00 | 12.5775245695086098
2016-01-07 00:00:00 | 12.5868802584437019
(7 rows)

6. Some queries perform more than 20 times better

How many transactions are there every day?

-- Total number of rides by day for first 5 days  

SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides
GROUP BY day ORDER BY day
LIMIT 5;

day | count
--------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
(5 rows)

Timescale adds the "merge append" execution optimization, so aggregating at small time granularity over time shards is highly efficient. The more data there is, the more pronounced the performance improvement.

For example, TimescaleDB introduces a time-based "merge append" optimization to minimize the number of groups that must be processed when executing the following query (exploiting the fact that time is already ordered).

For our 100M row table, this results in query latency that is 396x faster than PostgreSQL (82ms vs. 32566ms).

SELECT date_trunc('minute', time) AS minute, max(usage_user)  
FROM cpu
WHERE time < '2017-01-01'
GROUP BY minute
ORDER BY minute DESC
LIMIT 5;
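To check which path the planner takes, you can prefix the query with EXPLAIN (a sketch; the exact plan nodes depend on the TimescaleDB version and the data):

EXPLAIN
SELECT date_trunc('minute', time) AS minute, max(usage_user)
FROM cpu
WHERE time < '2017-01-01'
GROUP BY minute
ORDER BY minute DESC
LIMIT 5;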

7. Use some functions specific to TimescaleDB, such as time_bucket; some acceleration algorithms built into TimescaleDB are also used here.

Each 5-minute interval is a bucket; the query returns the number of orders generated in each interval.

-- Number of rides by 5 minute intervals  
-- (using the TimescaleDB "time_bucket" function)

SELECT time_bucket('5 minute', pickup_datetime) as five_min, count(*)
FROM rides
WHERE pickup_datetime < '2016-01-01 02:00'
GROUP BY five_min ORDER BY five_min;

five_min | count
---------------------+-------
2016-01-01 00:00:00 | 703
2016-01-01 00:05:00 | 1482
2016-01-01 00:10:00 | 1959
2016-01-01 00:15:00 | 2200
2016-01-01 00:20:00 | 2285
2016-01-01 00:25:00 | 2291
2016-01-01 00:30:00 | 2349
2016-01-01 00:35:00 | 2328
2016-01-01 00:40:00 | 2440
2016-01-01 00:45:00 | 2372
2016-01-01 00:50:00 | 2388
2016-01-01 00:55:00 | 2473
2016-01-01 01:00:00 | 2395
2016-01-01 01:05:00 | 2510
2016-01-01 01:10:00 | 2412
2016-01-01 01:15:00 | 2482
2016-01-01 01:20:00 | 2428
2016-01-01 01:25:00 | 2433
2016-01-01 01:30:00 | 2337
2016-01-01 01:35:00 | 2366
2016-01-01 01:40:00 | 2325
2016-01-01 01:45:00 | 2257
2016-01-01 01:50:00 | 2316
2016-01-01 01:55:00 | 2250
(24 rows)

8. Execute some statistical analysis SQL

The number of taxi rides for each rate type.

-- Join rides with rates to get more information on rate_code  

SELECT rates.description, COUNT(vendor_id) as num_trips FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE pickup_datetime < '2016-01-08'
GROUP BY rates.description ORDER BY rates.description;

description | num_trips
-----------------------+-----------
JFK | 54832
Nassau or Westchester | 967
Newark | 4126
group ride | 17
negotiated fare | 7193
standard rate | 2266401
(6 rows)

Statistics for JFK and Newark rides in January 2016 (including trip count, average duration, fares, tips, shortest, average, and longest distance, and average passenger count).

-- Analysis of all JFK and EWR rides in Jan 2016  

SELECT rates.description, COUNT(vendor_id) as num_trips,
AVG(dropoff_datetime - pickup_datetime) as avg_trip_duration, AVG(total_amount) as avg_total,
AVG(tip_amount) as avg_tip, MIN(trip_distance) as min_distance, AVG(trip_distance) as avg_distance, MAX(trip_distance) as max_distance,
AVG(passenger_count) as avg_passengers
FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE rides.rate_code in (2,3) AND pickup_datetime < '2016-02-01'
GROUP BY rates.description ORDER BY rates.description;

description | num_trips | avg_trip_duration | avg_total | avg_tip | min_distance | avg_distance | max_distance | avg_passengers
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------
JFK | 225019 | 00:45:46.822517 | 64.3278115181384683 | 7.3334228220728027 | 0.00 | 17.2602816651038357 | 221.00 | 1.7333869584346211
Newark | 16822 | 00:35:16.157472 | 86.4633688027582927 | 9.5461657353465700 | 0.00 | 16.2706122934252764 | 177.23 | 1.7435501129473309
(2 rows)

9. Automatic data sharding and the execution plan

postgres=# \d+ rides  
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk

The constraints on one of the chunks are as follows:
Check constraints:
"constraint_1" CHECK (pickup_datetime >= '2015-12-31 00:00:00'::timestamp without time zone AND pickup_datetime < '2016-01-30 00:00:00'::timestamp without time zone)
"constraint_2" CHECK (_timescaledb_internal.get_partition_hash(payment_type) >= 1073741823)
Inherits: rides
-- Peek behind the scenes

postgres=# select count(*) from rides;
count
----------
10906858
(1 row)

Time: 376.247 ms
postgres=# explain select count(*) from rides;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=254662.23..254662.24 rows=1 width=8)
-> Gather (cost=254661.71..254662.22 rows=5 width=8)
Workers Planned: 5
-> Partial Aggregate (cost=253661.71..253661.72 rows=1 width=8)
-> Append (cost=0.00..247468.57 rows=2477258 width=0)
-> Parallel Seq Scan on rides (cost=0.00..0.00 rows=1 width=0)
-> Parallel Seq Scan on _hyper_1_1_chunk (cost=0.00..77989.57 rows=863657 width=0)
-> Parallel Seq Scan on _hyper_1_2_chunk (cost=0.00..150399.01 rows=1331101 width=0)
-> Parallel Seq Scan on _hyper_1_3_chunk (cost=0.00..6549.75 rows=112675 width=0)
-> Parallel Seq Scan on _hyper_1_4_chunk (cost=0.00..12530.24 rows=169824 width=0)
(10 rows)

10. You can also check the shards directly

postgres=# select count(*) from  _timescaledb_internal._hyper_1_1_chunk;  
count
---------
3454961
(1 row)

Shards Are Completely Transparent to Users

Shard metadata:

postgres=# \dn  
List of schemas
Name | Owner
-----------------------+----------
_timescaledb_cache | postgres
_timescaledb_catalog | postgres
_timescaledb_internal | postgres
public | postgres
(4 rows)

TimescaleDB + PostGIS Combination — a Spatio-Temporal Database

Here the TimescaleDB time-series plug-in is combined with the PostGIS spatial plug-in; PostgreSQL is very good at handling spatio-temporal data.

1. Create the PostGIS extension

create extension postgis;

2. Add spatial type fields

http://postgis.net/docs/manual-2.4/AddGeometryColumn.html

postgres=# SELECT AddGeometryColumn ('public','rides','pickup_geom',2163,'POINT',2);  
addgeometrycolumn
--------------------------------------------------------
public.rides.pickup_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)

postgres=# SELECT AddGeometryColumn ('public','rides','dropoff_geom',2163,'POINT',2);
addgeometrycolumn
---------------------------------------------------------
public.rides.dropoff_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)

postgres=#
postgres=# \d+ rides
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
pickup_geom | geometry(Point,2163) | | | | main | |
dropoff_geom | geometry(Point,2163) | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk

3. Update the data into the geometry fields. (The longitude and latitude are already stored as two numeric fields, so strictly speaking the update is optional: PostgreSQL supports expression indexes, and you can build a spatial expression index directly on those two fields; see the sketch after the code below.)

-- Generate the geometry points and write to table  
-- (Note: These calculations might take a few mins)

UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163);
UPDATE rides SET dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);


vacuum full rides;
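As an alternative to materializing the geometry columns, a spatial expression index can be built directly on the longitude/latitude fields. A minimal sketch (the index definition is an illustration, not part of the original tutorial):

-- Expression GiST index on the pickup point, avoiding the extra geometry column
CREATE INDEX rides_pickup_geom_expr_idx ON rides
USING GIST (ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude, pickup_latitude), 4326), 2163));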

4. Examples of Spatio-Temporal Analysis

How many rides were hailed in each 30-minute bucket within 400 meters of Times Square, at (lat, long) (40.7589, -73.9851)?

-- Number of rides on New Years Eve originating within  
-- 400m of Times Square, by 30 min buckets
-- Note: Times Square is at (lat, long) (40.7589,-73.9851)

SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
FROM rides
WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
AND pickup_datetime < '2016-01-01 14:00'
GROUP BY thirty_min ORDER BY thirty_min;

thirty_min | near_times_sq
---------------------+--------------
2016-01-01 00:00:00 | 74
2016-01-01 00:30:00 | 102
2016-01-01 01:00:00 | 120
2016-01-01 01:30:00 | 98
2016-01-01 02:00:00 | 112
2016-01-01 02:30:00 | 109
2016-01-01 03:00:00 | 163
2016-01-01 03:30:00 | 181
2016-01-01 04:00:00 | 214
2016-01-01 04:30:00 | 185
2016-01-01 05:00:00 | 158
2016-01-01 05:30:00 | 113
2016-01-01 06:00:00 | 102
2016-01-01 06:30:00 | 91
2016-01-01 07:00:00 | 88
2016-01-01 07:30:00 | 58
2016-01-01 08:00:00 | 72
2016-01-01 08:30:00 | 94
2016-01-01 09:00:00 | 115
2016-01-01 09:30:00 | 118
2016-01-01 10:00:00 | 135
2016-01-01 10:30:00 | 160
2016-01-01 11:00:00 | 212
2016-01-01 11:30:00 | 229
2016-01-01 12:00:00 | 244
2016-01-01 12:30:00 | 230
2016-01-01 13:00:00 | 235
2016-01-01 13:30:00 | 238

Example 2 — Sensor Data and Weather Data

http://docs.timescale.com/v0.8/tutorials/other-sample-datasets

No more details are given here.

Common APIs for TimescaleDB

http://docs.timescale.com/v0.8/api

1. Create a Time-Series Table

create_hypertable()

Required Arguments

Name | Description
------------------+----------------------------------------------
main_table | Identifier of table to convert to hypertable
time_column_name | Name of the column containing time values

Optional Arguments

Name | Description
------------------------+----------------------------------------------
partitioning_column | Name of an additional column to partition by. If provided, number_partitions must be set.
number_partitions | Number of hash partitions to use for partitioning_column when this optional argument is supplied. Must be > 0.
chunk_time_interval | Interval in event time that each chunk covers. Must be > 0. Default is 1 month.
create_default_indexes | Boolean whether to create default indexes on time/partitioning columns. Default is TRUE.
if_not_exists | Boolean whether to print a warning if the table is already converted to a hypertable, instead of raising an exception. Default is FALSE.
partitioning_func | The function to use for calculating a value's partition.
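A hedged usage example (assuming a hypothetical conditions table with a time column; in v0.8 the chunk interval for timestamp columns is given in microseconds):

-- Time-only hypertable with 1-day chunks (86400000000 microseconds)
SELECT create_hypertable('conditions', 'time',
       chunk_time_interval => 86400000000,
       if_not_exists => TRUE);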

2. Add a Multi-Level Sharding Field

Both hash and interval sharding are supported.

add_dimension()

Required Arguments

Name | Description
-------------+----------------------------------------------
main_table | Identifier of hypertable to add the dimension to.
column_name | Name of the column to partition by.

Optional Arguments

Name | Description
-------------------+----------------------------------------------
number_partitions | Number of hash partitions to use on column_name. Must be > 0.
interval_length | Interval that each chunk covers. Must be > 0.
partitioning_func | The function to use for calculating a value's partition (see create_hypertable instructions).
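A hedged example (assuming the hypothetical conditions hypertable above has a location column):

-- Add a hash-partitioned space dimension with 4 partitions
SELECT add_dimension('conditions', 'location', number_partitions => 4);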

3. Delete Shards

Delete shards older than a specified time.

drop_chunks()

Required Arguments

Name | Description
------------+----------------------------------------------
older_than | Timestamp of the cut-off point for data to be dropped, i.e., anything older than this should be removed.

Optional Arguments

Name | Description
-------------+----------------------------------------------
table_name | Hypertable name from which to drop chunks. If not supplied, all hypertables are affected.
schema_name | Schema name of the hypertable from which to drop chunks. Defaults to public.
cascade | Boolean on whether to CASCADE the drop on chunks, thereby removing objects dependent on the chunks to be removed. Defaults to FALSE.
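A hedged example in the v0.8 argument form (later versions changed the signature):

-- Drop all chunks of 'conditions' that only contain data older than three months
SELECT drop_chunks(interval '3 months', 'conditions');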

4. Set the Time Interval of Shards

set_chunk_time_interval()

Required Arguments

Name | Description
---------------------+----------------------------------------------
main_table | Identifier of hypertable to update the interval for.
chunk_time_interval | Interval in event time that each new chunk covers. Must be > 0.
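A hedged example (again assuming microseconds for timestamp columns in v0.8):

-- New chunks will each cover 24 hours
SELECT set_chunk_time_interval('conditions', 86400000000);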

5. Analysis Function — the First Record

first()

Required Arguments

Name | Description
-------+----------------------------------------------
value | The value to return (anyelement)
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type)

For example, find the earliest uploaded temperature value for each sensor.

SELECT device_id, first(temp, time)  
FROM metrics
GROUP BY device_id;

This can also be done using recursive SQL:

Applications of PostgreSQL Recursive SQL — Geeks and Normal People
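In plain PostgreSQL, the same per-device first value can also be obtained with DISTINCT ON (a sketch, assuming the same metrics table):

-- Keep the first row per device_id in time order
SELECT DISTINCT ON (device_id) device_id, temp
FROM metrics
ORDER BY device_id, time;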

6. Analysis Function — the Last Record

last()

Required Arguments

Name | Description
-------+----------------------------------------------
value | The value to return (anyelement)
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type)

For example, find the latest temperature value for each sensor in each 5-minute interval over the last day.

SELECT device_id, time_bucket('5 minutes', time) as interval,  
last(temp, time)
FROM metrics
WHERE time > now () - interval '1 day'
GROUP BY device_id, interval
ORDER BY interval DESC;

This can also be done using recursive SQL:

Applications of PostgreSQL Recursive SQL — Geeks and Normal People

7. Analysis Function — Histogram

histogram()

Required Arguments

Name | Description
----------+----------------------------------------------
value | A set of values to partition into a histogram
min | The histogram's lower bound used in bucketing
max | The histogram's upper bound used in bucketing
nbuckets | The integer value for the number of histogram buckets (partitions)

For example:

The battery-level range from 20 to 60 is divided into five buckets, and an array of 5 + 2 values is returned (the counts of records in each bucket). The two extra values at the beginning and the end indicate how many records fall outside the lower and upper boundaries.

SELECT device_id, histogram(battery_level, 20, 60, 5)  
FROM readings
GROUP BY device_id
LIMIT 10;

device_id | histogram
------------+------------------------------
demo000000 | {0,0,0,7,215,206,572}
demo000001 | {0,12,173,112,99,145,459}
demo000002 | {0,0,187,167,68,229,349}
demo000003 | {197,209,127,221,106,112,28}
demo000004 | {0,0,0,0,0,39,961}
demo000005 | {12,225,171,122,233,80,157}
demo000006 | {0,78,176,170,8,40,528}
demo000007 | {0,0,0,126,239,245,390}
demo000008 | {0,0,311,345,116,228,0}
demo000009 | {295,92,105,50,8,8,442}

8. Analysis Function — Time Interval

This is similar to date_trunc, but more powerful: it can truncate to any interval, which is very convenient.

time_bucket()

Required Arguments

Name | Description
--------------+----------------------------------------------
bucket_width | A PostgreSQL time interval for how long each bucket is (interval)
time | The timestamp to bucket (timestamp/timestamptz/date)

Optional Arguments

Name | Description
--------+----------------------------------------------
offset | The time interval to offset all buckets by (interval)
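The offset shifts every bucket boundary. A hedged sketch against the rides table loaded earlier:

-- 5-minute buckets shifted by 2.5 minutes: boundaries fall at :02:30, :07:30, ...
SELECT time_bucket('5 minutes', pickup_datetime, interval '2.5 minutes') AS bucket,
       count(*)
FROM rides
WHERE pickup_datetime < '2016-01-01 01:00'
GROUP BY bucket
ORDER BY bucket;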

9. The View Function for Data Overview — Time-Series Table Size

hypertable_relation_size_pretty()

SELECT * FROM hypertable_relation_size_pretty('conditions');  

table_size | index_size | toast_size | total_size
------------+------------+------------+------------
1171 MB | 1608 MB | 176 kB | 2779 MB

10. The View Function for Data Overview — Shard Size

chunk_relation_size_pretty()

SELECT * FROM chunk_relation_size_pretty('conditions');  

chunk_table | table_size | index_size | total_size
---------------------------------------------+------------+------------+------------
"_timescaledb_internal"."_hyper_1_1_chunk" | 28 MB | 36 MB | 64 MB
"_timescaledb_internal"."_hyper_1_2_chunk" | 57 MB | 78 MB | 134 MB
...

11. The View Function for Data Overview — Index Size

indexes_relation_size_pretty()

SELECT * FROM indexes_relation_size_pretty('conditions');  

index_name_ | total_size
--------------------------------------+------------
public.conditions_device_id_time_idx | 1143 MB
public.conditions_time_idx | 465 MB

12. Export Time-Series Metadata

https://raw.githubusercontent.com/timescale/timescaledb/master/scripts/dump_meta_data.sql

psql [your connect flags] -d your_timescale_db < dump_meta_data.sql > dumpfile.txt

Summary

TimescaleDB is a very useful time-series plug-in: it hides the sharding logic (sharding is transparent to users) and provides a large number of API interfaces and performance optimizations. It is a great fit for time-series scenarios.

Combined with PostGIS plug-in, PostgreSQL is more powerful in spatio-temporal processing.

Reference: https://www.alibabacloud.com/blog/postgresql-time-series-database-plug-in-timescaledb-deployment-practices_594814?spm=a2c41.12889588.0.0
