Time Series Database vs. Common Database Technologies for IoT
By Alibaba Cloud Database Team
Time series data is a type of data that indicates the changes over time in a physical device, system, application process, or behavior. It is widely used in scenarios such as the Internet of Things (IoT), industrial Internet of Things (IIoT), and basic operation and maintenance (O&M) systems. Alibaba Cloud High-Performance Time Series Database (HiTSDB) supports reliable writing of large-scale time series data, reduces data storage costs, and flexibly completes business data aggregation analysis in real time.
Time Series Data in Real-Life Scenarios
The following scenarios show how closely time series data is tied to everyday life:
- An e-commerce system obtains the transaction amount and payment amount data of each order as well as the product inventory and logistics data.
- A smart electric meter records electricity consumption data per hour and generates billing data in real time.
- A windmill on a high mountain obtains the real-time data of rotational speed and generates the data of wind speed and electricity production.
- An O&M monitoring system tracks whether the call volume of an application or a service is abnormal, and what the load and resource usage of each server are.
These applications rely on a form of data that measures how things change with time. Each data source periodically sends new readings to create measurement results that are collected over time. These measurement results are time series data. A time series data set mainly has the following three characteristics:
- New data is always stored and recorded as a new entry.
- Data is usually stored in chronological order.
- All data is time-stamped.
Therefore, we define time series data as data that uniformly indicates the changes over time in a system, process, or behavior.
Value of Time Series Data
The core difference between time series data and other data is that the former can reflect the “change” itself. When you collect new data for an IoT device, do you overwrite the existing reading or store the new reading as a new row? Both methods tell you the current status of the system, but only the second method lets you track every status the system has been in.
Therefore, the value of time series data lies in recording every change of the system as a new row, so that you can measure changes, analyze past changes, monitor current changes, and predict future changes.
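The difference between the two methods can be sketched in a few lines of Python. This is only an illustration: the sensor name, the timestamps, and the values are made up, and plain in-memory structures stand in for a real database.

```python
from datetime import datetime, timedelta

# Hypothetical readings from one IoT temperature sensor: (timestamp, value).
start = datetime(2018, 9, 1, 0, 0)
readings = [(start + timedelta(hours=h), 20.0 + h) for h in range(4)]

# Method 1: overwrite. Only the latest status survives.
latest = {}
for ts, value in readings:
    latest["sensor-1"] = (ts, value)

# Method 2: append. Every reading becomes a new time-stamped row.
history = []
for ts, value in readings:
    history.append({"device": "sensor-1", "timestamp": ts, "value": value})

# Both methods can answer "what is the current status?"
current_from_latest = latest["sensor-1"][1]
current_from_history = history[-1]["value"]

# Only the appended history can answer "how much did the value change?"
total_change = history[-1]["value"] - history[0]["value"]
print(current_from_latest, current_from_history, total_change)
```

The overwrite method ends with a single entry, while the append method keeps all four rows and can therefore report the change between the first and last reading.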
Value of HiTSDB
Why do we use HiTSDB rather than a common database to manage time series data? In fact, we can also use a common database to manage time series data, but the performance will not be optimal.
HiTSDB is chosen over general-purpose database technologies for two core reasons: scale and usability.
- Scale: Time series data accumulates at high speed. For example, a connected car generates several hundred GB of data per hour. Relational databases perform poorly on datasets of this size. Not only SQL (NoSQL) databases handle large-scale data better, but their performance is still inferior to that of HiTSDB, which has been fine-tuned for time series data. HiTSDB treats time as the primary dimension and handles large-scale data by improving the real-time query efficiency of time-range data. The tuning also shows in the number of data points written per second, the number of supported device metrics, the data reading efficiency, and the compression ratio of stored data. Meanwhile, time series data is drawing more and more attention in the technical field (data source: DB-Engines report, September 2018).
- Usability: HiTSDB provides common features and operations for time series data analysis, including data retention policies, continuous queries, and flexible time aggregation. Compared with other databases, HiTSDB also has built-in support for computations commonly applied to time series data, such as downsampling and aggregate computing. That is why more and more enterprise developers select HiTSDB in various application scenarios.
Reasons for Using Alibaba Cloud HiTSDB
Alibaba covers a wide range of businesses, such as e-commerce transaction tracking, container indicator monitoring, service monitoring, logistics and delivery tracking, and intelligent device monitoring in smart parks, all of which place strong demands on a time series database. Alibaba Cloud HiTSDB has the following advantages:
HiTSDB has efficient throughput. Based on a comparison in actual performance stress testing, the data reading efficiency of HiTSDB is one order of magnitude higher than that of open-source OpenTSDB and InfluxDB. When HiTSDB is used to replace the traditional HBase-based solution for actual business, the total cost of machines is reduced by more than 50%.
Lower Data Storage Costs
Because time series data is written continuously and every data change is recorded in HiTSDB, HiTSDB must be able to hold data at the PB level. Online transaction processing (OLTP) databases cannot reach this capacity. HiTSDB achieves a lossless compression ratio of up to 10:1, which greatly reduces data storage costs for business.
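The article does not describe how HiTSDB compresses data. As a rough intuition for why regularly sampled time series compress well without loss, the following sketch uses delta encoding followed by zlib, which is an assumed stand-in technique and not HiTSDB's actual method. The meter data is made up, and the ratio it reaches depends entirely on the data.

```python
import struct
import zlib

# Hypothetical meter: one reading per hour, timestamps in Unix seconds.
timestamps = [1535760000 + 3600 * i for i in range(1000)]
values = [500 + (i % 24) for i in range(1000)]  # repeating daily pattern

def pack(numbers):
    """Serialize integers as fixed-width 8-byte signed values."""
    return struct.pack(f"<{len(numbers)}q", *numbers)

def delta_encode(numbers):
    """Keep the first value, then store only differences between neighbors."""
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

raw = pack(timestamps) + pack(values)
encoded = zlib.compress(
    pack(delta_encode(timestamps)) + pack(delta_encode(values))
)

# Lossless: decoding returns exactly the original series.
restored_timestamps = delta_decode(delta_encode(timestamps))
restored_values = delta_decode(delta_encode(values))

print(f"raw: {len(raw)} bytes, delta + zlib: {len(encoded)} bytes")
```

Because the timestamps advance by a constant step, their deltas are almost all identical, which is exactly the kind of redundancy a general-purpose compressor removes well.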
Strong Analytical Capability
The core capability of HiTSDB lies in data analysis. HiTSDB provides professional and comprehensive time series data computing functions. It also supports downsampling, data interpolation, and spatial aggregate computing to meet requirements in various complex business data query scenarios. HiTSDB can complete the aggregate analysis of millions of data points in seconds.
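As an illustration of what data interpolation means, the sketch below fills missing samples by linear interpolation between neighboring points. It is a minimal example under assumed inputs (a time-sorted list of `(timestamp, value)` pairs and a fixed sampling step); `fill_gaps` is a hypothetical helper, not a HiTSDB function, and HiTSDB may offer other interpolation policies.

```python
def fill_gaps(points, step):
    """Linearly interpolate a value at every missing multiple of `step`.

    `points` is a time-sorted list of (timestamp, value) pairs.
    """
    if not points:
        return []
    filled = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        filled.append((t0, v0))
        t = t0 + step
        while t < t1:
            ratio = (t - t0) / (t1 - t0)
            filled.append((t, v0 + (v1 - v0) * ratio))
            t += step
    filled.append(points[-1])
    return filled

# Hypothetical hourly series with the sample at t=7200 missing.
points = [(0, 10.0), (3600, 12.0), (10800, 18.0)]
filled = fill_gaps(points, 3600)
print(filled)
```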
Features of HiTSDB
HiTSDB supports multiple computing capabilities, such as downsampling and aggregate computing.
Take downsampling as an example. The administrator of a park needs to collect the electricity consumption data of all the lights in the park for unified monitoring and analysis, in order to control and save energy. To check the electricity consumption of the past 24 hours, the administrator can obtain raw data directly from HiTSDB and view the consumption trend. To check the trend over the past three years, the administrator can have the data calculated at any coarser time granularity, such as by day, week, or month. All the downsampled values (such as the average, sum, maximum, and minimum) are calculated from the hourly raw data by the time series data computing functions. HiTSDB performs all the calculations, and applications obtain the results directly.
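The downsampling described above can be sketched as follows. The example runs in memory on made-up hourly readings and only shows the idea of rolling hourly points up into daily values; `downsample_by_day` is a hypothetical helper, and in HiTSDB this work is done on the server.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical hourly electricity readings (kWh) for one light over 3 days.
start = datetime(2018, 9, 1)
raw_points = [
    (start + timedelta(hours=h), 1.0 + (h % 24) * 0.1) for h in range(72)
]

def downsample_by_day(points):
    """Group hourly points by calendar day and compute summary values."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts.date()].append(value)
    return {
        day: {
            "avg": mean(vals),
            "sum": sum(vals),
            "max": max(vals),
            "min": min(vals),
        }
        for day, vals in sorted(buckets.items())
    }

daily = downsample_by_day(raw_points)
for day, stats in daily.items():
    print(day, stats)
```

Switching to a weekly or monthly view only changes the key used for the bucket; the raw hourly data stays untouched.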
To check the electricity consumption of a specific floor, the administrator only needs to send HiTSDB a request containing the floor information, and then obtains the electricity consumption of all the lights on that floor in real time. To check the electricity consumption of Philips electronics, the administrator only needs to submit the brand value to HiTSDB. Electricity consumption can also be checked by park name. Time series data aggregation is powerful and flexible: users can query aggregated data along any dimension they define and obtain results for different analysis dimensions in real time, without creating any index information.
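Aggregation by arbitrary dimensions can be pictured as filtering and grouping data points by their tags. The sketch below assumes each point carries `park`, `floor`, and `brand` tags; the tag names, values, and helper functions are all illustrative, and the linear scan stands in for whatever HiTSDB does internally.

```python
from collections import defaultdict

# Hypothetical data points: each carries a value plus free-form tags.
points = [
    {"kwh": 1.2, "tags": {"park": "west", "floor": "1", "brand": "Philips"}},
    {"kwh": 0.8, "tags": {"park": "west", "floor": "1", "brand": "Other"}},
    {"kwh": 1.5, "tags": {"park": "west", "floor": "2", "brand": "Philips"}},
    {"kwh": 2.0, "tags": {"park": "east", "floor": "1", "brand": "Philips"}},
]

def total_where(data, **filters):
    """Sum consumption over points whose tags match every filter."""
    return sum(
        p["kwh"]
        for p in data
        if all(p["tags"].get(k) == v for k, v in filters.items())
    )

def total_by(data, tag):
    """Sum consumption grouped by the values of one tag."""
    totals = defaultdict(float)
    for p in data:
        totals[p["tags"][tag]] += p["kwh"]
    return dict(totals)

print(total_where(points, floor="1"))        # all lights on floor 1
print(total_where(points, brand="Philips"))  # all lights of one brand
print(total_by(points, "park"))              # totals per park
```

The same two helpers answer the floor, brand, and park questions, which is the point of tag-based aggregation: the dimension is chosen at query time.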
With the development of the Internet of Vehicles (IoV), intelligent transportation, new retail delivery, and other relevant industries, scenarios that store and analyze geographical location information are emerging. Such scenarios are called “spatio-temporal analysis” in the technical field.
To manage vehicles globally, IoV management personnel need clear data for the day, such as the number of vehicles running inside the operating area, the number of vehicles running outside the operating area, and the driving route of each vehicle. To improve the efficiency of urban management, government management personnel need a clear view of the day's heat distribution trends of crowd flows in the urban area. To manage delivery staff and optimize delivery routes, delivery management personnel in new retail need to know whether delivery staff members deliver goods within the specified area, as well as the delivery route of each staff member. All of these depend on the spatio-temporal analysis capability.
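The IoV questions above can be sketched on a small set of GPS fixes. The example assumes the operating area is a simple latitude/longitude bounding box, whereas real areas are usually polygons; the coordinates, vehicle IDs, and helper functions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical operating area as a latitude/longitude bounding box.
AREA = {"min_lat": 30.0, "max_lat": 30.5, "min_lon": 120.0, "max_lon": 120.5}

# Hypothetical GPS fixes: (vehicle id, Unix timestamp, latitude, longitude).
fixes = [
    ("car-1", 1000, 30.10, 120.10),
    ("car-1", 1060, 30.20, 120.20),
    ("car-2", 1000, 30.40, 120.40),
    ("car-2", 1060, 30.60, 120.40),  # left the area
    ("car-3", 1030, 31.00, 121.00),  # never inside the area
]

def in_area(lat, lon, area=AREA):
    """Return True if the position lies inside the bounding box."""
    return (
        area["min_lat"] <= lat <= area["max_lat"]
        and area["min_lon"] <= lon <= area["max_lon"]
    )

def routes(data):
    """Build each vehicle's route as its positions in time order."""
    by_vehicle = defaultdict(list)
    for vid, ts, lat, lon in sorted(data, key=lambda f: f[1]):
        by_vehicle[vid].append((lat, lon))
    return dict(by_vehicle)

def latest_status(data):
    """Split vehicles by whether their most recent fix is inside the area."""
    latest = {}
    for vid, ts, lat, lon in sorted(data, key=lambda f: f[1]):
        latest[vid] = in_area(lat, lon)
    inside = sorted(v for v, ok in latest.items() if ok)
    outside = sorted(v for v, ok in latest.items() if not ok)
    return inside, outside

inside, outside = latest_status(fixes)
print("inside:", inside, "outside:", outside)
print("route of car-1:", routes(fixes)["car-1"])
```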
Alibaba Cloud will soon release a spatio-temporal analysis feature of HiTSDB to support storage and analysis of geographical location information. This feature will also meet the business requirements for trajectory tracking and statistical analysis of spatial locations.
Time Series Insights
Data visualization is important for presenting data analysis results. HiTSDB provides a basic visualization feature named time series insights, which shows users the interactive data analysis process in real time. Without writing any code, users can query and analyze data and intuitively view data trends.
Quick Start with Alibaba Cloud HiTSDB
The newly released time series insights feature of HiTSDB enables users to import demo data and quickly experience the interactive time series data analysis capability in three steps. To learn how to quickly set up HiTSDB, visit https://www.alibabacloud.com/help/doc-detail/56329.htm