By Zhang Youdong (Linqing) from the ApsaraDB team
Apache IoTDB (Database for the Internet of Things) is a database designed specifically for IoT time series data. It provides data collection, storage, and analysis functions. On the cloud, IoTDB offers an integrated solution with high-performance reads and writes and rich query capabilities. It customizes an efficient directory organization structure for IoT scenarios and integrates seamlessly with big data systems, such as Apache Hadoop, Spark, and Flink. On edge nodes, it provides lightweight TsFile management: data can be written to a local TsFile, basic query capabilities are provided, and TsFile data can be synchronized to the cloud.
TsFile is a file format customized for storing time series data on IoT devices. It is organized in a tree directory structure. One TsFile can store the data of multiple devices, and each device contains multiple measurements (metrics). The following figure shows a TsFile that contains the data of two devices, which are identified as d1 and d2. Each device contains three monitoring metrics: s1, s2, and s3.
The TsFile is a multi-level mapping table:
TsFileMetadata ==> TimeSeriesMetadata ==> ChunkMetadata ==> Chunk.
- TsFileMetadata describes an entire TsFile. It contains metadata information, such as version information, the location of MetadataIndexNode, and the total number of chunks.
- MetadataIndexNode contains multiple TimeSeriesMetadata entries.
- TimeSeriesMetadata points to the metadata information of a device, the ChunkMetadata.
- ChunkMetadata points to the ChunkHeader location and corresponds to the final chunk data.
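The lookup chain above can be sketched as a small in-memory model. This is a simplified illustration, not the real TsFile layout or the IoTDB API: the class `ToyTsFile`, its methods, and the use of a plain dictionary in place of MetadataIndexNode are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    points: List[Tuple[int, float]]  # (timestamp, value) pairs


@dataclass
class ChunkMetadata:
    chunk_offset: int  # hypothetical stand-in for the ChunkHeader location


@dataclass
class TimeSeriesMetadata:
    chunk_metadata: List[ChunkMetadata]


@dataclass
class TsFileMetadata:
    version: str
    # A plain dict stands in for MetadataIndexNode (assumption for this sketch).
    index: Dict[str, TimeSeriesMetadata]


class ToyTsFile:
    """Toy model of the chain: TsFileMetadata -> TimeSeriesMetadata -> ChunkMetadata -> Chunk."""

    def __init__(self, version: str = "toy-1") -> None:
        self.chunks: Dict[int, Chunk] = {}  # offset -> chunk, stands in for the file body
        self.metadata = TsFileMetadata(version, {})
        self._next_offset = 0

    def append_chunk(self, device: str, measurement: str, points) -> None:
        # Store the chunk data, then register where it lives in the metadata.
        offset = self._next_offset
        self.chunks[offset] = Chunk(list(points))
        self._next_offset += 1
        key = f"{device}.{measurement}"
        ts_meta = self.metadata.index.setdefault(key, TimeSeriesMetadata([]))
        ts_meta.chunk_metadata.append(ChunkMetadata(offset))

    def read_series(self, device: str, measurement: str):
        # Follow the mapping levels one by one down to the chunk data.
        ts_meta = self.metadata.index.get(f"{device}.{measurement}")
        if ts_meta is None:
            return []
        result = []
        for chunk_meta in ts_meta.chunk_metadata:
            result.extend(self.chunks[chunk_meta.chunk_offset].points)
        return result
```

For example, after `append_chunk("d1", "s1", [(1, 10.0)])`, calling `read_series("d1", "s1")` resolves the series through the metadata levels and never scans unrelated chunks, which is the point of keeping the index separate from the data.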
The built-in query engine in IoTDB parses all user commands, generates a plan, submits the plan to the corresponding executor, and returns the result set. Through the query engine, IoTDB provides a JDBC API, which is simple and easy to use.
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true);
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)
IoTDB> SELECT status FROM root.ln.wf01.wt01
Total line number = 2
The metadata model of IoTDB is organized in a tree structure. An instance contains multiple storage groups, which are similar to the concept of a namespace or database. A storage group contains multiple devices. A device contains multiple measurements. The time series data corresponding to measurements is stored in TsFile chunks. To facilitate data expiration, each storage group segments data by time range and stores the segments in different directories. By default, data is segmented by week.
//Storage Group storage structure
-- [Storage group name 1]
------ [Time partition ID 1]
------ [Time partition ID 2]
-- [Storage group name 2]
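The weekly segmentation can be illustrated with a short calculation. The sketch below assumes that a time partition ID is the millisecond timestamp divided by a one-week interval, and that directories are laid out as storage group followed by partition ID, as in the structure above. The function names and the expiration rule are hypothetical and only show why partitioning by time makes expiration cheap.

```python
from pathlib import PurePosixPath

# Assumed partition interval: one week, expressed in milliseconds.
WEEK_MS = 7 * 24 * 60 * 60 * 1000


def time_partition_id(timestamp_ms: int, interval_ms: int = WEEK_MS) -> int:
    """Map a millisecond timestamp to the ID of the partition that holds it."""
    return timestamp_ms // interval_ms


def partition_dir(base_dir: str, storage_group: str, timestamp_ms: int) -> PurePosixPath:
    """Build the directory [storage group name]/[time partition ID] for a data point."""
    return PurePosixPath(base_dir) / storage_group / str(time_partition_id(timestamp_ms))


def expired_partitions(partition_ids, now_ms: int, ttl_ms: int, interval_ms: int = WEEK_MS):
    """Return partitions whose whole time range ended before now_ms - ttl_ms."""
    cutoff = now_ms - ttl_ms
    return [p for p in partition_ids if (p + 1) * interval_ms <= cutoff]
```

Under these assumptions, expiring old data amounts to deleting whole partition directories instead of rewriting files that mix old and new points.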
The IoTDB storage engine is designed based on the LSM tree structure. Written data is first recorded in the write-ahead log (WAL). It is then written to the memtable in memory and gradually flushed to TsFiles on disk in the background. The TsFiles on disk are compacted based on certain rules to ensure query efficiency.
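The write path described above can be sketched as a toy LSM engine. This is a minimal illustration, not IoTDB's implementation: the class `ToyLsmEngine`, the flush threshold, and the merge-everything compaction rule are assumptions, and the flush runs synchronously here, whereas a real engine would flush in the background.

```python
class ToyLsmEngine:
    """Toy LSM write path: WAL -> memtable -> flushed files -> compaction."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.wal = []        # append-only record of writes not yet flushed
        self.memtable = {}   # timestamp -> value, held in memory
        self.files = []      # flushed sorted runs; each stands in for a TsFile
        self.memtable_limit = memtable_limit

    def write(self, timestamp: int, value) -> None:
        self.wal.append((timestamp, value))  # 1. record the write in the WAL
        self.memtable[timestamp] = value     # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                     # 3. flush once the memtable is full

    def flush(self) -> None:
        if not self.memtable:
            return
        self.files.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal.clear()  # flushed data no longer needs WAL protection

    def compact(self) -> None:
        # Merge all files into one; entries from newer files overwrite older ones.
        merged = {}
        for run in self.files:
            merged.update(run)
        self.files = [sorted(merged.items())] if merged else []

    def read(self, timestamp: int):
        # Check the newest data first: memtable, then files from newest to oldest.
        if timestamp in self.memtable:
            return self.memtable[timestamp]
        for run in reversed(self.files):
            for ts, value in run:
                if ts == timestamp:
                    return value
        return None
```

Before compaction, a read may have to check several files for the same timestamp. After `compact()`, a single sorted file remains, which is why compaction helps query efficiency.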
IoTDB can be deployed on edge nodes and on the cloud. Generally, data collected on edge nodes needs to be synchronized to a remote end for further analysis and processing. IoTDB provides a synchronization tool to synchronize TsFile data on terminals or devices to the cloud.
IoTDB supports seamless connection with existing big data processing systems, including Hive and Spark. IoTDB provides connectors, such as spark-iotdb, so Hive and Spark can directly access the TsFile data and IoTDB data.
- IoTDB customizes IoT models, provides JDBC access methods, and supports integrated deployment on the edge and cloud.
- IoTDB provides a Hadoop File system for storage. In addition, it provides multiple connectors that interconnect seamlessly with the existing big data ecosystem.
- TsFile is an open storage format with a simple device model that is easy to understand.
- The IoTDB TsFile structure is currently available only in Java. Its resource usage is high on lightweight edge devices, which limits its application on the terminal side and device side.
- Currently, only the standalone version is available for the cloud, which cannot meet the need to connect massive device data to the cloud.
HDFS or local disks are used for storage. Using HDFS ensures high availability of the storage layer, but not of the computing layer.