Performance Breakthrough — Reconstruction of IoT System Based on Kafka + OTS + MaxCompute

  1. The scope of the online analytical processing (OLAP) and online transaction processing (OLTP) systems is not clearly defined in the system architecture. Java programs are used to collect task statistics for Open Table Service (OTS) tables on a regular basis. (Note that OTS is renamed Table Store in Alibaba Cloud.) The code is complex and the performance is extremely poor, affecting the normal operation of other OLTP systems on the server.
  2. The stored and parsed packet data is not optimized specifically in accordance with OTS pricing rules. A large JSON string contains too many redundant keys, each of which is also excessively long (with 30 character strings on average).
  3. OTS supports each company in designing its own tables. There is a risk that the number of tables created under an instance may exceed the OTS limit, which is 64 tables per instance.
  4. OTS divides a table into partitions based on vehicle-month. The size of a partition can be 30 GB, which is too large and exceeds the 1 GB recommended by OTS.
  5. OTS partitions of a vehicle are continuously distributed but not hashed, and accordingly, concurrent performance is not optimal at the physical machine level.
  6. The code is not optimized for the most important read scenario where packets can be queried by day and vehicle.

Reorganizing the System Architecture

Before optimizing the system, we need to reorganize the system architecture and clearly define the applicable scope of middleware products to be used in the data solution. The following figure shows the clear definition of data that flows in various phases:

  1. Introducing Kafka to provide basic support for the asynchronous decoupling of multiple phases and speed up the response to packets from the terminals.
  2. Bringing in Alibaba Cloud MaxCompute, formerly known as Open Data Processing Service (ODPS), as the basic support for the OLAP system. Transfer complex business analysis to MaxCompute for processing.
  3. Reconstructing OTS models in accordance with OTS pricing principles (this improvement is not discussed in this document).

Optimizing the New Architecture

First, let’s talk about costs. Based on the frequency of data usage, we divide data into three types: online, offline, and archived. Regarded as archived data, packet data reported by vehicles is archived and stored on Object Storage Service (OSS). Online data has a preset life cycle of N months and mainly includes the parsed packet data that needs to be queried in real time. Offline data mainly includes various intermediate results and report data produced after the offline analysis and statistics of the parsed packet data.


After the reconstruction of offline statistical analysis, the system can make the best of MaxCompute’s parallel computing capability and draw support from its powerful functions, especially window functions, to achieve impressive analysis performance. For example, when we help a customer with statistical analysis of a core component, a professional staff member used to spend a day in analyzing a part and it was easy to make mistakes. With the help of MaxCompute’s analytical capability, we can finish the data analysis of nearly 1,000 components in 10 minutes. The following curve chart shows the average values during each data fluctuation period. These values could hardly be calculated manually before and required very complex Java coding. With the support of MaxCompute, the system can now accomplish this task with ease.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website: