Storing and Managing Taobao’s Trillions of Orders
By Beilou and Lengzhi
1) Introduction to Taobao’s Trade Order System
Tmall and Taobao trade hundreds of millions of physical and virtual commodities every day. The entire chain of a successful transaction includes many steps, such as member information verification, commodity information retrieval, order creation, inventory deduction, discount application, order payment, logistics information updates, and payment confirmation.
Each step in the chain creates entries or updates statuses in the database, so a single successful transaction corresponds to hundreds of database transactions in the backend systems. The database cluster behind the trade system handles tens of billions of transactional reads and writes per day. This poses a major performance challenge for the database system and, because the data grows massively every day, puts heavy pressure on storage costs.
Orders are the most critical information and must be stored in the database permanently: they may be involved in transaction disputes, and users may need to look them up at any time. In the 17 years since Taobao was founded, the total number of order-related database entries has reached the trillions, and disk space consumption is measured in petabytes. With such a huge dataset, achieving low latency for user queries while keeping storage costs low is a major technical challenge.
2) The Architecture Evolution of Taobao’s Trade Order Database
Since Taobao was founded in 2003, the architecture of the trade order database has evolved several times as traffic increased.
In the first stage, traffic was low, so a single Oracle database stored all the order information. New orders were created and historical orders were queried in the same database.
In the second stage, as the historical order data grew, a single database could no longer meet the performance and capacity requirements. Therefore, the database for trade orders was split. A separate Oracle database was built to store the historical data, and orders older than three months were migrated to it. Because of the huge data volume, however, the query performance of the historical database could not meet requirements, so queries on historical orders were not available at that time. Users could only query orders created within the last three months.
In the third stage, the historical database was migrated to HBase to improve scalability and reduce storage costs. The HBase solution made historical orders queryable again while keeping storage costs low. It combined primary tables with index tables. Primary tables stored the order details and were queried by order number. Index tables mapped the ID of a buyer or seller to order numbers, so a query by buyer or seller first retrieved the order numbers from the index table and then looked up each order in the primary table.
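The two-step lookup described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the actual HBase schema: the class name `OrderHistoryStore`, the field names, and the use of plain dictionaries in place of tables are all assumptions made for the example.

```python
from collections import defaultdict


class OrderHistoryStore:
    """Toy store with a primary table and a buyer index table (hypothetical names)."""

    def __init__(self):
        self.primary = {}                      # order_id -> order details
        self.buyer_index = defaultdict(list)   # buyer_id -> [order_id, ...]

    def put(self, order):
        # Every write touches both tables: the details and the index entry.
        self.primary[order["order_id"]] = order
        self.buyer_index[order["buyer_id"]].append(order["order_id"])

    def get(self, order_id):
        # Detail query: one lookup in the primary table.
        return self.primary.get(order_id)

    def list_by_buyer(self, buyer_id):
        # Step 1: the index table yields the order numbers for this buyer.
        order_ids = self.buyer_index.get(buyer_id, [])
        # Step 2: each order number is resolved in the primary table.
        return [self.primary[order_id] for order_id in order_ids]
```

Because the index is a separate table that the application maintains itself, nothing in this design guarantees that the index and the primary table stay in step if one of the two writes fails.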
However, this solution had a problem: orders were not migrated strictly according to the 90-day period, and many types of orders were never migrated to the history database at all. As a result, the purchase order list was no longer sorted by time in descending order. A user scrolling through the order list page by page would find that a recent order suddenly seemed to be missing. In fact, the order was still there but was not listed in chronological sequence.
In the fourth stage, the historical database moved to a PolarDB-X cluster based on X-Engine. This solution met the storage cost requirements, provided the same query capability as the online database, and solved the out-of-order problem.
3) Business Pain Points of Taobao’s Transaction Order Database
Looking back at the evolution of Taobao’s trade database, in the 10 years since a separate historical database was split off, the business team and the database team have dealt with several core challenges:
- Maintain a low storage cost. Massive amounts of data are generated every day, so the cost of storing each order must stay low.
- Provide rich query features, such as sorting by time, while keeping storage costs low. This requires secondary indexes that stay consistent with the data and perform well.
- Provide a low query latency for a better user experience. Although users query orders created more than 90 days ago much less frequently than orders from the past 90 days, such queries still take place and affect the user experience. Hence, the long-tail latency needs to be kept under control.
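To make the secondary-index requirement concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for a relational engine. The table layout, column names, index name, and sample rows are illustrative assumptions, not Taobao's schema. The point is that a composite index on `(buyer_id, created_at)`, maintained by the engine together with each row, lets a buyer's order list be returned in time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           order_id     INTEGER PRIMARY KEY,
           buyer_id     INTEGER NOT NULL,
           created_at   TEXT    NOT NULL,   -- ISO date, sorts chronologically
           amount_cents INTEGER NOT NULL
       )"""
)
# Secondary index: the engine updates it in the same transaction as the row.
conn.execute("CREATE INDEX idx_buyer_created ON orders (buyer_id, created_at)")

conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2018-01-05", 1999),
        (2, 100, "2018-06-20", 4500),
        (3, 200, "2018-03-11", 700),
        (4, 100, "2017-11-30", 1200),
    ],
)


def recent_orders(conn, buyer_id, page_size, offset=0):
    """Return one page of a buyer's orders, newest first."""
    cursor = conn.execute(
        "SELECT order_id, created_at FROM orders "
        "WHERE buyer_id = ? ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (buyer_id, page_size, offset),
    )
    return cursor.fetchall()
```

Calling `recent_orders(conn, 100, 2)` returns the buyer's two newest orders, and passing `offset=2` returns the next page, with no gaps or reordering between pages.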
In 2018, a growing number of Taobao users complained about orders being listed out of sequence. The problem was caused by database storage issues and was a real nuisance for users, so the business team decided to fix it. As the preceding analysis shows, an ideal trade history database needs to meet three requirements: low cost, low latency, and rich query features. The InnoDB engine, which is used for the online databases, cannot achieve a low storage cost, whereas HBase cannot provide consistent secondary indexes.
4) A History Database Solution Based on X-Engine
In 2018, Alibaba’s proprietary X-Engine was gradually rolled out within Alibaba Group. Based on the stream-like access pattern of the trade business, in which recently written entries are read frequently and older entries rarely, Alibaba designed X-Engine with a native architecture that separates hot data from cold data. Cold data in X-Engine is compactly packed in data pages, and all the data blocks are compressed by default. This architecture achieves high performance at a low cost. Therefore, X-Engine was quickly adopted by many internal services, such as the case described in the article, “How X-Engine Supported the Surge in DingTalk Data Volumes”.
When we explored solutions for the trade history database, one idea was to merge the online database with the history database. We could use X-Engine’s hot-cold separation to achieve high-performance access to orders from the last 90 days and low-cost storage of orders older than 90 days. A merged order database would also provide features such as secondary indexes, which would not only solve the order sorting issue but also simplify the code in the business layer.
However, the transaction order system has been iterated on for ten years under the architecture that separates the online database from the history database, and the code of many business systems is built around this separation. Considering the risks of reworking and migrating the business code, we kept the architecture that separates the online and history databases. The only modification was to replace the original HBase cluster with a PolarDB-X cluster based on X-Engine.
- The online database still uses the legacy MySQL InnoDB cluster but stores only the orders from the last 90 days. Orders older than 90 days are removed from it. Because the data volume is small, the database maintains a high cache hit rate and a low read-write latency.
- Orders older than 90 days are synchronized from the online database to the history database and then deleted from the online database.
- The history database runs on X-Engine and keeps the full data of all trade orders. Any read or write to an order created more than 90 days ago goes to the history database. The history database also absorbs all the migration writes from the online database.
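The migration and routing rules in the list above can be sketched as follows. This is a simplified illustration under stated assumptions: two in-memory SQLite databases stand in for the InnoDB and X-Engine clusters, and the schema and the names `archive_old_orders` and `pick_database` are hypothetical. A production pipeline would also need batching, retries, and verification, which are omitted here.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 90
# Hypothetical minimal schema; created_on is an ISO date string.
SCHEMA = "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, buyer_id INTEGER, created_on TEXT)"


def cutoff(today):
    """ISO date before which an order counts as historical."""
    return (today - timedelta(days=RETENTION_DAYS)).isoformat()


def archive_old_orders(online, history, today):
    """Copy orders older than the retention window to history, then delete them online."""
    old_rows = online.execute(
        "SELECT order_id, buyer_id, created_on FROM orders WHERE created_on < ?",
        (cutoff(today),),
    ).fetchall()
    # Write to the history database first and commit, so that a failure
    # between the two steps leaves a duplicate rather than a lost order.
    history.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", old_rows)
    history.commit()
    online.executemany(
        "DELETE FROM orders WHERE order_id = ?", [(row[0],) for row in old_rows]
    )
    online.commit()
    return len(old_rows)


def pick_database(online, history, created_on, today):
    """Route a read or write to the database that owns an order of this age."""
    return history if created_on < cutoff(today) else online
```

Copying before deleting is a deliberate choice in this sketch: the insert is idempotent, so rerunning the job after a partial failure is safe.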
With this solution, the storage cost of the history database is similar to that of the HBase solution. Because the history database creates the same indexes as the online database, orders can once again be sorted by time. Read-write latency is also low.
5) Database Architecture Reference
To preserve continuity with the existing code architecture at the business level, Taobao’s trade order database adopts a solution that separates the history database on X-Engine from the online database on InnoDB. In this architecture, the X-Engine history database simultaneously handles the migration writes from the online database and the reads and writes to historical orders created more than 90 days ago.
In fact, recently written entries are accessed frequently, and the access frequency decreases sharply over time. The hot-cold separation mechanism within X-Engine is designed for exactly this stream-like business pattern. Hence, a database cluster on X-Engine alone can meet all the requirements.
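As a rough illustration of the idea of hot-cold separation, the toy store below keeps the newest orders uncompressed in memory and packs older ones into compressed blocks. It is only a conceptual sketch and does not reflect how X-Engine is implemented: the class name `TieredOrderStore`, the `hot_capacity` and `block_size` parameters, and the use of JSON with `zlib` are all assumptions chosen for brevity.

```python
import json
import zlib


class TieredOrderStore:
    """Toy hot-cold store: recent orders stay uncompressed, older ones are packed."""

    def __init__(self, hot_capacity=2, block_size=2):
        self.hot_capacity = hot_capacity   # orders that always stay hot
        self.block_size = block_size       # orders packed into one cold block
        self.hot = {}                      # order_id -> order, in insertion order
        self.cold_blocks = []              # list of (set of ids, compressed bytes)

    def put(self, order):
        self.hot[order["order_id"]] = order
        if len(self.hot) >= self.hot_capacity + self.block_size:
            self._flush_block()

    def _flush_block(self):
        # The oldest entries come first because dicts preserve insertion order.
        oldest_ids = list(self.hot)[: self.block_size]
        block = {order_id: self.hot.pop(order_id) for order_id in oldest_ids}
        payload = zlib.compress(json.dumps(block).encode("utf-8"))
        self.cold_blocks.append((set(oldest_ids), payload))

    def get(self, order_id):
        # Hot path: no decompression for recent orders.
        if order_id in self.hot:
            return self.hot[order_id]
        # Cold path: decompress only the block that holds the order.
        for ids, payload in self.cold_blocks:
            if order_id in ids:
                block = json.loads(zlib.decompress(payload).decode("utf-8"))
                return block[str(order_id)]  # JSON object keys are strings
        return None
```

Reads of recent orders never pay the decompression cost, while rarely read old orders trade a little latency for a smaller footprint.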
For a new business, or for an existing business that stores massive stream-like data and has not yet separated hot and cold data, we recommend using a single X-Engine database to reduce storage costs and simplify the database access code. A distributed database built on X-Engine, such as PolarDB-X, provides scale-out capability while reducing costs.
Alibaba Cloud has launched X-Engine, and it has been proven by Alibaba’s internal services. If you need high performance at a low cost, X-Engine is a good fit.