A Close-Up Look into Alibaba’s New Generation of Database Technologies
By Zhang Rui.
Pictured above is Zhang Rui, head of the Alibaba Database Technologies Team.
Alibaba’s Double 11 Shopping Festival is certainly a grand project in the Internet age. It has pushed forwards loads of new technological innovations, and has provided Alibaba Cloud with a unique platform to test all sorts of new technologies.
This year Alibaba aims to support the highest possible QPS value at the midnight launch of the event, which is when traffic hits its highest peak. Doing so will allow Alibaba to provide the best user experience possible. However, this requires a powerful solution that offers the greatest levels of elasticity and stability while still being cost effective.
In this article, we will look into how Alibaba Cloud can achieve this with its latest generation of cloud database technologies.
How Can Alibaba Cloud’s Database Achieve Ultimate Elasticity?
Deploying Databases in the Cloud
As you probably all know, making databases elastic is a tough job because databases can be demanding in terms of performance and because the migration of massive amounts of data can also be very costly. The first approach to this problem of making databases more elastic is to deploy them in the cloud. The elasticity of the cloud can help secure the resources demanded by databases.
However, there are several challenges involved with deploying databases in the cloud. Consider these major ones:
- How can databases be deployed in the cloud, and how can a hybrid cloud be built in a short time?
- How can performance losses caused by virtualization be minimized?
- How can Alibaba Cloud be connected with a customer’s internal network, or private cloud?
Through several years of research and development, we have created a series of solutions to overcome all of the above challenges. First, databases deployed in the cloud can run on Alibaba Cloud's high-performance Elastic Compute Service (ECS), which uses the Storage Performance Development Kit (SPDK) and Data Plane Development Kit (DPDK) frameworks together with NVM Express (NVMe) storage. This combination dramatically reduces virtualization losses. Second, we built the Hybrid Cloud Database Management (HDM) solution, which lets you manage your on-cloud and off-cloud environments at the same time; using HDM, we were able to quickly connect all of our internal hybrid cloud systems in support of the Double 11 Shopping Festival. And third, you can use Alibaba Cloud's Virtual Private Cloud (VPC) for hybrid cloud connectivity. Internally, we used it to connect Alibaba Group's internal network with Alibaba Cloud's public cloud network.
Elastic Database Scheduling
Sometimes, cloud resources alone are not enough to achieve the ultimate level of elasticity. With the help of offline/online hybrid deployment technologies, however, databases can use the computing resources of offline clusters to maximize elasticity and minimize costs. Technologies such as containerization and the separation of storage from computing are fundamental to successful offline/online hybrid deployment: containerization isolates and centrally schedules computing nodes, while separating storage from computing is an important foundation for the elastic scheduling capability of databases. Major technological advances over the last few years, such as 25G networking, RDMA, and high-performance distributed storage, have made this separation a reality.
The preceding figure shows the separated-storage database architecture used at Alibaba. This architecture contains a storage, network, and computing layer. The storage layer uses Alibaba's proprietary distributed storage system, named Apsara Distributed File System (nicknamed Pangu), and the computing nodes of the database are deployed on Alibaba's exclusive solution, named PouchContainer, where they are connected with the storage nodes through a 25G high-speed network.
Next, for the successful isolation of storage in the above solution, we worked hard to optimize the Apsara Distributed File System. Here’s what we did:
- Shortened response latency: The system’s response latency was shortened to 0.4 ms for read and write operations in single channels, and response latency for the RDMA network was shortened to less than 0.2 ms.
- Asynchronous replication of the third replica: Replicating data to the third replica asynchronously, while keeping the second replica synchronous, made the system more resilient to network instability.
- QoS-based throttling: Back-end I/O traffic was controlled based on the front-end service load to guarantee optimal writing performance.
- Fast failover: The failover of a single node in a storage cluster was reduced to 5 s, a speed not seen anywhere else in the industry.
- High availability deployment: Four-rack deployment of single clusters elevated data reliability to a whopping 99.99999999%.
We also worked hard to optimize the database side. The most important optimization was reducing the traffic volume between computing and storage nodes, which limits the overall impact of network latency on database performance. First, we doubled database throughput by optimizing the synchronization of redo logs. Then, because Apsara Distributed File System supports atomic writes, we disabled the database's double write buffer; this increased database throughput by a further 20% and eliminated the network bandwidth previously consumed by duplicate page writes.
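For reference, disabling the doublewrite buffer is an ordinary MySQL setting; below is a minimal sketch, assuming a storage layer that guarantees atomic page writes (illustrative only, not Alibaba's actual production configuration):

```ini
# my.cnf excerpt -- illustrative only, not Alibaba's production settings.
[mysqld]
# Safe to turn off only when the storage layer guarantees atomic page
# writes (as Apsara Distributed File System does); otherwise a torn
# page during a crash can corrupt data.
innodb_doublewrite = 0
```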
Hybrid Database Deployment for Double 11
Containerization and the separation of storage from computing made the databases stateless and gave them superior scheduling capabilities. During the peak hours of Double 11, this meant the databases could be scaled quickly, by mounting shared storage to different offline computing clusters, achieving faster elasticity.
Alibaba’s Next-Gen Databases
When it came to developing better databases at Alibaba, we moved from originally using Oracle databases to developing our own in-house solutions. In particular, we developed AliSQL, which is based on MySQL, and the distributed middleware called Taobao Distributed Data Layer (TDDL). Now, starting from 2016, we have been working hard on developing another generation of database technology, one which we call X-DB, with X representing our pursuit for the ultimate level of performance and capabilities.
Alibaba’s business scenarios are highly demanding on database performance:
- Data must be extensible.
- Data must be strongly consistent and constantly available.
- A database needs to store large amounts of important data.
- Data must show distinct lifecycle characteristics, with cold data and hot data being clearly distinguishable from each other.
- The logic for handling transactions, storage, and payments must be simple and support high-performance scenarios.
These requirements define several of the important features that we require in the new generation of databases: strong consistency, global deployment capabilities, a distributed structure, high performance, high availability, and the automatic management of the data lifecycle.
Below is the architecture of X-DB:
The X-DB architecture, as shown in the figure above, introduces the Paxos distributed consensus protocol and supports remote, cross-region deployment. Despite the increased network latency, X-DB maintains high throughput, matching that of a standalone host when deployed in three-node mode within the same region. With this architecture, X-DB is also highly tolerant of network jitter.
Now, let’s consider some of the core technologies of X-DB:
- X-Paxos: X-Paxos, designed exclusively at Alibaba, is a high-performance Paxos library, which is the core technology behind achieving a three-node capacity and strong cross-zone and cross-region data consistency. X-Paxos ensures a continuous availability of 99.999%.
- Batching and pipelining: When committing a transaction, X-DB must ensure that its logs are received and acknowledged by a majority of the database nodes, which is the foundation of strong consistency. Because commitment crosses the network, it unavoidably adds latency, and maintaining throughput under high latency is challenging. Batching groups transactions so they are committed together, and pipelining allows logs to be received and acknowledged out of sequence, while commits are still applied in sequence. Together, these techniques maintain high throughput despite high latency.
- Asynchronous commitment: Without it, database threads sit idle waiting during commitment. We therefore adopted asynchronous commitment to maximize the efficiency of the database thread pool. All these solutions combined keep the throughput of X-DB high in the three-node mode.
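The interplay of majority acknowledgment, out-of-order receipt, and in-order commit described above can be sketched in a few lines. This is a toy model, not X-Paxos's actual implementation; the class and method names are invented:

```python
class BatchedLog:
    """Toy model of pipelined log replication with in-order commit."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.acks = {}           # batch index -> set of acknowledging nodes
        self.committed = []      # batch indices, always sequential
        self.next_to_commit = 0

    def ack(self, batch, node):
        """Record a possibly out-of-order acknowledgment, then commit in order."""
        self.acks.setdefault(batch, set()).add(node)
        # A batch commits only once a majority has acked it AND every
        # earlier batch has already committed -- so commits stay sequential.
        while len(self.acks.get(self.next_to_commit, ())) > self.n_nodes // 2:
            self.committed.append(self.next_to_commit)
            self.next_to_commit += 1
```

In three-node mode the leader's own write counts as one acknowledgment, so a single follower's ack is enough to reach a majority; note that batch 1 can be fully acknowledged before batch 0 without ever being committed out of order.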
Performance Tests Comparing X-DB and MySQL
The above comparison pits our in-house X-DB against Oracle's official MySQL Group Replication. This is a standard sysbench test with all three nodes deployed in the same data center. In the Insert scenario, X-DB records a QPS value 2.4 times that of a typical MySQL database, with a lower response time.
Next, this is a standard sysbench test under the remote deployment mode. In the Insert scenario, X-DB (50,366 QPS) shows a commanding advantage by handling nearly 6 times the QPS of MySQL Group Replication (8,481 QPS) with a much shorter response time: 58 ms, or 38% of MySQL Group Replication's 150 ms.
By replacing the traditional master/slave mode with a same-region, cross-zone three-node deployment mode, we are able to guarantee high data quality and high availability across zones. Features of this solution include strong cross-zone data consistency, zero data loss when a single zone becomes unavailable, failback within seconds after a single-zone outage, and fully automated failover and failback with no need for third-party components. Plus, all of this comes at no extra cost compared with the traditional master/slave mode.
In cross-region deployment mode, we can achieve active geo-redundancy by using more fundamental database technologies and reduce the three-region six-replica layout (in master/slave mode) to three-region five-replica (in the three-region, five-node, four-data-center mode). From the business perspective, these features have several clear advantages: strong cross-region data consistency, zero data loss when a single region becomes unavailable, high performance in cross-region strong synchronization scenarios, and optional prioritized failover to another replica in the same region.
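The replica-count arithmetic behind the three-region, five-replica layout can be checked directly. The sketch below assumes a 2 + 2 + 1 placement for illustration; the article does not specify the exact replica placement:

```python
def survives_region_loss(placement):
    """True if losing any single region still leaves a Paxos majority.

    placement: dict mapping region name -> number of replicas there.
    """
    total = sum(placement.values())
    majority = total // 2 + 1
    # After any one region goes down, the survivors must still be a majority.
    return all(total - n >= majority for n in placement.values())

# Five replicas over three regions keep a majority (3 of 5) after any
# single-region outage.
print(survives_region_loss({"region_a": 2, "region_b": 2, "region_c": 1}))  # True
```

The same check shows why placement matters: putting three of the five replicas in one region would break the guarantee, since losing that region leaves only two survivors.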
Leading Technologies in Double 11
Use of X-KV in Double 11
X-KV is an exclusive enhancement built on the official MySQL Memcached plugin. After much optimization work this year, X-KV now supports more data types, non-unique indexes, combined indexes, the MultiGet function, and online schema changes. The biggest difference is that it supports SQL conversion through TDDL. The advantages of X-KV are its superior read performance, strong consistency, and low response time. These advantages reduce overall cost, particularly maintenance costs, because X-KV's SQL support lets applications be migrated transparently.
TDDL for X-KV also provides the following features:
- Independent connection pool: SQL and KV connection pools are independent of each other but remain synchronized during changes, which allows applications to quickly switch between two sets of APIs.
- Optimized KV communications protocol: The protocol is implemented without field separators.
- Automatic type conversion of result sets: Values returned as strings over the KV interface are automatically converted to the corresponding MySQL types.
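One common way to build a separator-free protocol is length-prefixed framing, sketched below. This is an assumption about the general technique; TDDL's actual wire format is not documented in this article:

```python
import struct

def encode_fields(fields):
    """Pack byte strings as [4-byte big-endian length][payload] frames.

    Because each field carries its own length, values may contain any
    bytes at all -- no separator or escaping is needed.
    """
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)

def decode_fields(buf):
    """Invert encode_fields, walking the buffer frame by frame."""
    fields, off = [], 0
    while off < len(buf):
        (n,) = struct.unpack_from(">I", buf, off)
        off += 4
        fields.append(buf[off:off + n])
        off += n
    return fields
```

A field containing tabs, pipes, or even an empty string round-trips cleanly, which is exactly what separator-based formats struggle with.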
Solutions for the Seller Library Performance Bottleneck
The turnover of the Double 11 Shopping Festival keeps growing, and in the last couple of years the synchronization latency between the online shopper library and the seller library has stayed high as a result. Sellers could not process Double 11 orders promptly, and the seller library suffered from poor performance under many complex queries. In the past, we tried setting up independent queues, combining synchronization channels, and optimizing the seller libraries of large sellers, but none of these completely solved the underlying issue.
The answer to this was ESDB, which is a proprietary distributed document database based on Alibaba Cloud Elasticsearch (ES) that supports SQL APIs, allowing applications to seamlessly migrate from SQL to ESDB. With this solution, dynamic secondary hashing was provided to larger sellers, which in turn eliminated the data synchronization bottleneck. At the same time, the ESDB solution also supports complex searches.
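The idea of dynamic secondary hashing can be pictured with a small routing function: ordinary sellers map to a single shard, while known hot sellers are fanned out across sub-shards by order ID. The names and shard counts below are invented for illustration; this is not ESDB's real routing logic:

```python
import hashlib

N_SHARDS = 16  # primary shard count -- an assumed value

def _h(s):
    """Stable hash of a string (md5 keeps the sketch deterministic)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def route(seller_id, order_id, hot_sellers):
    """Return (primary_shard, sub_shard) for a write.

    hot_sellers: dict mapping a hot seller's id -> its sub-shard count,
    which can be grown dynamically as a seller's traffic spikes.
    """
    primary = _h(seller_id) % N_SHARDS
    if seller_id in hot_sellers:
        # Secondary hash on order_id spreads one hot seller's writes
        # over several sub-shards, removing the single-shard bottleneck.
        return (primary, _h(order_id) % hot_sellers[seller_id])
    return (primary, 0)
```

All of a hot seller's data stays under one primary shard (so per-seller queries remain routable), while its write load spreads across the sub-shards.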
The Evolution of the Database Monitoring System
Here are the top four technical challenges for the database monitoring system:
- Massive data: The monitoring system needs to monitor 10 million monitoring metrics every second on average, with up to 14 million during peak hours.
- Complex aggregation logic: The monitoring system needs to aggregate data from multiple dimensions, including different regions, data centers, units, business clusters, and master/slave databases.
- High real-time requirements: The monitoring system screen needs to show the monitored metrics from the previous second.
- Computing resources: The monitoring system needs to use the fewest resources possible for collection and computing to save costs and improve overall performance.
The architecture of the database monitoring system has gone through three generations: the first consisted of an agent and a MySQL database; the second of an agent, DataHub, and a distributed NoSQL database; and the third of an agent, a real-time computing engine, and HiTSDB. Let's discuss the third generation below.
HiTSDB, Alibaba’s own in-house time series database, is perfect for storing massive amounts of monitoring data, solving the issues of the previous generations. The real-time computing engine pre-processes performance data every second and stores the data in HiTSDB. This third generation of architecture is highly optimized, making sure that these monitoring capabilities do not degrade performance, even during extreme high-traffic scenarios. Moreover, having the monitoring system always on has helped Alibaba gain greater insights about our databases and locate issues quickly.
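The per-second pre-processing step can be pictured as a roll-up along the dimensions listed earlier. This is a minimal sketch; the dimension and metric names are invented, and the real computing engine is far more elaborate:

```python
from collections import defaultdict

def aggregate(samples):
    """Roll raw per-instance samples up into coarser dimensions.

    samples: iterable of (region, datacenter, cluster, metric, value)
    tuples collected within one one-second window.
    """
    out = defaultdict(float)
    for region, dc, cluster, metric, value in samples:
        # Sum into progressively coarser keys so dashboards can read any
        # level (cluster, data center, region) without re-scanning raw data.
        out[(region, dc, cluster, metric)] += value
        out[(region, dc, metric)] += value
        out[(region, metric)] += value
    return dict(out)
```

Each one-second window's aggregates would then be written to the time-series store, so the dashboard only ever reads pre-computed rows.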
Use of CloudDBA during Double 11
Alibaba possesses the most experienced database administrators (DBAs) in the industry, along with a massive bank of performance diagnosis data. In the future, we aim to combine our DBAs' experience with our knowledge of big data and machine intelligence, so that three years from now further database analysis and optimization can be done by machines rather than DBAs. We see self-diagnosis, self-optimization, and self-O&M as the future trends in database technology.
This year's Double 11 also saw Alibaba's pilot run of CloudDBA. By analyzing monitoring data and all SQL statements, we achieved SQL self-optimization (in particular, optimization of slow SQL calls), space optimization through the analysis of unused tables and indexes, access model optimization between SQL and KV, and prediction of storage space growth.
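As a toy illustration of the slow-SQL analysis, one simple heuristic flags query templates with high latency or a poor examined-to-returned row ratio (a classic missing-index symptom). The thresholds and field names are assumptions, not CloudDBA's actual rules:

```python
def flag_slow_sql(stats, latency_ms=100, scan_ratio=1000):
    """Return the SQL templates that look like optimization candidates.

    stats: list of dicts with keys "sql", "avg_latency_ms",
    "rows_examined", and "rows_sent" (illustrative schema).
    """
    flagged = []
    for s in stats:
        # Many rows examined per row returned suggests a missing index.
        ratio = s["rows_examined"] / max(s["rows_sent"], 1)
        if s["avg_latency_ms"] > latency_ms or ratio > scan_ratio:
            flagged.append(s["sql"])
    return flagged
```

A production diagnoser would of course weigh far more signals (lock waits, buffer-pool hit rates, execution plans), but the flag-then-advise loop is the same shape.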
Prospect of Double 11 in 2020
Our vision of next year’s Double 11 can be defined in three keywords: Higher, Faster, Smarter.
- Higher: That is, a higher peak value of transactions at a lower cost. We aim to support higher peak values through even better elasticity, giving users the best possible shopping experience. Our long-term vision is to eliminate any and all system throttling.
- Faster: Faster speeds are the life-long goal of our technicians. We want faster databases, faster storage, faster hardware, faster everything. Speed matters a lot at Alibaba.
- Smarter: By Smarter, we mean increased use of machine intelligence during the Double 11 shopping promotion. Be it databases, scheduling, custom recommendations, or customer services, we hope machine intelligence can make major breakthroughs and play an even bigger role in the annual Double 11 shopping event.