A Close-Up Look into Alibaba’s New Generation of Database Technologies

How Can Alibaba Cloud’s Database Achieve Ultimate Elasticity?

Deploying Databases in the Cloud

Making databases elastic is hard: databases have demanding performance requirements, and migrating massive amounts of data is costly. The first approach to making databases more elastic is to deploy them in the cloud, where elastic resource allocation can secure the resources that databases demand. Doing so raises three questions:

  1. How can databases be deployed in the cloud, and how can a hybrid cloud be built in a short time?
  2. How can performance losses caused by virtualization be minimized?
  3. How can Alibaba Cloud be connected with a customer’s internal network, or private cloud?

Elastic Database Scheduling

Sometimes, cloud resources alone are not enough to achieve the ultimate level of elasticity. With offline/online hybrid deployment, however, databases can also use the computing resources of offline clusters to maximize elasticity and minimize costs. Two techniques are fundamental to successful hybrid deployment: containerization, which isolates and centrally schedules computing nodes, and the separation of storage from computing, which is the foundation for elastic scheduling of databases. Major technological advances over the last few years, such as 25G networking, RDMA, and high-performance distributed storage, have made separated storage a reality.

  • Shortened response latency: Single-channel read and write latency was reduced to 0.4 ms, and latency over the RDMA network was reduced to less than 0.2 ms.
  • Asynchronous replication between the second and third replicas: Replicating the third data replica asynchronously helped improve network stability.
  • QoS-based throttling: Back-end I/O traffic is controlled according to the front-end service load to guarantee optimal write performance.
  • Fast failover: Failover of a single node in a storage cluster was reduced to 5 seconds, a speed not seen anywhere else in the industry.
  • High availability deployment: Deploying each cluster across four racks raised data reliability to 99.99999999%.
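
As a rough illustration of the QoS-based throttling idea above, the following sketch gives back-end I/O whatever capacity the front-end load leaves over. The class name, the parameters, and the numbers are all hypothetical; the actual throttling policy of the storage system is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class BackendThrottle:
    """Hypothetical throttle: back-end I/O gets what the front end leaves over."""
    capacity_iops: int      # assumed total IOPS a storage node can sustain
    min_backend_iops: int   # assumed floor so back-end work never fully starves

    def backend_budget(self, frontend_iops: int) -> int:
        # Front-end (service-facing) traffic is served first; the remainder
        # becomes the budget for back-end I/O.
        spare = self.capacity_iops - frontend_iops
        return max(self.min_backend_iops, spare)


throttle = BackendThrottle(capacity_iops=100_000, min_backend_iops=5_000)
for load in (20_000, 80_000, 99_000):
    print(load, throttle.backend_budget(load))
```

As the front-end load rises, the back-end budget shrinks until it reaches the configured floor, which is the behavior the bullet describes in spirit.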

Hybrid Database Deployment for Double 11

Alibaba’s Next-Gen Databases

To build better databases at Alibaba, we moved from Oracle databases to our own in-house solutions. In particular, we developed AliSQL, which is based on MySQL, and a distributed middleware called Taobao Distributed Data Layer (TDDL). Since 2016, we have been working on another generation of database technology, which we call X-DB, with X representing our pursuit of the ultimate level of performance and capabilities.

  • Data must be extensible.
  • Data must be strongly consistent and constantly available.
  • A database needs to store large amounts of important data.
  • Data must show distinct lifecycle characteristics, with cold data and hot data being clearly distinguishable from each other.
  • The logic for handling transactions, storage, and payments must be simple and support high-performance scenarios.

X-DB Architecture

The X-DB architecture is built on the following key technologies:

  • X-Paxos: X-Paxos is a high-performance Paxos library developed in-house at Alibaba. It is the core technology behind X-DB's three-node deployment and its strong cross-zone and cross-region data consistency. X-Paxos ensures 99.999% continuous availability.
  • Batching and pipelining: When committing a transaction, X-DB ensures that its logs are received and committed by a majority of the database nodes, which is the foundation of strong consistency. Because committing a transaction crosses the network, latency inevitably increases, and maintaining throughput under high latency is challenging. Batching commits transactions in groups, and pipelining allows log entries to be received and acknowledged out of sequence, while the logs are still committed in sequence. This approach maintains high throughput despite high latency.
  • Asynchronous commitment: Without special handling, a database thread sits waiting while its transaction is being committed. We adopted asynchronous commitment to make the most of the database thread pool. Combined, these techniques keep X-DB throughput high in three-node mode.
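
To make the batching and pipelining idea more concrete, here is a minimal toy sketch of a leader that ships log entries in batches, accepts acknowledgements in any order, and still advances its commit point strictly in sequence once a majority holds each entry. It is not X-Paxos: the class `PipelinedLog`, its methods, and the node names are hypothetical, and real consensus protocols also handle terms, retries, and failures.

```python
class PipelinedLog:
    """Toy leader-side view of a replicated log (an illustration, not X-Paxos)."""

    def __init__(self, cluster_size: int):
        self.majority = cluster_size // 2 + 1
        self.acks = {}          # log index -> set of node ids that stored the entry
        self.commit_index = 0   # highest committed index (0 means nothing yet)

    def append_batch(self, first_index: int, count: int, leader_id: str) -> None:
        # Batching: log entries of several transactions are shipped together.
        for index in range(first_index, first_index + count):
            self.acks[index] = {leader_id}

    def on_ack(self, index: int, node_id: str) -> int:
        # Pipelining: acknowledgements may arrive in any order.
        self.acks[index].add(node_id)
        # Commit strictly in sequence: advance only while the next entry
        # has been stored by a majority of nodes.
        while len(self.acks.get(self.commit_index + 1, ())) >= self.majority:
            self.commit_index += 1
        return self.commit_index


log = PipelinedLog(cluster_size=3)
log.append_batch(first_index=1, count=3, leader_id="A")
print(log.on_ack(3, "B"))  # 0: entry 3 has a majority, but entry 1 does not yet
print(log.on_ack(2, "B"))  # 0: still waiting for entry 1
print(log.on_ack(1, "C"))  # 3: entries 1, 2, and 3 commit in order
```

The leader never waits for entry 1 before sending entries 2 and 3, so network latency is overlapped across entries rather than paid once per transaction.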

Performance Tests Comparing X-DB and MySQL


Leading Technologies in Double 11

Use of X-KV in Double 11

X-KV is an enhanced version of the official MySQL Memcached plugin. After substantial optimization work this year, X-KV now supports more data types, along with non-unique indexes, composite indexes, the MultiGet function, and online schema change. The biggest difference is that it supports SQL conversion through TDDL. X-KV's advantages are superior read performance, strong consistency, and low response time. Because X-KV supports SQL, applications can be migrated transparently, which reduces overall costs, maintenance costs in particular.

  • Independent connection pools: The SQL and KV connection pools are independent of each other but stay synchronized during changes, which allows applications to switch quickly between the two sets of APIs.
  • Optimized KV communications protocol: The protocol no longer relies on separators.
  • Automatic type conversion of result sets: Strings can be automatically converted to MySQL strings.
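
The article does not describe the X-KV wire format, so the following is only a sketch of one common way to build a protocol without separators: length-prefixed fields. The function names and the 4-byte length prefix are assumptions made for illustration.

```python
import struct
from typing import List


def encode_fields(fields: List[bytes]) -> bytes:
    # Each field is written as a 4-byte big-endian length followed by the raw
    # bytes, so a value may contain '|', ',' or any other would-be separator.
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def decode_fields(payload: bytes) -> List[bytes]:
    fields, offset = [], 0
    while offset < len(payload):
        # Read the length prefix, then slice out exactly that many bytes.
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        fields.append(payload[offset:offset + length])
        offset += length
    return fields


row = [b"42", b"a|b,c", b""]
print(decode_fields(encode_fields(row)) == row)
```

Compared with a separator-based format, this framing needs no escaping and no scanning for delimiters, which is one plausible reason to avoid separators in a latency-sensitive KV path.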

Solutions for the Seller Database Performance Bottleneck

The turnover of the Double 11 Shopping Festival keeps growing. Over the last couple of years, this has kept the synchronization latency between the online buyer database and the seller database high. As a result, sellers weren't able to process Double 11 orders promptly, and the seller database performed poorly because of the many complex queries it had to serve. In the past, we tried setting up independent queues, combining synchronization channels, and optimizing the seller databases for large sellers, but none of these completely solved the underlying issue.

The Evolution of the Database Monitoring System

Here are the top four technical challenges for the database monitoring system:

  1. Massive data: The monitoring system needs to process 10 million metrics per second on average, and up to 14 million during peak hours.
  2. Complex aggregation logic: The monitoring system needs to aggregate data from multiple dimensions, including different regions, data centers, units, business clusters, and master/slave databases.
  3. High real-time requirements: The monitoring system screen needs to show the monitored metrics from the previous second.
  4. Computing resources: The monitoring system needs to use the fewest resources possible for collection and computing to save costs and improve overall performance.
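
The multi-dimensional aggregation mentioned in the list above can be sketched as a roll-up of one second of samples over every combination of dimensions. This is a toy under stated assumptions: the dimension names, the `qps` metric, and the sample values are made up, and a system handling millions of metrics per second would need a far more economical design than this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimension names, loosely following the ones listed above.
DIMENSIONS = ("region", "data_center", "cluster", "role")


def aggregate(samples):
    """Sum one second of samples over every combination of dimensions."""
    totals = defaultdict(float)
    for sample in samples:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                # The key records which dimensions are fixed and to what value;
                # the empty key () is the global total.
                key = tuple((d, sample[d]) for d in dims)
                totals[key] += sample["qps"]
    return totals


samples = [
    {"region": "east", "data_center": "dc1", "cluster": "trade", "role": "master", "qps": 1200.0},
    {"region": "east", "data_center": "dc2", "cluster": "trade", "role": "slave", "qps": 800.0},
    {"region": "north", "data_center": "dc3", "cluster": "trade", "role": "master", "qps": 500.0},
]
totals = aggregate(samples)
print(totals[()])                                           # 2500.0
print(totals[(("region", "east"),)])                        # 2000.0
print(totals[(("cluster", "trade"), ("role", "master"))])   # 1700.0
```

Each sample contributes to 16 keys here, which shows why aggregation logic and computing cost grow quickly as dimensions are added.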

Use of CloudDBA during Double 11

Alibaba has some of the most experienced database administrators (DBAs) in the industry, along with a massive store of performance diagnosis data. We aim to combine the experience of our DBAs with our knowledge of big data and machine intelligence, so that three years from now, database analysis and optimization can be done by machines rather than by DBAs. We see self-diagnosis, self-optimization, and self-O&M as the future trends in database technology.

Outlook for Double 11 in 2020

Our vision of next year’s Double 11 can be defined in three keywords: Higher, Faster, Smarter.

  • Higher: A higher transaction peak at lower cost. We aim to support higher peak values through even better elasticity, giving users the best possible shopping experience. Our long-term vision is to eliminate system throttling altogether.
  • Faster: Speed is the lifelong goal of our engineers. We want faster databases, faster storage, faster hardware, faster everything. Speed matters a lot at Alibaba.
  • Smarter: By Smarter, we mean increased use of machine intelligence during the Double 11 shopping promotion. Be it databases, scheduling, custom recommendations, or customer services, we hope machine intelligence can make major breakthroughs and play an even bigger role in the annual Double 11 shopping event.




Alibaba Cloud
