Secret Weapon | How AnalyticDB Strongly Supports Double 11

By ApsaraDB

The 2020 Double 11 Global Shopping Festival has come and gone, but the exploration of technology never stops. Each year's Double 11 is not only a carnival for shoppers but also a major test for the people who work with data, and a stage on which the Alibaba Cloud Database Technology Team proves its technical capabilities and innovation practices.

Preface

AnalyticDB: Cloud-Native Data Warehouse

AnalyticDB can be used in all data warehouse scenarios, including report querying, online analysis, real-time data warehousing, and extract, transform, load (ETL) operations. It is compatible with MySQL and traditional data warehouse ecosystems, so the barrier to adoption is low.

AnalyticDB Fully Supports Double 11

During the 2020 Double 11 Global Shopping Festival, AnalyticDB supported the core services of enterprises such as Jushuitan, 4PX EXPRESS, and EMS on the public cloud, and supported various services of China Post Group on the private cloud. For these enterprises, AnalyticDB provided ETL data processing, real-time online analysis, core reports, big screens, and monitoring. It also provided stable offline and online data services for tens of thousands of merchants and tens of millions of consumers.

The Challenges of AnalyticDB

1. Alibaba Group’s Core Transaction Procedure

This was the first production deployment of AnalyticDB's newly developed row storage engine, and it performed excellently. The engine supports high-concurrency online search and analysis at tens of millions of QPS. Its key technologies include a high-concurrency query pipeline, a brand-new row store, TopN push-down on any column, and composite indexing with intelligent index selection. A single node sustains over 10,000 QPS and scales linearly. With the same resources, the engine's single-table point query, aggregation, and TopN performance is 2–5 times that of open-source Elasticsearch; it also reduces storage space by 50%, and its write performance is 5–10 times higher, while real-time data visibility and high reliability are still guaranteed.
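
As an illustration of the kind of workload this engine targets, the hypothetical queries below sketch a high-concurrency point lookup and a TopN query over a single table. The orders table and its columns are invented for this example and are not from the article.

-- Hypothetical single-table workload for the row storage engine.
-- Point query: fetch one order by key (high-concurrency lookup).
SELECT order_id, user_id, amount, created_at
FROM orders
WHERE order_id = 1234567890;

-- TopN on an arbitrary column: the engine can push the TopN
-- computation down to storage instead of sorting the full result set.
SELECT user_id, amount
FROM orders
ORDER BY amount DESC
LIMIT 10;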

2. More Production and Operation Stages

The architecture of the Cainiao data warehouse is shown in the figure below.

3. More Import Tasks

4. More High-Throughput Write Services

5. Online and Offline Hybrid Loads

The Latest Key Technologies of AnalyticDB

Compute-Storage Separation

Products in AnalyticDB's reserved mode are based on a shared-nothing architecture, which offers good scalability and concurrency. The backend couples compute with storage: the compute and storage modules share the same resources, so storage capacity and computing capability are both tied to the number of nodes. Users can adjust resources by adding or removing nodes, but they cannot provision computing and storage resources independently to match different business loads. In addition, changing the number of nodes usually involves a large amount of data migration, which takes a long time and affects the loads currently running on the system. Because computing and storage cannot be provisioned flexibly, cost effectiveness also suffers.

Embracing the elasticity of cloud platforms, AnalyticDB introduced a new elastic mode built on a compute-storage separation architecture at the backend. AnalyticDB provides a service-oriented serverless storage layer, and the computing layer can be scaled independently and elastically while matching the performance of the reserved mode. By decoupling compute from storage, users can scale computing resources and storage capacity separately to control total costs. Scaling computing resources requires no data migration, giving users a truly elastic experience.

Hot/Cold Data Hierarchy

Cost-effective data storage is one of the core competitive strengths of a cloud data warehouse. AnalyticDB provides an enterprise-level hot/cold data hierarchy: hot and cold storage media can be selected independently at the granularity of a table or of its dual (secondary) partitions. For example, all of a table's data can be stored on SSDs or HDDs, users can choose different storage types according to business requirements, and hot/cold policies can be switched at any time. In addition, hot and cold space in the serverless storage layer is billed in pay-as-you-go mode. In the future, AnalyticDB will partition hot and cold data intelligently, automatically pre-heating cold data based on each user's business access patterns.
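
Because policies can be switched at any time, a table can, for example, be demoted from all-hot to mixed storage later in its life. This is a minimal sketch that assumes the ALTER TABLE ... storage_policy syntax mirrors the CREATE TABLE clause shown later in this article; verify it against your AnalyticDB version.

-- Hypothetical policy switch: keep only the 7 most recent partitions
-- on SSD and move the rest to cold storage (syntax assumed to mirror
-- the storage_policy clause used in CREATE TABLE).
ALTER TABLE event storage_policy = 'MIXED' hot_partition_count = 7;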

Definitions of Hot and Cold Storage

Users can flexibly define a full hot-data table (all partitions on SSDs), a full cold-data table (all partitions in OSS), or a hybrid table (some partitions on SSDs, others in OSS), striking a balance between performance and cost. The following example is a game log table: with the hybrid partitioning policy, the data of the latest seven days is stored in hot partitions, and earlier data is stored in cold partitions.

create table event(
  id bigint auto_increment,
  dt datetime,
  event varchar,
  goods varchar,
  package int
  ...
) distribute by hash(id)
partition by value(date_format(dt, '%Y%m%d')) lifecycle 365
storage_policy = 'MIXED' hot_partition_count = 7;

Automatic Migration of Hot and Cold Data

The design follows the concept of write/read-optimized storage. By default, AnalyticDB maintains an adaptive full index to support efficient combined analysis along any dimension: every column has a column-level index, so queries run fast out of the box. That, however, puts pressure on write performance. AnalyticDB therefore adopts an LSM-like architecture and divides storage into a real-time partition and a historical partition. The real-time partition uses row-column hybrid storage with block-level coarse indexes, while the historical partition uses the full index to keep queries extremely fast. A backend build task converts real-time partitions into historical partitions. The automatic conversion mechanism for hot and cold partitions works as follows:

  • When enough data has accumulated, a build task is scheduled automatically: snapshots are created for the real-time data, the data is compacted, and new historical partitions are created and written to the hot or cold area according to the hot/cold policy. (A hedged sketch of triggering a build manually follows this list.)
  • While the build task runs, historical partitions are automatically migrated from the hot area to the cold area according to the sliding window defined by the hot/cold policy. In the following figure, three hot partitions are defined. On November 4, the hot partitions were 11-04, 11-03, and 11-02. On November 5, the new 11-05 data was written, and per the sliding window the hot partitions became 11-05, 11-04, and 11-03. The build task therefore migrated data from hot partitions to cold partitions: as shown, partition 11-02 was automatically moved to the cold area.
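
Builds normally run automatically as described above, but a build can also be requested by hand. This is a minimal sketch assuming the BUILD TABLE statement available in some AnalyticDB for MySQL versions; check your version's documentation before relying on it.

-- Hypothetical manual trigger: ask the backend to schedule a build for
-- the event table, converting its real-time partition into historical
-- partitions (statement availability is an assumption).
BUILD TABLE event;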

Query Acceleration of Cold Data

  • Multi-Granularity SSD Caching: This ensures that random index lookups and throughput-oriented data scans can both be served efficiently.
  • Metadata Pre-Heating: After a build task finishes, the metadata of the cold partition is generated automatically to accelerate access.
  • Lock-Free Access Queues for Hot and Cold Data: This prevents frequently accessed data from being repeatedly swapped in and out.

Hot and Cold Storage Usage

1. Full Hot-Data Table: It applies when all of the data in the table is frequently accessed and requires high access performance. The DDL statement for this table is listed below:

create table t1(
  id int,
  dt datetime
) distribute by hash(id)
partition by value(date_format(dt, '%Y%m'))
lifecycle 12
storage_policy = 'HOT';

2. Full Cold-Data Table: It applies when all of the data in the table is infrequently accessed and access performance requirements are low. The DDL statement for this table is listed below:

create table t2(
  id int,
  dt datetime
) distribute by hash(id)
partition by value(date_format(dt, '%Y%m'))
lifecycle 12
storage_policy = 'COLD';

3. Hybrid Table: It applies when hot and cold data are split by time. For example, the data of the last month is frequently accessed and requires high access performance, while data from earlier months is cold and rarely accessed. The DDL statement for this table is listed below:

create table t3(
  id int,
  dt datetime
) distribute by hash(id)
partition by value(date_format(dt, '%Y%m'))
lifecycle 12
storage_policy = 'MIXED' hot_partition_count = 1;

Here the table's dual partitions are created by month, and 12 months of data are retained in total. Data from the last month is stored on SSDs, and earlier data is stored on HDDs, striking a good balance between performance and cost.
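
To make the hot/cold split concrete, the hypothetical query below filters on the partition column so that only the latest (hot, SSD) partition needs to be scanned; the date values are invented for this example.

-- Hypothetical query against t3: the range filter on dt maps to the
-- latest monthly partition, so the scan stays on the hot (SSD)
-- partition and never touches the cold (HDD) partitions.
SELECT count(*)
FROM t3
WHERE dt >= '2020-11-01 00:00:00'
  AND dt < '2020-12-01 00:00:00';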

Online-Offline Integration

Hybrid Computing Load

AnalyticDB serves the following classes of computing loads on a single platform:

  • Online Analysis (Online/Interactive)
  • Batch
  • Complex Computing (such as Iterative/ML)

Multi-Tenancy with Resource Groups (Pools)

Resource Group Commands

-- Create a resource group.
CREATE RESOURCE GROUP group_name
[QUERY_TYPE = {interactive, batch}] -- Specify the execution mode for queries in the resource group.
[NODE_NUM = N]; -- Specify the number of nodes in the resource group.

-- Bind a user to a resource group.
ALTER RESOURCE GROUP BATCH_RG ADD_USER = batch_user;
-- Resize a resource group.
ALTER RESOURCE GROUP BATCH_RG NODE_NUM = 10;
-- Delete a resource group.
DROP RESOURCE GROUP BATCH_RG;

Resource groups support the following query execution modes; a combined usage sketch follows the list:

  1. Interactive Mode: Execution is fully in-memory and pipelined, which suits online analysis that requires low latency.
  2. Batch Mode: Execution proceeds stage by stage, and intermediate results and operator states can be persisted. This mode suits queries that need high throughput but can tolerate higher latency, and its computing cost is lower.
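
Putting the commands and modes together, the following is a minimal sketch of isolating interactive reporting from batch ETL. The group and user names (REPORT_RG, ETL_RG, report_user, etl_user) are invented for this example.

-- Hypothetical setup: an interactive group for dashboards and a batch
-- group for ETL, so the two loads do not compete for resources.
CREATE RESOURCE GROUP REPORT_RG QUERY_TYPE = interactive NODE_NUM = 4;
CREATE RESOURCE GROUP ETL_RG QUERY_TYPE = batch NODE_NUM = 8;

-- Route users to groups: report_user's queries run in interactive
-- mode, while etl_user's jobs run in batch mode.
ALTER RESOURCE GROUP REPORT_RG ADD_USER = report_user;
ALTER RESOURCE GROUP ETL_RG ADD_USER = etl_user;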

Scheduled Elasticity

Intelligent Optimization

Query optimization is a key factor in the performance of a data warehouse system. Problems such as generating better execution plans, choosing partitioning methods, configuring indexes, and deciding when to update statistics constantly trouble data developers. AnalyticDB has invested deeply in intelligent optimization technology: by monitoring running queries in real time, it dynamically adjusts execution plans and the statistics they depend on, automatically improving query performance. Built-in intelligent algorithms also tune engine parameters dynamically according to the real-time state of the system, so the engine keeps up with the current query load.

Smart Adjustment

Optimizations for Dynamic Execution Plans

Dynamic Management of Materialized Views

Summary
