AnalyticDB: Alibaba Cloud’s Cloud-native Data Warehouse Broke the TPC-DS World Record

  • Data volumes are growing explosively. The global data volume is expected to reach 40 ZB by 2020 and grow more than fourfold by 2025.
  • Enterprises are migrating to the cloud at a significantly faster rate. By 2025, 50% of enterprise data will be stored in the cloud, while 75% of enterprise databases will run in the cloud.
  • There is a strong demand for real-time data processing. By 2025, 30% of global data will be processed in real-time.
  • Data is intelligence-oriented. With the development of artificial intelligence (AI) and 5G technologies, unstructured data is growing rapidly. By 2025, 80% of data will be unstructured data.
  • Data import
  • Query performance (single query and multiple concurrent queries)
  • Query complexity (covering star and snowflake schemas, and support for complex window functions)
  • Availability (data consistency and fault tolerance for damaged disks)

1) AnalyticDB

2) TPC-DS Performance Benchmarks

  • The data set scale is large. For example, the store_sales table contains more than 28 billion rows.
  • It is oriented to real retail decision-making scenarios, and its SQL statements are very complex. It covers the core parts of SQL-99 and SQL: 2003 as well as online analytical processing (OLAP) standards. It includes low-latency ad hoc queries of reports and massive data mining and high-throughput analysis queries.
  • It involves many multi-dimensional test items, which require high performance, high reliability, high availability, and high cost-effectiveness, as well as extract, transform, load (ETL) and atomicity, consistency, isolation, durability (ACID) capabilities.

3) Technical Architecture of AnalyticDB for MySQL 3.0

4) AnalyticDB Storage Technology

4.1) Distributed and Highly Consistent Storage

  • The first level is cluster instance-based parallelism. A user instance is divided into multiple storage node groups (worker groups), and each worker group consists of N (usually 3, or another odd number) worker nodes. A worker node is equivalent to a user data node container. Worker nodes are grouped to ensure stable communication during large-scale system expansion and to facilitate the parallel upgrade and O&M of the system.
  • The second level is database-based parallelism. A user database is split into N shards. To ensure strong data consistency, each shard is an independent Raft group. Multiple shards form multiple Raft consensus groups. Shards can be Hash partitions or Range partitions. In general, data alignment can be performed in Hash partitions to avoid data shuffling when large data tables are joined. Shards can be balanced or migrated between different worker groups when necessary. Shards also support dynamic splitting and merging.
  • The third level is table-based parallelism. Historical data storage in data warehouse scenarios involves data partitioning. For example, in TPC-DS, store_sales can be partitioned by time period. Data partitioning facilitates data lifecycle management and supports the query of partition pruning and dynamic file pruning (DFP). This helps greatly narrow down the data computing scope.

4.2) High-performance Batch Import

4.3) High-throughput Real-time Data Update in DML

4.4) Row-column-mixed Storage and Intelligent Indexing

5) AnalyticDB Query Technologies

5.1) CBO Query Optimization Framework

  • Search Framework: From the perspective of database development, the Cascades-based search framework has become the industry standard. The commercial database SQL Server and the open-source database GP/ORCA are implemented by Cascades. We also implemented the AnalyticDB for MySQL optimizer CBO based on Cascades. The core problem of the search framework is that the search space will expand rapidly but the search response time must be within milliseconds. Therefore, you need an efficient optimization rule to generate a search space, an efficient data structure to store the search space, an efficient search algorithm to traverse the search space, and an efficient pruning policy to crop the search space.
  • Distributed Parallel Plan: The distributed MPP databases pose new challenges for the optimizer. In a distributed MPP database, the distribution of data directly affects the correctness of data. To meet the data distribution requirements of different operators, data redistribution is inevitable. However, the cost of data redistribution (data shuffling) is very high. Therefore, you need to reduce data shuffling as much as possible while also ensuring data correctness. For the distributed MPP database optimizer, the data partitioning, sorting, and grouping properties are required for the search space, and the optimal distributed parallel execution plan needs to be selected based on cost considerations.
  • Cost Estimation: Cost estimation is a key factor that allows the optimizer to find the optimal plan. Cost estimation involves the derivation of statistical information and a cost model. The derivation of statistical information depends on the statistical information of the original table, the derivation algorithm of intermediate operators, and various data assumptions (homogenization assumptions, independence assumptions, and inclusion assumptions). Guesses must be made in extreme cases. Therefore, the derivation of statistical information is full of uncertainty, making it much more difficult for the optimizer to find the optimal solution. Essentially, we must break the assumptions about data properties to estimate statistics correctly. However, breaking these assumptions will increase our costs.
  • Statistics Collection: We must collect the necessary statistics before a CBO runs. Basic statistical information is collected and updated automatically, while advanced statistics can be collected manually. This provides a reliable and multidimensional statistics basis for CBO. However, some statistical information may be lost or not collected on time. To prevent the generation of an inappropriate plan, you can dynamically collect sample data at runtime to obtain the necessary statistical information.

5.2) Hybrid Query Execution Framework

  • Just-in-time (JIT) Compilation: JIT compilation is data-centric. After the previous operator processes a piece of data, the next operator will compute it directly in the CPU cache. JIT compilation is cache-friendly and suitable for computing-intensive tasks.
  • Vectorization: In vectorization, an operator processes a batch of data and then sends the results to the next operator for computing. Vectorization is suitable for memory-intensive tasks and vector computing and achieves high cohesion of operator computing at the cost of overhead for materializing intermediate results.

5.3) Unified Memory Management

  • Binary memory unifies the memory type to store different types of data in the same type (byte). This is the basis for row-oriented storage and cache-friendly algorithm optimization to facilitate query execution.
  • Standardized memory management specifications unify the memory specifications to reduce the extra costs due to memory fragmentation and the difficulty of memory reuse.
  • Hierarchical memory management unifies memory management, maps memory lifecycles based on computing characteristics, implements MemoryCache and MemoryPool based on memory usage characteristics, and supports memory leak detection. This implements active memory management oriented to resident services.

5.4) DFP and CTE Technologies

  • Efficient Building of Dynamic Predicates: This allows you to build dynamic predicates by using in-place in the process, which reduces the cost of building dynamic words.
  • Multi-layer Filtering Optimization: Bloom Filter is used to prune partitions and perceive storage indexes, which accelerates filtering.
  • Disparate Data Source Push-down: Implementation of the unified data source interface layer is abstracted to support more disparate data sources.

6) Customer Scenarios and Cases

7) Summary and Outlook

Original Source:

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

4.97K Followers

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com