Why AnalyticDB for MySQL is the Best Bet for Building a Real-time Data Warehouse

Alibaba Cloud
8 min readSep 1, 2020

Alibaba Cloud launched the Basic Edition of AnalyticDB for MySQL to simplify the process of building a data warehouse. AnalyticDB for MySQL Basic Edition features high compatibility with MySQL, exceptional low costs, and superior performance, allowing small- and medium-sized enterprises (SMEs) to easily build a real-time data warehouse and mine data online for better value.

AnalyticDB for MySQL comes in Basic Edition and Cluster Edition. The Basic Edition provides services through a single node based on a minimalist architecture that significantly reduces costs. The Basic Edition provides robust analysis performance by using a storage-computing separation architecture, row-column hybrid store, lightweight indexing, and distributed hybrid computing engine. Using the Basic Edition, an enterprise can build a real-time data warehouse at an annual cost of less than RMB 10,000, without having to establish a dedicated big data team, which instead costs millions of RMB.

1) AnalyticDB for MySQL Basic Edition: Technical Architecture

Let’s take a look at the technical architecture of AnalyticDB for MySQL Basic Edition, which consists of a coordinator and a worker.

1.1) Coordinator — Frontend Control Node

Coordinator, the frontend node of AnalyticDB for MySQL Basic Edition executes the following key functions:

  • The coordinator connects to the MySQL protocol and parses SQL statements.
  • It implements authentication and authorization based on a comprehensive and fine-grained permission system model. The coordinator provides a whitelist feature and cluster-level control function through Resource Access Management (RAM). It records and audits all SQL operations for compliance purposes.
  • it supports cluster management with various functions such as member management, metadata management, data consistency, route synchronization, and backup and restoration (data and log management).
  • It manages asynchronous tasks in the background.
  • It implements transaction management.
  • It provides an optimizer to generate execution plans.
  • It schedules computing tasks.

1.2) Worker — Storage and Compute Node

Worker, the storage and compute node of AnalyticDB for MySQL Basic Edition includes the following components:

a) Computing Module: It uses the distributed hybrid computing engine and optimizer integrating massively parallel processing (MPP) and directed acyclic graph (DAG) to improve complex computing and hybrid load management capabilities. The computing module schedules resources in a flexible and elastic manner on the Alibaba Cloud Computing Platform. Start the Worker Compute Node independently and scale-out as needed in a matter of minutes or even seconds to maximize resource utilization.
b) Storage Module: It’s more lightweight and supports real-time reading and writing with higher throughput. The write performance is about 50% higher as compared to the earlier version with the same specifications. Data is written in several milliseconds to support real-time analysis. The storage node supports backup and restoration in full and incremental modes. It synchronizes the periodic snapshots and logs of cloud disks to Object Storage Service (OSS) in real-time for storage. This improves the security of your data and helps you recover data when a database error occurs.
c) Worker Groups: The worker nodes with storage modules are allocated to worker groups. The Cluster Edition provides triplicate storage and uses Raft, a consensus distributed protocol, to allow the worker nodes to work as a whole. There is no disruption in service continuity, even when some worker nodes are faulty. The Basic Edition provides single-replica storage.

2) Optimizer of the Basic Edition

The optimizer processes the parser-generated syntax tree and uses the optimization algorithm to generate an optimal-cost plan for the computing engine. The plan cost directly affects query performance. Hence, the optimizer is one of the core modules in the database. The Basic Edition provides an optimizer — as powerful as that of the Cluster Edition. The optimizer implements optimization based on rules, costs, and patterns.

Complex analytical queries often involve joining multiple tables, and the table join order directly affects query performance. AnalyticDB for MySQL provides an optimizer that uses an algorithm to optimize the join order based on the estimated cost and real-time sample information. The algorithm perceives data distribution in the underlying storage. AnalyticDB for MySQL provides the full-index feature and the optimizer uses this feature to improve the accuracy of filter factor estimation. For complex join operations, the optimizer dynamically adjusts the join order based on the data distribution. It evaluates the cost of data reshuffling to select the optimal execution plan based on the global cost estimates.

Unlike a typical rule-based optimizer, the AnalyticDB for MySQL optimizer also provides functions for cost estimation and iteration optimization and integrates the Cascades cost-based optimizer (CBO) framework. The CBO search framework calls the Property Enforcement module to generate distributed execution plans and calls the cost estimation module to evaluate the cost of each candidate plan and select the optimal distributed execution plan. The AnalyticDB for MySQL optimizer implements history-based optimization, SQL pattern-based dynamic optimization, and data-driven intelligence to improve the join order further and optimize the join efficiency. For example, the Auto Analyze module automatically collects statistics to provide accurate data that allows the optimizer to search for the optimal execution plan.

To improve query performance, the AnalyticDB for MySQL optimizer optimizes combined filter conditions, aggregate operators, and joined subqueries in complex queries. For example, the pushdown optimization technology pushes the filter conditions and aggregate operators in an execution plan to the underlying module for execution. It improves the efficiency of underlying operators, reduces the amount of data processed by upstream operators, and eventually improves the overall query performance. As for joined subquery statements, the optimizer converts joined subqueries into an unjoined plan with the same semantic meaning to enhance the pipelining efficiency of the computing engine.

3) Calculation Engine of Basic Edition

The AnalyticDB for MySQL computing engine uses the MPP+DAG architecture and in-memory pipelining mode to ensure high concurrency and low latency. To accelerate the evaluation of complex expressions and optimize the execution performance, the computing engine generates JVM bytecode at runtime by using Runtime CodeGen and dynamically loads the generated object instances. This reduces virtual function calls during execution and improves the efficiency of CPU-intensive tasks. The computing engine uses a vectorized execution model to evaluate expressions and uses a CPU SIMD instruction set to accelerate the evaluation.

4) Storage Engine of the Basic Edition

As shown in the diagram below, the AnalyticDB for MySQL storage engine uses a row-column hybrid store architecture.

For every k rows of data (known as row group) in a table, the data of each column is continuously stored in a separate data block, and the column blocks of each row group are continuously stored in a disk. It’s easy to sort the data in the column blocks of a row group based on a specified column to significantly reduce the number of random disk-based I/O operations during a query regarding this column. The row-column hybrid store combines the advantages of row store (suitable for point queries in online transaction processing [OLTP]) and column store (suitable for multidimensional analysis in online analytical processing [OLAP]) to meet the requirements of different types of workloads.

  • To execute point queries of the OLTP type, you must select an entire row of detailed data. However, in row-column hybrid store, data in the column store is read sequentially instead of randomly.
  • In OLAP multidimensional analysis, row-column hybrid store significantly reduces the number of read throughputs in row store during the statistical analysis of massive data volumes. Sequential read of column store is converted to sequential read miss during single-column I/O, and random read is converted to sequential read during multi-column I/O.
  • For large write throughput scenarios, random write of column store is converted to sequential write.

The AnalyticDB for MySQL storage engine uses intelligent full indexing to create an inverted index relative to the row number for the data of each column. In the query process, the AnalyticDB for MySQL storage engine converts the AND and OR operators of multiple SQL conditional expressions into Boolean queries and indexes these queries. It can search for the result sets that satisfy the WHERE clause in milliseconds and identify the numbers of the rows that store these result sets. It also merges multiple result sets.

5) Basic Edition — The Highlights

The Basic Edition significantly simplifies the process of building a data warehouse and is more cost-effective than other data warehouse building methods, such as big data (Hadoop, Spark, and Elastic MapReduce [EMR]) and OLTP.

a) Ease of Use: The Basic Edition is billed at a minimum price of RMB 1.75/hour or RMB 860/month, which is about two-thirds of the minimum price of the Cluster Edition. Storage space is billed at only RMB 0.6/GB. A Basic Edition cluster supports up to 4 TB of storage space. Users may expand storage space as needed. This makes it easier for small- and medium-sized enterprises to build real-time warehouses and perform sophisticated analytics.
b) High Performance: In terms of data query, a Basic Edition cluster is 10 times faster than a MySQL database that has the same configuration.
c) Wide Range of Specifications: The Basic Edition supports the multiple specifications, including T8, T16, T32, and T52. It’s easy to select and change the specifications based on business requirements.
d) Transparent Ecosystem: Upstream and downstream resources are compatible with the Cluster Edition and are transparent to users.

6) Intended Customers

The Basic Edition is suitable for different customer groups with varied requirements.

  • Small- and medium-sized enterprises that hope to implement fast data-driven transformations without using complex architectures such as Hadoop and Spark.
  • Small- and medium-sized enterprises that encounter slow queries on report databases and require interactive business intelligence (BI) analysis.
  • Users who need to quickly build a test environment to select a data warehouse.
  • Customers who want to understand swiftly AnalyticDB for MySQL

To learn more, visit the AnalyticDB for MySQL page!

Original Source:

--

--

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com