The Journey of an SQL Query in the MaxCompute Distributed System

MaxCompute: Mega-scale Computing for Enterprises

  • Fully managed, multi-tenant, and mega-scale platform
  • Enterprise-grade high-performance computing engine

Mega-scale Enterprise-grade SQL Engine: MaxCompute UniSQL

The Journey of an SQL Query in a Distributed System

SQL Features

  • Not Only SQL — Script Mode
  • Not Only SQL — Parameterized View
  • Not Only SQL — IF/ELSE
  • Not Only SQL — UDT and Storm

SQL Performance

  • SQL Engine for Huge Data — Adaptive Join
  • SQL Engine for Huge Data — Advanced Shuffle
  1. Use Greysort mode, in which the reducer sorts and the mapper does not. This increases downstream pipelining opportunities and eliminates the sort when the downstream operator is converted to a hash join.
  2. Reduce I/O and cache misses through encoding and adaptive column compression.
  3. Optimize memory structure, reduce working set size, and eliminate pointer chasing.
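
The first point above can be illustrated with a minimal sketch. It is not MaxCompute's shuffle implementation; the function names `map_side_partition` and `reduce_side` and the `needs_sorted_input` flag are hypothetical, and integer keys are assumed so that partitioning is deterministic. The mapper only partitions records, and the reducer sorts only when its downstream operator needs ordered input, so a hash join skips the sort entirely.

```python
from collections import defaultdict


def map_side_partition(records, num_reducers):
    """Mapper side: hash-partition (key, value) records without sorting them."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def reduce_side(fragments, needs_sorted_input):
    """Reducer side: gather fragments from all mappers.

    The sort happens here, and only if the downstream operator requires
    ordered input (for example a merge join). A hash join does not, so
    the sort is skipped altogether.
    """
    merged = [record for fragment in fragments for record in fragment]
    if needs_sorted_input:
        merged.sort(key=lambda kv: kv[0])  # stable sort on the key only
    return merged


records = [(3, "c"), (1, "a"), (2, "b"), (1, "z")]
buckets = map_side_partition(records, num_reducers=2)
for_merge_join = reduce_side([buckets[1]], needs_sorted_input=True)
for_hash_join = reduce_side([buckets[1]], needs_sorted_input=False)
```

Because the mapper emits records in arrival order, it can start sending data downstream immediately instead of waiting for a full sort to finish.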

Enterprise-grade Distributed Intelligent Scheduling and Execution Framework

An Enterprise-grade Distributed Scheduling Execution System

Enterprise-grade Distributed Computation Scheduling Framework

  • Dynamic and Intelligent Execution
  1. Adjust concurrency based on per-partition data statistics, so that the adjustment itself does not aggravate data skew. Concurrency can be adjusted both upward and downward.
  2. Automatically split large partitions based on partition statistics. Combined with concurrency adjustment, this eliminates data skew within partitions, and data can still be processed and merged in a way that retains partition attributes.
  • Efficient Job Management
  • Multiple Computational Models
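
The two dynamic adjustments listed under dynamic and intelligent execution can be sketched as a small planning function. This is an illustrative assumption, not the scheduler's actual algorithm: the name `plan_tasks`, the `target_size` parameter, and the `(partition_id, start, end)` range format are all hypothetical. Small adjacent partitions are merged into one task (adjusting concurrency downward), oversized partitions are split into several tasks (adjusting it upward), and every range keeps its partition id so the partition attribute is not lost.

```python
def plan_tasks(partition_sizes, target_size):
    """Group or split partitions so each task handles about target_size units.

    Returns a list of tasks; each task is a list of
    (partition_id, start_offset, end_offset) ranges.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")

    tasks = []
    current, current_size = [], 0
    for pid, size in enumerate(partition_sizes):
        if size > target_size:
            # Flush any pending group of small partitions first.
            if current:
                tasks.append(current)
                current, current_size = [], 0
            # Split the skewed partition into target-sized ranges.
            offset = 0
            while offset < size:
                end = min(offset + target_size, size)
                tasks.append([(pid, offset, end)])
                offset = end
        else:
            # Merge small partitions until the task would exceed the target.
            if current and current_size + size > target_size:
                tasks.append(current)
                current, current_size = [], 0
            current.append((pid, 0, size))
            current_size += size
    if current:
        tasks.append(current)
    return tasks


plan = plan_tasks([10, 20, 250, 30], target_size=100)
```

In this example, partitions 0 and 1 share one task, partition 2 is spread over three tasks, and partition 3 gets its own task.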

AliOrc: A Next-generation Columnar Storage Engine

Deep Optimization Based on Apache ORC

Next-generation Columnar Storage Engine

  • Parallel Encoding
  • Asynchronous Parallel I/O
  • Lazy Read and Lazy Decoding
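
As a rough illustration of lazy decoding, the sketch below defers decoding a column until its values are first requested. It does not reflect AliOrc's internals: the `LazyColumn` class, the `scan` helper, and the run-length encoding are assumptions made for the example. A scan decodes the filter column first and touches the projected column only if at least one row passes the predicate.

```python
class LazyColumn:
    """A run-length-encoded column that is decoded on first access only."""

    def __init__(self, runs):
        self._runs = runs          # list of (value, repeat_count) pairs
        self._decoded = None       # filled in lazily
        self.decode_count = 0      # how many times decoding actually ran

    def values(self):
        if self._decoded is None:
            self.decode_count += 1
            self._decoded = [v for v, n in self._runs for _ in range(n)]
        return self._decoded


def scan(columns, filter_col, predicate, project_col):
    """Evaluate the predicate first; decode the projected column only if needed."""
    keys = columns[filter_col].values()
    selected = [i for i, v in enumerate(keys) if predicate(v)]
    if not selected:
        return []  # the projected column is never decoded
    projected = columns[project_col].values()
    return [projected[i] for i in selected]


columns = {
    "status": LazyColumn([("ok", 3), ("err", 1)]),
    "payload": LazyColumn([("a", 2), ("b", 2)]),
}
no_match = scan(columns, "status", lambda v: v == "missing", "payload")
decodes_after_miss = columns["payload"].decode_count
errors = scan(columns, "status", lambda v: v == "err", "payload")
```

When the filter is selective, most projected columns are never decoded, which saves both CPU time and memory.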

