DAG 2.0: The Adaptive Execution Engine That Supported 150 Million Distributed Jobs in the 2020 Double 11

Preface

Challenge and Background

Number of distributed jobs running on computing platforms per day

Adaptive Dynamic Execution: A “Different” Double 11

Adaptive Shuffle: Intelligent Data Orchestration to Avoid Data Skew

Prolonged job execution caused by data skew
Intelligent data orchestration based on adaptive shuffle to avoid data skew
Distribution of the 130,000 jobs for which adaptive shuffle eliminated data skew issues

Adaptive Parallelism Adjustment: Dynamic Data Distribution at the Partition Level to Optimize Resource Usage

Simple dynamic parallelism adjustment vs. adaptive dynamic parallelism adjustment

Conditional Join: The Optimal Choice for Real-Time Join Operations

Different join algorithms in distributed SQL
Implementation of conditional joins on DAG 2.0
Dynamic execution plan adjustment by using nested conditional joins

Next-Generation Execution Framework: Higher Flexibility and Efficiency

System Metric Optimization

Centralized Management of High-Load Clusters

Quasi-real-time execution framework 2.0
Prevented impact of peak traffic on Apsara Name Service and Distributed Lock Synchronization System

Support for the PAI Elastic-Inference Engine

Trend of execution duration of video feature extraction tasks performed by an algorithm recommendation team

Support for the Bubble Execution Mode

Bubble execution mode based on the DAG model
Latency and resource consumption comparison between the batch execution mode and bubble execution mode

Outlook

Original Source:

Alibaba Cloud
