Data Warehouse: In-depth Interpretation of Flink Resource Management Mechanism

  1. Basic Concepts
  2. Current Mechanisms and Policies
  3. Future Development Directions

1. Basic Concepts

1.1 Related Components

Figure 1. Resource management components of Flink

1.2 Logical Hierarchy

  • Operator: Operators are the most basic data processing units.
  • Task: Tasks are the smallest units that are actually scheduled in the Flink runtime. Each task consists of a series of chained operators. Note: Operators in the same task are scheduled together, so if one of them has started running, the others must already have been scheduled.
  • Job: Each job corresponds to a job graph.
  • Flink Cluster: 1 Flink Master + N Task Managers
Figure 2. Logical hierarchy of components
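
The hierarchy above can be pictured as a small data model. This is only an illustrative sketch: the class names and the `parallelism` field are assumptions made for the example, not Flink's runtime classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    # The most basic data processing unit.
    name: str


@dataclass
class Task:
    # A chain of operators that is scheduled as a single unit.
    operators: List[Operator]
    parallelism: int = 1  # assumed field: number of parallel subtasks

    @property
    def name(self) -> str:
        return " -> ".join(op.name for op in self.operators)


@dataclass
class Job:
    # A job corresponds to one job graph; modeled here as a list of tasks.
    tasks: List[Task]

    def total_subtasks(self) -> int:
        return sum(task.parallelism for task in self.tasks)


job = Job(tasks=[
    Task([Operator("source"), Operator("map")], parallelism=2),
    Task([Operator("aggregate")], parallelism=4),
    Task([Operator("sink")], parallelism=1),
])

print(job.tasks[0].name)     # source -> map
print(job.total_subtasks())  # 7
```

Because "source" and "map" are chained into one task, they are always scheduled together, which is the point of the note in the Task bullet.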

1.3 Two-layer Resource Scheduling Model

Figure 3. Two-layer resource scheduling model
  • Slot Caching
      • Batch jobs
      • Failover of streaming jobs
      • Multiple tasks using slot resources in sequence or in turns
  • Slot Sharing
      • Multiple tasks sharing the same slot under certain conditions

2. Current Mechanisms and Policies

2.1 Resources of a Task Manager

Figure 4. Resources of a task manager

2.2 Resources of a Slot

Figure 5. Slot resources

2.3 How Many Task Managers Does a Flink Cluster Have?

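  • Standalone deployment: the Task Manager hosts are listed in <FLINK_DIR>/conf/slaves.
  • YARN deployment (older Flink releases): the number of Task Managers is set with -n <num> for a session, or with flink run -yn <num> for a single job.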

2.4 Process of Scheduling Resources from a Cluster to a Job

Figure 6. Process of scheduling resources from a cluster to a job
  • Slot Allocation (Illustrated by red arrows in figure 6)
  • Starting Task Managers (Illustrated by blue arrows in figure 6)

2.5 Process of Scheduling Resources from a Job to a Task

  • Scheduler: Determines the next task to schedule based on the execution graph and the execution status of the tasks. It initiates slot requests and decides which task is assigned to which slot.
  • Slot Sharing: Tasks in the same slot sharing group can share slots. By default, all nodes of a job graph belong to the same slot sharing group. A slot can hold at most one subtask of any given task.
  • Advantages: The number of slots required to run a job equals the job's maximum parallelism, and the load is relatively balanced across slots.
Figure 7. Process of scheduling resources from a job to a task
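
The slot-count claim above can be checked with a short worked example. This is a simplified sketch, not Flink's scheduler: the task names, parallelism values, and the rule "subtask i goes to slot i of its group" are assumptions chosen to keep the arithmetic visible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (task name, parallelism, slot sharing group); all values are made up.
TaskSpec = Tuple[str, int, str]


def slots_required(tasks: List[TaskSpec]) -> int:
    """Slots needed when a slot holds at most one subtask of each task."""
    max_parallelism: Dict[str, int] = defaultdict(int)
    for _name, parallelism, group in tasks:
        max_parallelism[group] = max(max_parallelism[group], parallelism)
    # Different groups cannot share slots, so their requirements add up.
    return sum(max_parallelism.values())


def assign_subtasks(tasks: List[TaskSpec]) -> Dict[Tuple[str, int], List[str]]:
    """Simplified placement: subtask i of every task goes to slot i of its group."""
    slots: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for name, parallelism, group in tasks:
        for i in range(parallelism):
            slots[(group, i)].append(f"{name}[{i}]")
    return dict(slots)


tasks = [
    ("source", 2, "default"),
    ("aggregate", 4, "default"),
    ("sink", 1, "default"),
]

print(slots_required(tasks))                    # 4
print(assign_subtasks(tasks)[("default", 0)])   # ['source[0]', 'aggregate[0]', 'sink[0]']
```

Without slot sharing, the same job would need one slot per subtask, that is 2 + 4 + 1 = 7 slots instead of 4.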

2.6 Resource Optimization

3. Future Development Directions

3.1 Fine-grained Resource Management

Figure 8. Limitations of slot sharing
Figure 9
  • If the resource requirements of the operators are known, the slot size can be measured through empirical estimation and semi-automated or automated tools.
  • Each task exclusively occupies a slot for resource scheduling.

3.2 Dynamic Slot Segmentation

Figure 10. Static slot allocation
Figure 11. Dynamic slot segmentation
Figure 12. Static slot segmentation
Figure 13. Dynamic slot segmentation
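
The difference between static and dynamic slot segmentation can be sketched with a toy task manager. The `TaskManager` class, its fields, and the resource figures below are hypothetical and only illustrate the idea of carving slots on demand.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def static_slot_size(total_cpu: float, total_mem_mb: int, num_slots: int) -> Tuple[float, int]:
    """Static segmentation: every slot gets an equal, fixed share."""
    return total_cpu / num_slots, total_mem_mb // num_slots


@dataclass
class TaskManager:
    free_cpu: float
    free_mem_mb: int
    slots: List[Tuple[float, int]] = field(default_factory=list)

    def allocate(self, cpu: float, mem_mb: int) -> bool:
        """Dynamic segmentation: carve a slot of exactly the requested size if it fits."""
        if cpu > self.free_cpu or mem_mb > self.free_mem_mb:
            return False
        self.free_cpu -= cpu
        self.free_mem_mb -= mem_mb
        self.slots.append((cpu, mem_mb))
        return True


requests = [(0.5, 512), (2.0, 2048), (1.0, 1024)]

# Static: a 4-core, 4096 MB task manager split into 4 equal slots.
slot_cpu, slot_mem = static_slot_size(4.0, 4096, 4)
fits_static = [cpu <= slot_cpu and mem <= slot_mem for cpu, mem in requests]
print(fits_static)   # [True, False, True]

# Dynamic: the same task manager carves slots to match each request.
tm = TaskManager(free_cpu=4.0, free_mem_mb=4096)
print([tm.allocate(cpu, mem) for cpu, mem in requests])   # [True, True, True]
print(tm.free_cpu, tm.free_mem_mb)                        # 0.5 512
```

In the static case the large request cannot be served by any single slot even though the task manager as a whole has room for it. In the dynamic case it fits, and the leftover 0.5 core and 512 MB remain available for a later small request.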

3.3 Resource Fragmentation Problem

  • Streaming
      • Scheduling once for long-term operation
      • Higher benefits from improved resource utilization
      • Scheduling policies suitable for using custom Task Manager resources
  • Batch (batch processing, especially for short queries)
      • Frequent scheduling and short task running time
      • Sensitive to scheduling latency
      • Scheduling policies suitable for using non-custom Task Manager resources
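
To see why custom-sized Task Managers help with fragmentation, consider a minimal packing sketch. It models memory only and uses first-fit placement; both are simplifying assumptions, and the request sizes are invented for the example.

```python
from typing import List


def pack_first_fit(requests_mb: List[int], tm_size_mb: int) -> List[int]:
    """Return the free memory left in each task manager after first-fit packing."""
    free: List[int] = []
    for req in requests_mb:
        if req > tm_size_mb:
            raise ValueError("request is larger than a task manager")
        for i, available in enumerate(free):
            if available >= req:
                free[i] = available - req
                break
        else:
            # No existing task manager has room: start a new one.
            free.append(tm_size_mb - req)
    return free


requests = [600, 600, 600]

# Non-custom: fixed 1000 MB task managers; each ends up with an unusable 400 MB gap.
fixed = pack_first_fit(requests, tm_size_mb=1000)
print(fixed, sum(fixed))   # [400, 400, 400] 1200

# Custom: one task manager sized to the sum of the requests leaves no fragment.
custom = pack_first_fit(requests, tm_size_mb=sum(requests))
print(custom)              # [0]
```

Sizing a task manager to the requests removes the fragments but means starting a tailored process, which costs scheduling time. That trade-off is why the custom approach suits long-running streaming jobs better than latency-sensitive short batch queries.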

3.4 Ease-of-Use Issues

3.5 Making Resource Scheduling Policies Plug-ins (FLINK-14106)

  • Resource scheduling policies
      • Number of task managers
      • When to apply for or release a task manager
      • Size of task manager resources
      • Adaptation between slot requests and task manager resources
  • Provides different resource scheduling policies for stream processing and batch processing
  • Provides fine-grained and non-fine-grained resource management to meet diverse requirements
  • Allows the possibility of more resource scheduling policies in the future
      • For example, Spark performs elastic scaling on clusters based on the load.
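
One way to picture a pluggable policy is a small strategy interface. The sketch below is hypothetical: `ResourcePolicy`, `FixedSizePolicy`, and `FitToRequestPolicy` are names invented for illustration and do not reflect the interface proposed in FLINK-14106.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class ResourcePolicy(ABC):
    """Hypothetical plug-in point: decides how many task managers to start and how big."""

    @abstractmethod
    def task_managers_needed(self, slot_requests_mb: List[int]) -> List[int]:
        """Return the memory size (MB) of each task manager to start."""


class FixedSizePolicy(ResourcePolicy):
    """Non-fine-grained: identical task managers with a fixed number of slots each."""

    def __init__(self, tm_size_mb: int, slots_per_tm: int) -> None:
        self.tm_size_mb = tm_size_mb
        self.slots_per_tm = slots_per_tm

    def task_managers_needed(self, slot_requests_mb: List[int]) -> List[int]:
        count = math.ceil(len(slot_requests_mb) / self.slots_per_tm)
        return [self.tm_size_mb] * count


class FitToRequestPolicy(ResourcePolicy):
    """Fine-grained: one task manager per request, sized to that request."""

    def task_managers_needed(self, slot_requests_mb: List[int]) -> List[int]:
        return list(slot_requests_mb)


def plan(policy: ResourcePolicy, slot_requests_mb: List[int]) -> List[int]:
    # The caller depends only on the interface, so policies can be swapped freely.
    return policy.task_managers_needed(slot_requests_mb)


requests = [512, 2048, 1024]
print(plan(FixedSizePolicy(tm_size_mb=4096, slots_per_tm=2), requests))  # [4096, 4096]
print(plan(FitToRequestPolicy(), requests))                              # [512, 2048, 1024]
```

Keeping the decision behind one interface is what would let stream and batch workloads use different policies without changes to the rest of the resource manager.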
