Deploy Apache Flink Natively on YARN or Kubernetes?

Apache Flink Standalone Clusters

  • Isolation: When multiple jobs run in one cluster, tasks of different jobs may be executed on the same TM, causing the resources used by the threads (CPU/MEM) cannot be controlled and affect each other, and even one task can cause the entire TM to be out of memory and affects the jobs on it. The scheduling of multiple jobs is also in the same JM, which also has the problem of being affected by the problematic job.
  • Multi-Tenant Resource Usage (Quota) Management: The total amount of Job resources used by users cannot be controlled, and the coordinated management of resources among tenants is lacking.
  • Cluster Availability: Although the JM can be deployed with Standby machines to support high availability, the lack of maintenance of JM and TM processes inevitably causes too many processes to be down and the entire cluster is unavailable, due to the above isolation problems.
  • Cluster O&M: For version upgrade, capacity scaling, and other, complex O&M operations are required.

Native Integration of Flink and YARN

Apache Flink Standalone Clusters on YARN

  • Multiple YARN applications can be started for multiple jobs. Each app is a standalone cluster and runs independently. Moreover, through isolation, such as with cgroups, supported by YARN, the mutual influence between multiple tasks is avoided, and the isolation problem is solved.
  • Apps of different users can also be run in different YARN scheduling queues to solve the multi-tenant problem through the Queue Quota management capability.
  • At the same time, the YARN policy for restarting, retrying and rescheduling the app process, can be used to make the Flink Standalone cluster have high availability.
  • With the simple modification for parameters and the configuration file, Flink jars can be distributed through the Distributed Cache of YARN, thus facilitating upgrade and capacity scaling.
  • The Cluster scale (the number of TMs) is statically specified by the parameters when the YARN app is started. The compilation optimization of Flink itself makes it difficult to estimate the resource demand before running, so it is difficult to rationalize the number of TMs. If the number of TMs is large, resources will be wasted. Otherwise, if it is small, the job execution speed will be affected or may not even be able to run.
  • The resource size owned by each TM is also statically specified by parameters, and it is also difficult to estimate the actual needs. Different TM sizes cannot be applied dynamically for different task resource requirements, and only TM with the same specification size can be set, which makes it difficult to place an exact integer number of tasks, and waste the remaining resources.
  • App startup (1. Submit YARN App) and Flink job submission (7. Submit Job) needs to be completed in two stages, which will make the submission efficiency of each task low, and reduce the resource flow rate of the cluster.

FLIP-6: Deployment and Process Model

  • Dispatcher: It is responsible for communicating with the Client to receive job submission and generate JobManager. The lifecycle can span across jobs.
  • ResourceManager: It integrates with different resource scheduling systems to implement resource scheduling (application/release), and manage Container/TaskManager. Similarly, the lifecycle can span across jobs.
  • JobManager: It is responsible for scheduling and execution of the computing logic of the job. Each job has one instance.
  • TaskManager: It registers with RM to report the status of resources. It receives task execution from JM, and reports status.

Native Integration of Apache Flink and YARN



Resource Profile

Native Integration of Flink and Kubernetes

Apache Flink Standalone Clusters on Kubernetes

Native Integration of Apache Flink and Kubernetes


Original Source




Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

How to make your Dotfile management a painless affair

Newsletter #10: We Are Live!

How to configure the Unified CloudWatch Agent for collecting Logs and Custom Metrics.

Committed Use Discount on Google Cloud SQL

Ant Financial’s ApsaraDB for OceanBase Now Available on Alibaba Cloud

Serverless Vs. Containers — the big showdown

Evangelicalism in America is nearing extinction due to the movement’s devotion to politics at the…

Practices of Kubernetes Multi-tenant Clusters

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:

More from Medium

Deploying Airflow in Local Kubernetes Cluster: Part II

Kafka on GCP Cluster

Getting Started with TiDB Cloud Using Java

Deploy Apache Cassandra 4.0 on Kubernetes and AWS