Deploy Apache Flink Natively on YARN or Kubernetes?

By Ren Chunde

Apache Flink, considered by some to represent the next generation of big data computing engines, is developing faster than ever. The internal architecture of the Flink framework is also being continuously optimized and restructured to adapt to a growing number of runtime environments and a much larger computing scale. Flink Improvement Proposal 6 (FLIP-6) introduces a redesigned, unified architecture for resource scheduling across various cluster management systems, including Standalone, YARN, and Kubernetes.

In this blog, we're going to take a look at these new developments in Apache Flink: how standalone clusters are implemented, how the Per-Job and Session modes on YARN are supported, and how Apache Flink is gaining native integration with Kubernetes.

Apache Flink Standalone Clusters

A cluster of this kind is called standalone because it does not rely on an underlying resource scheduling system and can be deployed and started directly on bare-metal nodes. Automated O&M tools can make deployment and management convenient, but the following problems remain:

  • Isolation: When multiple jobs run in one cluster, tasks of different jobs may be executed on the same TaskManager (TM). The CPU and memory used by their threads cannot be controlled, so tasks interfere with each other, and a single task can even drive the whole TM out of memory and take down every job running on it. All jobs are also scheduled by the same JobManager (JM), so one problematic job can affect the scheduling of the others.
  • Multi-Tenant Resource Usage (Quota) Management: The total amount of resources used by a user's jobs cannot be capped, and coordinated resource management among tenants is lacking.
  • Cluster Availability: Although standby JMs can be deployed for high availability, the JM and TM processes themselves are not supervised by any external system. Combined with the isolation problems above, this inevitably leads to too many processes going down and the entire cluster becoming unavailable.
  • Cluster O&M: Version upgrades, capacity scaling, and similar changes require complex manual O&M operations.

To solve the above problems, Flink needs to be run on popular and mature resource scheduling systems, such as YARN, Kubernetes, and Mesos.

Native Integration of Flink and YARN

Apache Flink Standalone Clusters on YARN

  • A separate YARN application can be started for each job (or small group of jobs). Each app is a Standalone cluster that runs independently, and the isolation mechanisms supported by YARN, such as cgroups, prevent tasks from affecting one another, which solves the isolation problem.
  • Apps of different users can also run in different YARN scheduling queues, so the queue quota management capability solves the multi-tenant problem.
  • At the same time, YARN's policies for restarting, retrying, and rescheduling application processes give the Flink Standalone cluster high availability.
  • With simple changes to parameters and the configuration file, Flink jars can be distributed through YARN's Distributed Cache, which makes upgrades and capacity scaling easy.

Although this approach fixes the problems above, starting a Standalone cluster for each job (or small number of jobs) makes it hard to use resources efficiently, for the following reasons:

  • The cluster size (the number of TMs) is specified statically by parameters when the YARN app is started. Flink's own compilation and optimization make it hard to estimate resource demand before the job runs, so it is hard to choose a sensible number of TMs: too many and resources are wasted, too few and the job runs slowly or cannot run at all.
  • The resources of each TM are also specified statically by parameters, and the actual needs are just as hard to estimate. TMs of different sizes cannot be requested dynamically for different task resource requirements; every TM has the same specification, so tasks rarely fill a TM exactly and the leftover resources are wasted (see the sketch after this list).
  • App startup (submitting the YARN app) and Flink job submission (submitting the job) have to be completed as two separate stages, which lowers the submission efficiency of each job and reduces the resource turnover of the cluster.
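
To make the waste concrete, here is a back-of-the-envelope sketch in Java. The slot counts and memory sizes are purely hypothetical numbers chosen for illustration: a cluster sized statically at submission time provisions more slots than the job ultimately needs, and the surplus simply sits idle.

    // Hypothetical numbers only: illustrates how statically sized clusters waste resources.
    public class StaticSizingWaste {
        public static void main(String[] args) {
            int taskManagers = 10;      // cluster size guessed before the job runs
            int slotsPerTm = 4;         // every TM gets the same fixed specification
            int tmMemoryMb = 8192;

            int provisionedSlots = taskManagers * slotsPerTm;   // 40 slots provisioned up front
            int slotsActuallyUsed = 26;                         // known only once tasks are scheduled

            int idleSlots = provisionedSlots - slotsActuallyUsed;
            long idleMemoryMb = (long) idleSlots * (tmMemoryMb / slotsPerTm);

            // Prints: 14 idle slots (35%), roughly 28672 MB of memory held but unused
            System.out.printf("%d idle slots (%.0f%%), roughly %d MB of memory held but unused%n",
                    idleSlots, 100.0 * idleSlots / provisionedSlots, idleMemoryMb);
        }
    }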

The more Flink jobs a large-scale YARN cluster runs, the more resources are wasted and the higher the cost. Moreover, these problems are not specific to YARN: the same issues exist when a Flink Standalone cluster runs directly on other resource scheduling systems. Alibaba's real-time computing team therefore took the lead in improving Flink's resource utilization model based on production experience with YARN, and then designed and implemented, together with the community, a common architecture that is applicable to different resource scheduling systems.

FLIP-6: Deployment and Process Model

  • Dispatcher: Communicates with the client, receives job submissions, and spawns a JobManager for each job. Its lifecycle can span multiple jobs.
  • ResourceManager: Integrates with different resource scheduling systems to request and release resources and to manage Containers/TaskManagers. Its lifecycle can likewise span multiple jobs.
  • JobManager: Responsible for scheduling and executing the computing logic of a job; each job has its own instance.
  • TaskManager: Registers with the ResourceManager and reports its resource status; receives tasks from the JobManager, executes them, and reports their status. (A simplified sketch of how these roles interact follows this list.)
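
To make the division of responsibilities easier to follow, here is a simplified sketch of the four roles in Java. These are not Flink's real interfaces; the names and signatures are illustrative only, and the comments describe the interaction flow outlined above.

    // Conceptual sketch only -- NOT Flink's actual interfaces.
    // Flow: Client -> Dispatcher -> (new) JobManager -> ResourceManager -> TaskManager.

    /** Receives job submissions from clients and spawns one JobManager per job.
     *  Its lifecycle spans multiple jobs. */
    interface Dispatcher {
        void submitJob(String serializedJobGraph);
    }

    /** Bridges to the underlying scheduler (Standalone, YARN, Kubernetes, ...):
     *  requests and releases containers/TaskManagers on demand. Also spans jobs. */
    interface ResourceManager {
        void requestSlot(String jobId, double cpuCores, int memoryMb);
        void releaseTaskManager(String taskManagerId);
    }

    /** Schedules and executes the computing logic of exactly one job,
     *  asking the ResourceManager for slots as tasks become ready to run. */
    interface JobManager {
        void scheduleAndRun();
    }

    /** Registers with the ResourceManager, reports its resource status,
     *  and executes the tasks handed to it by a JobManager. */
    interface TaskManager {
        void registerAtResourceManager(String resourceManagerAddress);
        void runTask(String taskDescriptor);
    }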

Native Integration of Apache Flink and YARN

Per-Job
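
In Per-Job mode, each submission brings up a dedicated Flink cluster on YARN for that single job: a JobManager is started for the job, TaskManagers are requested from YARN as the job needs slots, and everything is released when the job finishes. Nothing in the user program refers to the deployment target, which is chosen only at submission time; the minimal job below (class name and data are purely illustrative) could be submitted in either Per-Job or Session mode unchanged.

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    // Illustrative word-count job: the user code is deployment-agnostic.
    public class WordCountJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements("flink on yarn", "flink on kubernetes")
               .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                   @Override
                   public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                       for (String word : line.split(" ")) {
                           out.collect(Tuple2.of(word, 1));
                       }
                   }
               })
               .keyBy(value -> value.f0)
               .sum(1)
               .print();

            env.execute("word-count");
        }
    }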

Session

Resource Profile
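
The Resource Profile mechanism lets resource requirements (CPU cores, memory) be attached to individual slot requests, so the ResourceManager can ask the underlying scheduler for containers of exactly the right size rather than one fixed TM specification. The sketch below is conceptual only: the classes are illustrative stand-ins, not Flink's actual ResourceSpec/ResourceProfile API.

    import java.util.Arrays;
    import java.util.List;

    // Conceptual stand-in for per-request resource profiles (not Flink's real classes).
    public class ResourceProfileSketch {

        /** What a single slot request asks for, instead of one cluster-wide TM size. */
        static final class ResourceProfile {
            final double cpuCores;
            final int memoryMb;
            ResourceProfile(double cpuCores, int memoryMb) {
                this.cpuCores = cpuCores;
                this.memoryMb = memoryMb;
            }
        }

        /** A ResourceManager-style component would translate each profile into a
         *  container request against YARN or a pod request against Kubernetes. */
        static void requestContainers(List<ResourceProfile> profiles) {
            for (ResourceProfile p : profiles) {
                System.out.printf("request container: vcores=%.1f, memoryMb=%d%n",
                        p.cpuCores, p.memoryMb);
            }
        }

        public static void main(String[] args) {
            // A lightweight source and a heavy aggregation no longer share one fixed TM size.
            requestContainers(Arrays.asList(
                    new ResourceProfile(0.5, 1024),    // e.g. source operator
                    new ResourceProfile(2.0, 4096)));  // e.g. aggregation operator
        }
    }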

Native Integration of Flink and Kubernetes

Apache Flink Standalone Clusters on Kubernetes

Native Integration of Apache Flink and Kubernetes

Summary

However, with all of that said, the optimization and improvement of the related features are still very much an ongoing process. Some developers find configuring resources through the Resource Profile daunting, which noticeably reduces Flink's overall usability. We are working on Auto Config, Auto Scaling, and other features for resource and concurrency configuration to solve these problems. Meanwhile, the "Serverless" architecture is developing rapidly, and we expect Flink and Kubernetes to be integrated into a powerful cloud-native computing engine (similar to FaaS) that saves resources and brings greater value to users. We are excited about this future.

Follow me to keep abreast of the latest technology news, industry insights, and developer trends. Alibaba Cloud website: https://www.alibabacloud.com