Principles and Practices of Flink on YARN and Kubernetes: Flink Advanced Tutorials

Flink Architecture Overview

This article provides an overview of Apache Flink architecture and introduces the principles and practices of how Flink runs on YARN and Kubernetes, respectively.

Flink Architecture Overview — Jobs

Flink Architecture Overview — JobManager

  • It provides a Scheduler to schedule tasks.
  • It provides a checkpoint coordinator to adjust the checkpointing of each task, including the checkpointing start and end times.
  • It communicates with the TaskManager through the Actor System.
  • It provides recovery metadata used to read data from metadata while recovering from a fault.

Flink Architecture Overview — TaskManager

  • Network Manager used to manage networks
  • Actor System used to implement network communication
  • In the master process, the Standalone ResourceManager manages resources. On submitting a JobGraph to the master node through a Flink cluster client, the JobGraph is first forwarded through the Dispatcher.
  • After receiving a request from the client, the Dispatcher generates a JobManager. The JobManager applies for resources from the Standalone ResourceManager and then starts the TaskManager.
  • The TaskManager initiates registration after startup. After registration is completed, the JobManager allocates tasks to the TaskManager for execution. This completes the job execution process in Standalone mode.

Flink Runtime-related Components

This section summarizes Flink’s basic architecture and the components of Flink runtime.

  • After receiving a request, JobManager schedules the job and applies for resources to start a TaskManager.
  • The TaskManager is responsible for task execution. It registers with the JobManager and executes the tasks that are allocated by the JobManager.

Principles and Practices of Flink on YARN

YARN Architecture Overview

YARN is widely used in the production environments of Chinese companies. This section introduces the YARN architecture to help you better understand how Flink runs on YARN.

YARN Architecture — Components

A YARN cluster consists of the following components:

  • The ApplicationMaster runs on a worker node and is responsible for data splitting, resource application and allocation, task monitoring, and fault tolerance.
  • The NodeManager runs on a worker node and is responsible for single-node resource management, communication between the ApplicationMaster and ResourceManager, and status reporting.
  • Containers are used to abstract resources, such as memory, CPUs, disks, and network resources.

YARN Architecture — Interaction

  • After receiving the request from the client, the ResourceManager allocates a container used to start the ApplicationMaster and instructs the NodeManager to start the ApplicationMaster in this container.
  • After startup, the ApplicationMaster initiates a registration request to the ResourceManager. The ApplicationMaster applies for resources from the ResourceManager. Based on the obtained resources, the ApplicationMaster communicates with the corresponding NodeManager, requiring it to start the program.
  • One or more NodeManagers start MapReduce tasks.
  • The NodeManager continuously reports the status and execution progress of the MapReduce tasks to the ApplicationMaster.
  • When all MapReduce tasks are completed, the ApplicationMaster reports task completion to the ResourceManager and deregisters itself.

Flink on YARN — Per Job

  • The YARN ResourceManager applies for the first container. This container starts a process through the ApplicationMaster, which runs Flink programs, namely, the Flink YARN ResourceManager and JobManager.
  • The Flink YARN ResourceManager applies for resources from the YARN ResourceManager. The TaskManager is started after resources are allocated. After startup, the TaskManager registers with the Flink YARN ResourceManager. After registration, the JobManager allocates tasks to the TaskManager for execution.

Flink on YARN — Session

Features of YARN

YARN has the following advantages:

  • Resource Isolation: YARN uses cgroups for resource isolation to prevent interference between resources. Once the amount of resources used by a container exceeds the predefined threshold, the container is killed.
  • Automatic Failover: YARN supports NodeManager monitoring and the recovery of the ApplicationManager from exceptions.

Principles of Flink on Kubernetes

Kubernetes is an open-source container cluster management system developed by Google. It supports application deployment, maintenance, and scaling. Kubernetes allows easily managing containerized applications running on different machines. Compared with YARN, Kubernetes is essentially a next-generation resource management system, but its capabilities go far beyond.

Kubernetes — Basic Concepts

In Kubernetes, a master node is used to manage clusters. It contains an access portal for cluster resource data and etcd, a high-availability key-value store. The master node runs the API server, Controller Manager, and Scheduler.

Kubernetes — Architecture Diagram

  • etcd is a key-value store and responsible for assigning tasks to specific machines. The kubelet on each node finds the corresponding container to run tasks on the local machine.
  • Submit a resource description for the Replication Controller to monitor and maintain the number of containers in the cluster. You may also submit a Service description file to enable the kube-proxy to forward traffic.

Kubernetes — Core Concepts

Kubernetes involves the following core concepts:

  • A Service provides a central service access portal and implements service proxy and discovery.
  • Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) are used for persistent data storage.
  • ConfigMap stores the configuration files of user programs and uses etcd as its backend storage.

Flink on Kubernetes — Architecture

  • The master container starts the Flink master process, which consists of the Flink-Container ResourceManager, JobManager, and Program Runner.
  • The worker containers start TaskManagers, which register with the ResourceManager. After registration, the JobManager allocates tasks to the containers for execution.
  • In Flink, the master and worker containers are essentially images but have different script commands. You can choose whether to start the master or worker containers by setting parameters.

Flink on Kubernetes — JobManager

The execution process of a JobManager is divided into two steps:

Flink on Kubernetes — TaskManager

A TaskManager is also described by a Deployment to ensure that it is executed by the containers of n replicas. A label, such as flink-taskmanager, is defined for this TaskManager.

Flink on Kubernetes — Interaction

  • The Deployment ensures that the containers of n replicas run the JobManager and TaskManager and applies the upgrade policy.
  • The ConfigMap mounts the /etc/flink directory, which contains the flink-conf.yaml file, to each pod.

Flink on Kubernetes — Practices

This section describes how to run a job in Flink on Kubernetes.

•Session Cluster
• Start
•kubectl create -f jobmanager-service.yaml
•kubectl create -f jobmanager-deployment.yaml
•kubectl create -f taskmanager-deployment.yaml
•Submit job
•kubectl port-forward service/flink-jobmanager 8081:8081
•bin/flink run -d -m localhost:8081 ./examples/streaming/TopSpeedWindowing.jar
• Stop
•kubectl delete -f jobmanager-deployment.yaml
•kubectl delete -f taskmanager-deployment.yaml
•kubectl delete -f jobmanager-service.yaml
sh build.sh --from-release --flink-version 1.7.0 --hadoop-version 2.8 --scala-version 2.11 --job-jar ~/flink/flink-1.7.1/examples/streaming/TopSpeedWindowing.jar --image-name topspeed
docker tag topspeed zkb555/topspeedwindowing 
docker push zkb555/topspeedwindowing
kubectl create -f job-cluster-service.yaml 
FLINK_IMAGE_NAME=zkb555/topspeedwindowing:latest FLINK_JOB=org.apache.flink.streaming.examples.windowing.TopSpeedWindowing FLINK_JOB_PARALLELISM=3 envsubst < job-cluster-job.yaml.template | kubectl create -f –
FLINK_IMAGE_NAME=zkb555/topspeedwindowing:latest FLINK_JOB_PARALLELISM=4 envsubst < task-manager-deployment.yaml.template | kubectl create -f -

Flink on YARN and Flink on Kubernetes FAQ

Q) Can I submit jobs to Flink on Kubernetes using operators?

Original Source:

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store