Apache Flink Fundamentals: Building a Development Environment and Configuring, Deploying, and Running Applications

By Sha Shengyang


Flink is an open-source big data project developed in Java and Scala. Its source code is hosted on GitHub, and the project is compiled and built with Maven. Java, Maven, and Git are therefore essential tools for most Flink users. In addition, a powerful integrated development environment (IDE) helps you read code, develop new functions, and fix bugs faster. This article does not cover the installation details of each tool, but it provides the necessary installation suggestions.

The article includes the following:

  • How to deploy and configure the Flink development environment.
  • How to run the Flink application (including local Flink cluster mode, Standalone cluster mode, and Yarn cluster mode).

Deploy and Configure Flink Development Environment

You can use macOS, Linux, or Windows for the development and testing environment. If you are using Windows 10, we recommend compiling and running Flink in the Windows Subsystem for Linux.

We recommend using a stable branch released by the community, such as Release-1.6 or Release-1.7.

Compile Flink Code

After configuring the above-mentioned tools, execute the following commands to compile Flink.

Below are common compilation parameters.

Once the compilation is complete, you see the following files in the flink-dist/target/ subdirectory under the current Flink code directory (the version numbers compiled with different Flink code branches are different, and the version number here is Flink 1.5.1).

Note the following three types of files.

Note: Users in China may encounter “Build Failure” (MapR-related errors) during compilation. These errors are usually caused by failed downloads of MapR-related dependencies. The download may still fail even if the recommended settings.xml configuration is used, in which the Aliyun Maven source acts as a proxy for MapR-related dependencies.

The problem is mainly due to the large size of the MapR Jar packages. If you encounter it, retry the build. Before retrying, delete the corresponding directory from the local Maven repository as indicated by the failure message; otherwise, you must wait for the Maven download to time out before the dependency can be downloaded again.

Prepare Development Environment

IntelliJ IDEA is the recommended IDE for Flink development. The Eclipse IDE is not officially recommended, mainly because the Scala IDE in Eclipse is incompatible with Scala in Flink.

If you need to develop Flink code, configure Checkstyle according to the configuration file in the tools/maven/ directory of the Flink code. Flink enforces code style checks during compilation, and compilation may fail if the code does not conform to the specification.

Run Flink Application

Basic Concepts

Running a Flink application is simple. However, you first need to understand the components of the Flink runtime, because they determine how a Flink application is configured. Figure 1 shows a data processing program written with the DataStream API. Operators in the DAG that cannot be chained together are separated into different tasks. Tasks are the smallest unit of resource scheduling in Flink.
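
To make the split into tasks concrete, here is a minimal Python sketch of the idea. It is not Flink's actual chaining logic. It assumes a linear pipeline and a simplified rule: an operator joins the previous operator's task only if the two are connected by a forward edge and have the same parallelism. The Operator type, the chain_operators function, and the operator names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    name: str
    parallelism: int
    # How records arrive from the previous operator: "forward" keeps them in
    # the same parallel instance; anything else (for example "hash" after a
    # keyBy) redistributes them. The value is ignored for the first operator.
    input_edge: str = "forward"


def chain_operators(pipeline: List[Operator]) -> List[List[str]]:
    """Group a linear pipeline into tasks using a simplified chaining rule."""
    tasks: List[List[str]] = []
    previous = None
    for op in pipeline:
        chainable = (
            previous is not None
            and op.input_edge == "forward"
            and op.parallelism == previous.parallelism
        )
        if chainable:
            tasks[-1].append(op.name)  # extend the current task
        else:
            tasks.append([op.name])  # start a new task
        previous = op
    return tasks


pipeline = [
    Operator("source", 2),
    Operator("map", 2, "forward"),
    Operator("window", 2, "hash"),
    Operator("sink", 1, "rebalance"),
]
tasks = chain_operators(pipeline)
print(tasks)
```

In this sketch the source and map operators end up in one task, while the window and sink operators each form their own task because their inputs are redistributed.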

Next, Figure 2 shows that the Flink runtime environment consists of two types of processes.

  • JobManager (also known as JobMaster) coordinates the distributed execution of tasks. This includes scheduling tasks, coordinating checkpoints, and coordinating the recovery of each task from the checkpoint when a job fails over.
  • TaskManager (also known as Worker) executes the tasks in the dataflow graph. This includes allocating memory buffers and transferring data streams.

Figure 3 shows that a task slot is the smallest resource allocation unit in a TaskManager. The number of task slots in a TaskManager determines how many tasks it can process concurrently. Note that multiple operators may execute in one task slot. Generally, these operators are chained together and processed as a single task.

Prepare Runtime Environment

Preparing a runtime environment includes the following:

  • Prepare the Flink binary: download the Flink binary package from the official Flink website, or compile the Flink binary from the Flink source code.
  • Install Java and configure the JAVA_HOME environment variable.

Run Flink in Local Flink Cluster Mode

Basic Startup Process

The simplest way to run a Flink application is to run it in the local Flink cluster mode.

First, start the cluster using the following command.

Visit the Flink Web interface (port 8081 by default) and try to submit a WordCount job.

Also, explore the information displayed on the Web interface. For example, view the stdout log of the TaskManager to see the computation result of the WordCount job.

Use the --input parameter to specify your own local file as the input, and then execute the following command.

Stop the cluster by executing the command below.

Common Configurations

Use the conf/slaves file to configure the TaskManager deployment. By default, only one TaskManager process starts. To add a TaskManager process, add a “localhost” line to the file.

Run the ./bin/taskmanager.sh start command to add a new TaskManager process.

Use conf/flink-conf.yaml to configure the operation parameters of the JobManager (JM) and TaskManager (TM). Common configurations include:

  • The heap size for the JobManager JVM
  • The heap size for the TaskManager JVM
  • The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
  • The managed memory size for each TaskManager
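
As a rough illustration of how such settings can be inspected, the following Python sketch reads flat "key: value" lines in the style of conf/flink-conf.yaml. It is a simplified reader, not Flink's own configuration loader. The key jobmanager.heap.size is an assumption about the Release-1.6 and Release-1.7 key names, the sample values are illustrative only, and the managed memory key is left out because its name differs between versions.

```python
from typing import Dict


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat "key: value" lines, ignoring blank lines and # comments."""
    conf: Dict[str, str] = {}
    for raw_line in text.splitlines():
        # Drop everything after a '#'; this sketch assumes values contain no '#'.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first ':' only, so values such as host:port stay intact.
        key, separator, value = line.partition(":")
        if separator:  # skip lines that are not "key: value"
            conf[key.strip()] = value.strip()
    return conf


# Illustrative values only, not recommendations.
SAMPLE = """
# The heap size for the JobManager JVM
jobmanager.heap.size: 1024m
# The heap size for the TaskManager JVM
taskmanager.heap.size: 1024m
# The number of task slots that each TaskManager offers
taskmanager.numberOfTaskSlots: 4
"""

conf = parse_flink_conf(SAMPLE)
for key, value in conf.items():
    print(f"{key} = {value}")
```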

Once the standalone cluster starts, examine the two Flink-related processes: the JobManager process and the TaskManager process. Run the jps command to list them, and then use the ps command to see the “-Xmx” and “-Xms” settings in the startup parameters of each process. Then try to modify several configurations in flink-conf.yaml and restart the standalone cluster to see what has changed.

Note that in the open-source branch of Blink, the memory calculation of the TaskManager is more refined than in the current community version. The general method for computing the heap memory limit (-Xmx) of the TaskManager process is shown below.

In the latest Flink Community Release-1.7, the default memory configuration for JobManager and TaskManager is as follows.

  • The heap size for the JobManager JVM
  • The heap size for the TaskManager JVM

The taskmanager.heap.size configuration in Flink Community Release-1.7 actually refers to the total memory limit of the TaskManager process, instead of the memory limit of the Java heap. Use the above method to view the -Xmx configuration of the TaskManager process started by the Flink binary in Release-1.7.

Observe that the -Xmx value of the actual process is smaller than the configured taskmanager.heap.size, because the memory reserved for network buffers is deducted from it.

The network buffers use direct memory, so they are not included in the heap memory limit.
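
The deduction can be sketched as simple arithmetic. The Python snippet below is an approximation under stated assumptions, not the startup script itself: it assumes that the network buffer memory is a fraction of taskmanager.heap.size clamped between a minimum and a maximum, that the Release-1.7 defaults are a fraction of 0.1, a minimum of 64 MB, and a maximum of 1 GB, and that off-heap managed memory is disabled. The function name is hypothetical.

```python
def taskmanager_xmx_mb(
    total_mb: int,
    network_fraction: float = 0.1,  # assumed default fraction
    network_min_mb: int = 64,       # assumed default lower bound
    network_max_mb: int = 1024,     # assumed default upper bound
) -> int:
    """Estimate the TaskManager -Xmx value (in MB) from taskmanager.heap.size."""
    # Network buffers take a fraction of the total, rounded down to whole MB.
    network_mb = int(total_mb * network_fraction)
    # Clamp the result into the [min, max] range.
    network_mb = max(network_min_mb, min(network_max_mb, network_mb))
    # Whatever is left is available as Java heap.
    return total_mb - network_mb


xmx = taskmanager_xmx_mb(1024)
print(f"-Xmx{xmx}M")
```

Under these assumptions, a taskmanager.heap.size of 1024 MB leaves 922 MB for the heap, which you can compare against the -Xmx value reported by the ps command.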

View and Configure Logs

The startup logs of JobManager and TaskManager are available in the log subdirectory under the Flink binary directory. The files prefixed with flink-${user}-standalonesession-${id}-${hostname} in the log directory correspond to the output of JobManager. These include the following three files:

  • flink-${user}-standalonesession-${id}-${hostname}.log: the log output in the code
  • flink-${user}-standalonesession-${id}-${hostname}.out: the stdout output during process execution
  • flink-${user}-standalonesession-${id}-${hostname}-gc.log: the GC log for JVM

The files prefixed with flink-${user}-taskexecutor-${id}-${hostname} in the log directory correspond to the output of TaskManager. As with JobManager, the output consists of three files (.log, .out, and -gc.log).

The log configuration files are in the conf subdirectory of the Flink binary directory.

  • log4j-cli.properties: The log configuration used by the Flink command-line client (such as executing the flink run command).
  • log4j-yarn-session.properties: The log configurations used by the Flink command-line client while starting a YARN session (yarn-session.sh).
  • log4j.properties: The log configuration used by the JobManager and TaskManager, in both Standalone and Yarn mode.

Three logback.xml files correspond to these three log4j.properties files respectively.

If you want to use logback, just delete the corresponding log4j*.properties files. The correspondence is as follows.

  • log4j-cli.properties -> logback-console.xml
  • log4j-yarn-session.properties -> logback-yarn.xml
  • log4j.properties -> logback.xml

Note that both flink-${user}-standalonesession-${id}-${hostname} and flink-${user}-taskexecutor-${id}-${hostname} contain ${id}. ${id} indicates the start order of the processes of this role (JobManager or TaskManager) on the local machine, and its default value is 0.
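
The naming scheme is easy to reproduce, which helps when you need to locate the logs of a specific process. The following Python sketch builds the three file names for a given role. The function name and the example user and hostname are hypothetical.

```python
from typing import List


def log_file_names(user: str, role: str, process_id: int, hostname: str) -> List[str]:
    """Return the .log, .out, and GC log file names for one Flink process."""
    prefix = f"flink-{user}-{role}-{process_id}-{hostname}"
    return [f"{prefix}.log", f"{prefix}.out", f"{prefix}-gc.log"]


# JobManager logs use the "standalonesession" role; the first process has id 0.
jobmanager_logs = log_file_names("alice", "standalonesession", 0, "host1")
# A second TaskManager started on the same machine gets id 1.
taskmanager_logs = log_file_names("alice", "taskexecutor", 1, "host1")

for name in jobmanager_logs + taskmanager_logs:
    print(name)
```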

Moving Ahead

Repeat the ./bin/start-cluster.sh command and check the Web page (or execute the jps command) to see what happens.

Check the startup script and analyze the cause. Further, repeat the ./bin/stop-cluster.sh command, and see what happens after each execution.

Deploy the Flink Standalone Cluster on Multiple Hosts

Note the following key points before deployment.

  • Java is installed and the JAVA_HOME environment variable is configured on each host.
  • The Flink binary directory must be deployed at the same path on each host.
  • If you need to use HDFS, configure the HADOOP_CONF_DIR environment variable.

Modify the conf/masters and conf/slaves configurations according to your cluster information.

Modify the conf/flink-conf.yaml configuration, and make sure that the JobManager address is the same as in the conf/masters file.

Make sure that the configuration files in the conf subdirectory of the Flink binary directory are the same on all hosts, especially the following three files.

Start the Flink cluster.

Submit a WordCount job.

Upload the input file for the WordCount job.

Submit a WordCount job to read and write HDFS.

Increase the parallelism of the WordCount job (note that submission fails if an output file with the same name already exists).

Deploy and Configure High Availability (HA) in Standalone Mode

The Flink runtime architecture in Figure 2 shows that the JobManager is the role most likely to make the entire system unavailable. If a TaskManager fails and there are enough resources, the related tasks only need to be dispatched to other idle task slots, and the job then recovers from the checkpoint.

However, if only one JobManager is configured in the cluster, you must wait for it to recover once it fails. If the recovery takes too long, the entire job may fail.

Therefore, if the Standalone mode is used for a production business, you need to deploy and configure High Availability, so that multiple JobManagers can be on standby to ensure continuous service of the JobManager.


  • If you want to use the Flink standalone HA mode, make sure to use Flink Release-1.6.1 or later, because a bug in earlier community versions may cause the leading JobManager to not work properly in this mode.
  • HDFS is needed in the following experiment, so download the Flink binary package with Hadoop support.

Use Standard Flink Script to Deploy ZooKeeper (Optional)

Flink currently supports ZooKeeper-based HA. If ZK is not deployed in your cluster, Flink provides a script to start the ZooKeeper cluster. First, modify the configuration file conf/zoo.cfg, and configure the server.X=addressX:peerPort:leaderPort based on the number of ZooKeeper server hosts that you want to deploy. "X" is the unique ID of the ZooKeeper server and must be a number.
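
As an illustration of the server.X format, the Python sketch below generates one line per ZooKeeper host. The hostnames are hypothetical, and the ports 2888 (peer) and 3888 (leader election) are assumed here because they are the values conventionally used with ZooKeeper; adjust them to your environment.

```python
from typing import List


def zookeeper_server_lines(
    hosts: List[str], peer_port: int = 2888, leader_port: int = 3888
) -> List[str]:
    """Build server.X=addressX:peerPort:leaderPort lines, numbering X from 1."""
    return [
        f"server.{server_id}={host}:{peer_port}:{leader_port}"
        for server_id, host in enumerate(hosts, start=1)
    ]


lines = zookeeper_server_lines(["zk-host-1", "zk-host-2", "zk-host-3"])
print("\n".join(lines))
```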

Then, start ZooKeeper.

Execute the jps command to verify that the ZK process has started.

Execute the command to stop the ZooKeeper cluster.

Modify Configuration of Flink Standalone Cluster

Modify the conf/masters file and add a JobManager.

The previously modified conf/slaves file remains unchanged.

Modify the conf/flink-conf.yaml file.

  • Configure the high-availability mode
  • Configure the ZooKeeper Quorum (the hostname and port must be configured based on the actual ZK configuration)
  • Set the ZooKeeper root directory (optional)
  • Set the cluster ID, which is equivalent to the namespace of the ZK nodes created by this standalone cluster (optional)
  • The metadata of the JobManager is stored in DFS, and a pointer to the DFS path is saved in ZK
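
The settings listed above can be sketched as a small Python helper that renders the corresponding lines for conf/flink-conf.yaml. The key names are the high-availability keys as I understand them from the Flink configuration documentation of this era; the hostnames, the HDFS path, and the cluster ID are hypothetical placeholders, and the function itself is only an illustration.

```python
from typing import List, Optional


def render_ha_conf(
    quorum: List[str],
    storage_dir: str,
    root_path: Optional[str] = None,
    cluster_id: Optional[str] = None,
) -> str:
    """Render ZooKeeper HA settings as flink-conf.yaml lines."""
    lines = [
        "high-availability: zookeeper",
        "high-availability.zookeeper.quorum: " + ",".join(quorum),
    ]
    # Both of the following settings are optional and omitted when not given.
    if root_path is not None:
        lines.append(f"high-availability.zookeeper.path.root: {root_path}")
    if cluster_id is not None:
        lines.append(f"high-availability.cluster-id: {cluster_id}")
    # JobManager metadata lives in DFS; ZooKeeper only keeps a pointer to it.
    lines.append(f"high-availability.storageDir: {storage_dir}")
    return "\n".join(lines)


ha_conf = render_ha_conf(
    quorum=["zk-host-1:2181", "zk-host-2:2181", "zk-host-3:2181"],
    storage_dir="hdfs:///flink/ha/",
    root_path="/flink",
    cluster_id="/cluster_one",
)
print(ha_conf)
```

Leaving cluster_id unset omits that line, which is useful in deployment modes where the ID is assigned automatically.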

Note that in HA mode, the following two configurations in conf/flink-conf.yaml no longer take effect.

After the modification, make sure that the configuration is synchronized to other hosts.

Start the ZooKeeper cluster.

Start the standalone cluster (make sure that the previous standalone cluster has been stopped).

Open the JobManager Web pages on the two master nodes respectively.

Observe that both pages eventually redirect to the same address. That address belongs to the host where the current leading JobManager is located; the other host is where the standby JobManager is located. The HA configuration in Standalone mode is now complete.

Next, test and verify the effectiveness of HA. After determining the host of the leading JobManager, kill the leading JobManager process. For example, if the current leading JobManager is running on the host z05c19426.sqa.zth.tbsite.net, you can kill the process.

Then, open the two links again.

Note that the latter link can no longer be opened, while the former link still can, indicating that a switchover from the leading JobManager to the standby JobManager has occurred.

Then, restart the previous leading JobManager.

If you open the link http://z05c19426.sqa.zth.tbsite.net:8081 again, you should see that it now redirects to the page http://z05f06378.sqa.zth.tbsite.net:8081.

This indicates that the JobManager has completed a failover recovery.

Run the Flink Job in Yarn Mode

Compared to Standalone mode, running a Flink job in Yarn mode has the following advantages:

  • Resources are used on-demand to improve the resource utilization of the cluster.
  • Tasks have priorities, and jobs are run according to those priorities.
  • Based on the Yarn dispatching system, the failover of each role is automatically processed.
  • The JobManager process and the TaskManager process are both monitored by the Yarn NodeManager.
  • If the JobManager process exits unexpectedly, the Yarn ResourceManager re-dispatches the JobManager to other hosts.
  • If the TaskManager process exits unexpectedly, the JobManager receives the message and requests resources from the Yarn ResourceManager again to restart the TaskManager process.

Start the Long-Running Flink Cluster on Yarn (Session Cluster Mode)

View command parameters.

Create a Flink cluster in the Yarn mode.

The parameters used include the following:

  • -n,--container ~ Number of TaskManagers
  • -jm,--jobManagerMemory ~ Memory for JobManager Container with an optional unit (default: MB)
  • -tm,--taskManagerMemory ~ Memory per TaskManager Container with optional unit (default: MB)
  • -qu,--queue ~ Specify YARN queue.
  • -s,--slots ~ Number of slots per TaskManager
  • -t,--ship ~ Ship files in the specified directory (t for transfer)

Submit a Flink job to the Flink cluster.

Although the submission command carries no information about the corresponding Yarn application, the job is still submitted to the right Flink cluster, because the file /tmp/.yarn-properties-${user} holds the cluster information of the most recently created Yarn session.

Therefore, if the same user creates another Yarn session on the same host, this file is overwritten.

If /tmp/.yarn-properties-${user} is deleted or the job is submitted from another host, the job can still be submitted to the expected Yarn session, provided that the high-availability.cluster-id parameter is configured. In that case, the address and port of the JobManager are obtained from ZooKeeper.

If the Yarn session is not configured with HA, you must specify the application ID on Yarn in the Flink job submission command and pass it through the -yid parameter.

Note that a TaskManager is released soon after its tasks complete, and it is started again the next time a job is submitted. To extend the timeout period of idle TaskManagers, configure the following parameter (in milliseconds) in the conf/flink-conf.yaml file.

Run a Single Flink Job on Yarn (Job Cluster Mode)

Run the following command if you only want to run a single Flink job and then exit.

Common configurations include:

  • -yn,--yarncontainer ~ Number of TaskManagers
  • -yqu,--yarnqueue ~ Specify YARN queue.
  • -ys,--yarnslots ~ Number of slots per TaskManager

View the parameters available for Run through the Help command.

The parameters prefixed with -y and --yarn in the "Options for yarn-cluster mode" section of the ./bin/flink run -h output correspond one-to-one to those of the ./bin/yarn-session.sh -h command, and their semantics are basically the same.

The relationship between the -n (in Yarn Session mode), -yn (in Yarn Single Job mode), and -p parameters is as follows.

  • The -n and -yn parameters have no actual effect in the Community version (Release 1.5 to Release 1.7). The actual resources are requested based on the -p parameter, and TaskManagers return the resources after using them.
  • In the open-source version of Blink, the -n parameter (in Yarn Session mode) starts a specified number of TaskManagers at the beginning. Later, no new TaskManagers are requested even if the job requires more slots.
  • In the open-source version of Blink, the -yn parameter (in Yarn Single Job mode) indicates the initial number of TaskManagers, without setting the maximum number of TaskManagers. Note that the Single Job mode is only used if the -yd parameter is added (for example, the command ./bin/flink run -yd -m yarn-cluster xxx).
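
For the Community version, where resources follow the -p parameter, a rough estimate of how many TaskManagers a job ends up with can be sketched in Python. This is back-of-the-envelope arithmetic, not Flink's scheduler: it assumes that the job needs as many slots as its parallelism (the default when all operators share slots) and that every TaskManager offers the same number of slots. The function name is hypothetical.

```python
import math


def taskmanagers_needed(parallelism: int, slots_per_taskmanager: int) -> int:
    """Estimate the number of TaskManagers required for a given parallelism."""
    if parallelism < 1 or slots_per_taskmanager < 1:
        raise ValueError("parallelism and slots per TaskManager must be positive")
    # Each TaskManager contributes a fixed number of slots; round up so that
    # a partially used TaskManager is still counted.
    return math.ceil(parallelism / slots_per_taskmanager)


# A job submitted with -p 10 on TaskManagers with 4 slots each.
estimate = taskmanagers_needed(10, 4)
print(estimate)
```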

Configure High Availability in Yarn Mode

First, make sure that the following configuration in the yarn-site.xml file is in effect when the Yarn cluster starts. This configuration is the YARN cluster-level upper limit on ApplicationMaster (AM) restarts.

Then, in the conf/flink-conf.yaml file, configure the number of times the JobManager for this Flink job can be restarted.
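
The interaction between the two limits can be sketched in a few lines of Python. This only illustrates the capping behavior and is not Flink or YARN code. It assumes that the two settings in question are yarn.resourcemanager.am.max-attempts (in yarn-site.xml) and yarn.application-attempts (in conf/flink-conf.yaml); the function name and the sample values are hypothetical.

```python
def effective_am_attempts(flink_application_attempts: int, yarn_am_max_attempts: int) -> int:
    """Return the attempt limit that actually applies to the Flink application.

    The cluster-wide YARN limit acts as an upper bound, so a larger value
    requested by the application is capped at the cluster limit.
    """
    return min(flink_application_attempts, yarn_am_max_attempts)


# The Flink job asks for 10 attempts, but the cluster allows at most 4.
capped = effective_am_attempts(flink_application_attempts=10, yarn_am_max_attempts=4)
print(capped)
```

In other words, raising the Flink setting alone has no effect once it exceeds the cluster-level limit.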

Finally, set the ZooKeeper-related configuration in the conf/flink-conf.yaml file. The configuration methods are basically the same as those of Standalone HA, as shown below.

  • Configure the high-availability mode
  • Configure the ZooKeeper Quorum (the hostname and port must be configured based on the actual ZK configuration)
  • Set the ZooKeeper root directory (optional)
  • Delete the high-availability.cluster-id configuration
  • The metadata of the JobManager is stored in DFS, and a pointer to the DFS path is saved in ZK

It is better to remove the high-availability.cluster-id configuration, because in Yarn (and Mesos) mode, if the cluster-id is not configured, it will be configured as the Application ID on Yarn to ensure uniqueness.
