Apache Flink Fundamentals: Building a Development Environment and Configuring, Deploying, and Running Applications

Preface

Deploy and Configure Flink Development Environment

Compile Flink Code

mvn clean install -DskipTests
# Or
mvn clean package -DskipTests
-Dfast                          Skips the QA plugins and JavaDoc generation to speed up the build.
-Dhadoop.version=2.6.1          Specifies the Hadoop version.
--settings=${maven_file_path}   Explicitly specifies the Maven settings.xml configuration file.
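
The flags above can be combined into one invocation. The following is a minimal sketch: `flink_mvn_cmd` is a hypothetical helper (not part of Flink or Maven), and it only prints the command so that it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: print a Maven build command that combines the flags listed above.
# flink_mvn_cmd is a hypothetical helper name used only for illustration.
flink_mvn_cmd() {
  local hadoop_version="$1"   # e.g. 2.6.1; pass "" to omit the flag
  local settings_file="$2"    # path to a settings.xml; pass "" to omit the flag
  local cmd="mvn clean install -DskipTests -Dfast"
  if [ -n "$hadoop_version" ]; then
    cmd="$cmd -Dhadoop.version=$hadoop_version"
  fi
  if [ -n "$settings_file" ]; then
    cmd="$cmd --settings=$settings_file"
  fi
  printf '%s\n' "$cmd"
}

# Print the command for Hadoop 2.6.1 with the default Maven settings.
flink_mvn_cmd 2.6.1 ""
```

Printing instead of executing keeps the sketch free of side effects; piping the output to `sh` would run the build.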

Prepare Development Environment

Run Flink Application

Basic Concepts

Figure 1. Parallel Dataflows
Figure 2. Flink Runtime Architecture Diagram
Figure 3. Process

Prepare Runtime Environment

Run Flink in Local Flink Cluster Mode

Basic Startup Process

./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar
./bin/flink run examples/streaming/WordCount.jar --input ${your_source_file}
./bin/stop-cluster.sh
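
The start, run, and stop commands above could be wrapped so that the local cluster is stopped even when the job fails. This is a sketch under assumptions: `run_wordcount_locally` is a hypothetical function name, and the first argument is assumed to be the root of an unpacked Flink distribution.

```shell
#!/usr/bin/env bash
# Sketch: start a local cluster, run the WordCount example, always stop the cluster.
# run_wordcount_locally is a hypothetical helper; $1 is assumed to be the Flink home.
run_wordcount_locally() {
  local flink_home="$1" input_file="$2" status=0
  "$flink_home/bin/start-cluster.sh" || return 1
  if [ -n "$input_file" ]; then
    # Read the words from the given file.
    "$flink_home/bin/flink" run "$flink_home/examples/streaming/WordCount.jar" \
      --input "$input_file" || status=$?
  else
    # No input given: the example falls back to its built-in data.
    "$flink_home/bin/flink" run "$flink_home/examples/streaming/WordCount.jar" || status=$?
  fi
  # Stop the cluster whether or not the job succeeded.
  "$flink_home/bin/stop-cluster.sh"
  return "$status"
}

# Example call (needs a real Flink distribution; the path is a placeholder):
#   run_wordcount_locally /path/to/flink "$HOME/story"
```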

Common Configurations

./bin/taskmanager.sh start|start-foreground|stop|stop-all
conf/flink-conf.yaml
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 4
taskmanager.managed.memory.size: 256
TotalHeapMemory = taskmanager.heap.mb + taskmanager.managed.memory.size + taskmanager.process.heap.memory.mb (default: 128 MB)
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
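
As a worked example of the TotalHeapMemory formula above, the sketch below adds the three terms using the sample values from this section. `total_heap_mb` is a hypothetical helper, and all values are assumed to be in megabytes.

```shell
#!/usr/bin/env bash
# Sketch: TotalHeapMemory = heap + managed memory + process heap (all in MB).
# total_heap_mb is a hypothetical helper; the third argument defaults to 128,
# the default value quoted in the formula above.
total_heap_mb() {
  local heap_mb="$1" managed_mb="$2" process_heap_mb="${3:-128}"
  echo $(( heap_mb + managed_mb + process_heap_mb ))
}

# Values from the sample configuration: 1024 MB heap, 256 MB managed memory.
total_heap_mb 1024 256   # prints 1408
```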

View and Configure Logs

Moving Ahead

Deploy the Flink Standalone Cluster on Multiple Hosts

jobmanager.rpc.address: z05f06378.sqa.zth.tbsite.net
conf/masters
conf/slaves
conf/flink-conf.yaml
./bin/start-cluster.sh
./bin/flink run examples/streaming/WordCount.jar
hdfs dfs -copyFromLocal story /test_dir/input_dir/story
./bin/flink run examples/streaming/WordCount.jar --input hdfs:///test_dir/input_dir/story --output hdfs:///test_dir/output_dir/output
./bin/flink run examples/streaming/WordCount.jar --input hdfs:///test_dir/input_dir/story --output hdfs:///test_dir/output_dir/output --parallelism 20
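
The `conf/masters` and `conf/slaves` files listed above are plain host lists, one entry per line. A minimal sketch for generating them follows; `write_cluster_files` and the `example.com` host names are hypothetical and stand in for the real hosts of the cluster.

```shell
#!/usr/bin/env bash
# Sketch: write conf/masters and conf/slaves for a standalone cluster.
# write_cluster_files is a hypothetical helper; host names are placeholders.
write_cluster_files() {
  local conf_dir="$1" master="$2"
  shift 2                                        # remaining arguments are worker hosts
  mkdir -p "$conf_dir"
  printf '%s\n' "$master" > "$conf_dir/masters"  # JobManager host, optionally host:port
  printf '%s\n' "$@" > "$conf_dir/slaves"        # one TaskManager host per line
}

# Example call (writes into the given directory):
#   write_cluster_files ./conf master.example.com:8081 worker1.example.com worker2.example.com
```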

Deploy and Configure High Availability (HA) in Standalone Mode

Figure 4. Flink JobManager HA diagram

Use Standard Flink Script to Deploy ZooKeeper (Optional)

# The port at which the clients will connect
clientPort=3181
server.1=z05f06378.sqa.zth.tbsite.net:4888:5888
server.2=z05c19426.sqa.zth.tbsite.net:4888:5888
server.3=z05f10219.sqa.zth.tbsite.net:4888:5888
./bin/start-zookeeper-quorum.sh
./bin/stop-zookeeper-quorum.sh
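
The `server.N` entries above follow the pattern `server.<id>=<host>:<peer port>:<leader election port>`. The sketch below generates them from a host list; `zk_server_lines` is a hypothetical helper, and the host names in the example call are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print ZooKeeper "server.N=host:peerPort:electionPort" lines.
# zk_server_lines is a hypothetical helper; IDs are assigned in argument order from 1.
zk_server_lines() {
  local peer_port="$1" election_port="$2"
  shift 2                      # remaining arguments are the quorum hosts
  local id=1 host
  for host in "$@"; do
    printf 'server.%d=%s:%s:%s\n' "$id" "$host" "$peer_port" "$election_port"
    id=$(( id + 1 ))
  done
}

# Same ports as in the sample configuration above, with placeholder hosts.
zk_server_lines 4888 5888 zk1.example.com zk2.example.com zk3.example.com
```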

Modify Configuration of Flink Standalone Cluster

$ cat conf/masters
z05f06378.sqa.zth.tbsite.net:8081
z05c19426.sqa.zth.tbsite.net:8081
$ cat conf/slaves
z05f06378.sqa.zth.tbsite.net
z05c19426.sqa.zth.tbsite.net
z05f10219.sqa.zth.tbsite.net
high-availability: zookeeper
high-availability.zookeeper.quorum: z05f02321.sqa.zth.tbsite.net:2181,z05f10215.sqa.zth.tbsite.net:2181
high-availability.zookeeper.path.root: /test_dir/test_standalone2_root
high-availability.cluster-id: /test_dir/test_standalone2
high-availability.storageDir: hdfs:///test_dir/recovery2/
jobmanager.rpc.address
jobmanager.rpc.port
./bin/start-zookeeper-quorum.sh
./bin/start-cluster.sh
./bin/jobmanager.sh start z05c19426.sqa.zth.tbsite.net 8081
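
Before starting an HA cluster, it can help to confirm that the HA keys are actually present in `conf/flink-conf.yaml`. The check below is a sketch: `missing_ha_keys` is a hypothetical helper, and the three keys it looks for are assumed to be the minimum for ZooKeeper-based HA. A key written without its trailing colon is reported as missing.

```shell
#!/usr/bin/env bash
# Sketch: list HA keys that are absent from a flink-conf.yaml file.
# missing_ha_keys is a hypothetical helper. Dots in the key act as regex
# wildcards, which is loose but sufficient for a quick sanity check.
missing_ha_keys() {
  local conf_file="$1" key
  for key in high-availability \
             high-availability.zookeeper.quorum \
             high-availability.storageDir; do
    # A key counts as present only if a line starts with "<key>:".
    grep -Eq "^${key}:" "$conf_file" || printf '%s\n' "$key"
  done
}

# Example call (prints nothing when all keys are present):
#   missing_ha_keys conf/flink-conf.yaml
```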

Run the Flink Job in Yarn Mode

Figure 5. Flink Yarn Deployment Flowchart

Start the Long-Running Flink Cluster on Yarn (Session Cluster Mode)

./bin/yarn-session.sh -h
./bin/yarn-session.sh -n 4 -jm 1024m -tm 4096m
./bin/flink run examples/streaming/WordCount.jar --input hdfs:///test_dir/input_dir/story --output hdfs:///test_dir/output_dir/output
./bin/flink run -yid application_1548056325049_0048 examples/streaming/WordCount.jar --input hdfs:///test_dir/input_dir/story --output hdfs:///test_dir/output_dir/output
slotmanager.taskmanager-timeout: 30000         # deprecated, used in release-1.5
resourcemanager.taskmanager-timeout: 30000
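
When a job is submitted to an existing session with `-yid`, a mistyped application ID is an easy error to make. The sketch below checks the ID against the `application_<cluster timestamp>_<sequence number>` shape before printing the command. `flink_run_on_session_cmd` is a hypothetical helper and only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: validate a YARN application ID, then print the matching flink command.
# flink_run_on_session_cmd is a hypothetical helper; it does not contact YARN.
flink_run_on_session_cmd() {
  local app_id="$1"
  shift                        # remaining arguments: jar path and program arguments
  if ! printf '%s\n' "$app_id" | grep -Eq '^application_[0-9]+_[0-9]+$'; then
    echo "not a YARN application ID: $app_id" >&2
    return 1
  fi
  printf './bin/flink run -yid %s' "$app_id"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"          # append each remaining argument, space-separated
  fi
  printf '\n'
}

# Application ID taken from the example above.
flink_run_on_session_cmd application_1548056325049_0048 examples/streaming/WordCount.jar
```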

Run a Single Flink Job on Yarn (Job Cluster Mode)

./bin/flink run -m yarn-cluster -yn 2 examples/streaming/WordCount.jar --input hdfs:///test_dir/input_dir/story --output hdfs:///test_dir/output_dir/output
./bin/flink run -h

Configure High Availability in Yarn Mode

<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>100</value>
</property>
yarn.application-attempts: 10     # 1 initial attempt + 9 retries
high-availability: zookeeper
high-availability.zookeeper.quorum: z05f02321.sqa.zth.tbsite.net:2181,z05f10215.sqa.zth.tbsite.net:2181
high-availability.zookeeper.path.root: /test_dir/test_standalone2_root
high-availability.cluster-id: /test_dir/test_standalone2
high-availability.storageDir: hdfs:///test_dir/recovery2/
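
The two attempt settings above interact: `yarn.resourcemanager.am.max-attempts` in `yarn-site.xml` is an upper limit, so a larger `yarn.application-attempts` value in Flink has no additional effect. The sketch below computes the effective number of retries under that rule; `effective_retries` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: effective retries = min(Flink attempts, YARN max attempts) - 1.
# effective_retries is a hypothetical helper used only for illustration.
effective_retries() {
  local flink_attempts="$1" yarn_max_attempts="$2" effective
  if [ "$flink_attempts" -gt "$yarn_max_attempts" ]; then
    effective="$yarn_max_attempts"   # YARN caps the number of attempts
  else
    effective="$flink_attempts"
  fi
  echo $(( effective - 1 ))          # the first attempt is not a retry
}

# Values from the configuration above: 10 attempts in Flink, 100 allowed by YARN.
effective_retries 10 100   # prints 9
```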
