2. Setting up a Flink cluster environment

2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--

1. Build the Flink environment

1.1 Flink deployment modes

Flink supports the following deployment modes:

Local, Standalone (low resource utilization), Yarn, Mesos, Docker, Kubernetes, AWS.

We mainly analyze the Flink cluster deployment in Standalone mode and Yarn mode.

Standalone mode is often used for stand-alone program testing, while Yarn mode is often used in the actual online production environment.

1.2 Cluster planning

1. Cluster planning

Node name    master (jobManager)    worker (taskManager)    zookeeper
bigdata11    √                                              √
bigdata12    √                      √                       √
bigdata13                           √                       √

(Note: ZooKeeper is only required to implement master HA. If master HA is not needed, ZooKeeper can be omitted.)

2. Software version

JDK 1.8, Scala 2.11.8, Hadoop 2.8, ZooKeeper 3.4.10, Flink 1.6.1

3. Basic environment

Install JDK, Scala, Hadoop (both HDFS and YARN should be deployed) and ZooKeeper; see the previous article for deployment details. Note that passwordless SSH keys should be configured between the nodes so that they can log in to each other without a password.
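The SSH setup above can be sketched as a small dry run that prints the commands to run by hand. The node list is taken from the cluster plan above; review the output before executing it.

```shell
#!/bin/sh
# Dry run: print the commands needed for passwordless SSH between nodes.
# Node names follow the cluster plan above; adjust for your environment.
NODES="bigdata11 bigdata12 bigdata13"

ssh_setup_commands() {
  # Generate a key pair once on the source node (skip if one exists).
  echo 'ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa'
  # Copy the public key to every node, including the local one.
  for node in $NODES; do
    echo "ssh-copy-id root@$node"
  done
}

ssh_setup_commands
```

Printing instead of executing keeps the sketch safe to run anywhere; remove the `echo` wrappers (and run the printed commands directly) on a real cluster.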

1.3 Standalone mode installation

1. Decompress the archive:

tar -zxvf flink-1.6.1-bin-hadoop28-scala_2.11.tgz -C /opt/module/

2. Modify the configuration file

Configure the master node address:

[root@bigdata11 conf]$ sudo vi masters
bigdata11:8081

Configure the worker node addresses:

[root@bigdata11 conf]$ sudo vi slaves
bigdata12
bigdata13

Modify the Flink working parameters:

[root@bigdata11 conf]$ sudo vi flink-conf.yaml
jobmanager.rpc.address: bigdata11     # line 33, RPC address of the jobmanager
taskmanager.numberOfTaskSlots: 2      # line 52

Optional configuration:

- jobmanager.heap.mb: amount of memory available to each JobManager
- taskmanager.heap.mb: amount of memory available to each TaskManager
- taskmanager.numberOfTaskSlots: number of slots available on each TaskManager
- parallelism.default: default parallelism of each job
- taskmanager.tmp.dirs: temporary directories
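Instead of editing flink-conf.yaml interactively with vi, the two key parameters can be set non-interactively with sed. The sketch below demonstrates this on a scratch copy of the file; on a real node, point CONF at /opt/module/flink-1.6.1/conf/flink-conf.yaml (the helper name set_conf is ours, not a Flink tool).

```shell
#!/bin/sh
# Sketch: set flink-conf.yaml keys non-interactively. Demonstrated on a
# scratch file; use the real conf path on an actual node.
CONF=$(mktemp)
printf '%s\n' '# jobmanager.rpc.address: localhost' \
              'taskmanager.numberOfTaskSlots: 1' > "$CONF"

set_conf() {  # usage: set_conf <file> <key> <value>
  if grep -q "^#* *$2:" "$1"; then
    # Replace the existing (possibly commented-out) line in place.
    sed -i "s|^#* *$2:.*|$2: $3|" "$1"
  else
    # Key absent: append it.
    echo "$2: $3" >> "$1"
  fi
}

set_conf "$CONF" jobmanager.rpc.address bigdata11
set_conf "$CONF" taskmanager.numberOfTaskSlots 2
cat "$CONF"
```

Handling the commented-out form (`# key: value`) matters because the shipped flink-conf.yaml comments out many defaults.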

3. Configure environment variables

vim /etc/profile.d/flink.sh

export FLINK_HOME=/opt/module/flink-1.6.1
export PATH=$PATH:$FLINK_HOME/bin

Then run source /etc/profile.d/flink.sh to enable the environment variables.

4. Copy the configured /opt/module/flink-1.6.1 directory to the other nodes

Use scp or rsync

scp -r /opt/module/flink-1.6.1 bigdata12:`pwd`
scp -r /opt/module/flink-1.6.1 bigdata13:`pwd`

Also configure the environment variables on the other two nodes.

5. Start the flink cluster

[root@bigdata11 flink-1.6.1]$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host bigdata11.
Starting taskexecutor daemon on host bigdata12.
Starting taskexecutor daemon on host bigdata13.

Use jps to view the corresponding processes on each node:

StandaloneSessionClusterEntrypoint is the jobmanager process; TaskManagerRunner is the taskmanager process.
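As a sanity check, the jps output can be scanned for the expected daemon names. In the sketch below, JPS_OUT is faked with sample output from a master-only node for illustration; on a real node, use JPS_OUT=$(jps) instead (check_proc is a hypothetical helper, not part of Flink).

```shell
#!/bin/sh
# Sketch: check whether the expected Flink daemons appear in jps output.
# JPS_OUT is sample data here; on a real node: JPS_OUT=$(jps)
JPS_OUT="12345 StandaloneSessionClusterEntrypoint
23456 Jps"

check_proc() {  # usage: check_proc <process-name>
  if echo "$JPS_OUT" | grep -q "$1"; then
    echo "$1: running"
  else
    echo "$1: NOT running"
  fi
}

check_proc StandaloneSessionClusterEntrypoint
check_proc TaskManagerRunner
```

On the sample output above this reports the jobmanager process as running and the taskmanager process as not running, which is what you would expect on a master-only node.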

6. Web UI view

http://bigdata11:8081

7. Run the test task

flink run -m bigdata11:8081 ./examples/batch/WordCount.jar --input /opt/module/datas/word.txt --output /tmp/word.output

8. Add or remove nodes in the cluster

Add/remove jobmanager nodes:

bin/jobmanager.sh ((start|start-foreground) [host] [webui-port])|stop|stop-all

Add/remove taskmanager nodes (must be run on the node in question):

bin/taskmanager.sh start|start-foreground|stop|stop-all

1.4 Standalone mode JobManager HA

First, note that Flink has two deployment modes, Standalone and Yarn Cluster. In Standalone mode, Flink must rely on ZooKeeper to implement JobManager HA (ZooKeeper has become an essential component for HA in most open source frameworks). With ZooKeeper's help, a Standalone Flink cluster can have multiple live JobManagers at the same time, of which only one is in the working state while the others are in Standby. When the working JobManager loses connectivity (for example, due to downtime or a crash), ZooKeeper elects a new JobManager from the Standby ones to take over the Flink cluster.

In Yarn Cluster mode, Flink relies on Yarn itself to provide HA for the JobManager. This is entirely Yarn's mechanism: both the JobManager and the TaskManagers are launched by Yarn inside Yarn containers, and the JobManager should then be called the Flink Application Master. In other words, its failure recovery depends entirely on the ResourceManager in Yarn (the same as MapReduce's AppMaster). Because this depends completely on Yarn, there may be slight differences between Yarn versions; we will not go into further detail here.

1. Modify the configuration file

Conf/flink-conf.yaml

Comment out jobmanager.rpc.address: bigdata11, then modify the following configuration:

high-availability: zookeeper    # line 73, specifies the high-availability mode
high-availability.zookeeper.quorum: bigdata11:2181,bigdata12:2181,bigdata13:2181    # line 88, address list of the zookeeper quorum
high-availability.storageDir: hdfs:///flink/ha/    # JobManager metadata is persisted to HDFS; ZooKeeper stores only pointers to this state (required)
high-availability.zookeeper.path.root: /flink    # root ZooKeeper node, under which all cluster nodes are placed (recommended)
high-availability.cluster-id: /flinkCluster    # custom cluster id; cluster node information is saved under this node (recommended)

# Checkpoints and savepoints saved in HDFS (optional)
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/checkpoints

Conf/masters

Write both the active and standby jobmanager addresses into the configuration file:

bigdata11:8081
bigdata12:8081

Conf/zoo.cfg

server.1=bigdata11:2888:3888
server.2=bigdata12:2888:3888
server.3=bigdata13:2888:3888

Synchronize the configuration to all other nodes after modification.
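The synchronization step can be sketched as a dry run that prints one rsync command per remote node. The node list and FLINK_HOME follow the setup above; review the printed commands, then remove the echo to execute them.

```shell
#!/bin/sh
# Dry run: print the rsync commands to push the edited conf/ directory
# to the other nodes. Paths and node names follow the setup above.
FLINK_HOME=/opt/module/flink-1.6.1

sync_conf_commands() {
  for node in bigdata12 bigdata13; do
    echo "rsync -av $FLINK_HOME/conf/ $node:$FLINK_HOME/conf/"
  done
}

sync_conf_commands
```

rsync is preferred over scp here because it only transfers files that changed, which matters when resynchronizing after small config edits.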

2. Start the cluster

Start the ZooKeeper service first, then the HDFS service, and finally start the Flink cluster with start-cluster.sh.

1.5 Yarn mode installation

The deployment steps are basically the same as for standalone above and are not repeated here. In addition, add the following configuration:

Configure the hadoop (hdfs and yarn) environment, as well as the environment variable HADOOP_HOME.

Then start jobmanager and taskmanager under yarn.

/opt/module/flink-1.6.1/bin/yarn-session.sh -n 2 -s 4 -jm 1024 -tm 1024 -nm test -d

where:

- -n (--container): the number of TaskManagers.
- -s (--slots): the number of slots per TaskManager. By default there is one slot per core, and the default number of slots per taskmanager is 1. Sometimes more taskmanagers can be used for redundancy.
- -jm: memory of the JobManager (in MB).
- -tm: memory per TaskManager (in MB).
- -nm: name of the application on Yarn (shown in the Yarn UI).
- -d: run in the background.

The corresponding jobmanager and taskmanagers are started automatically according to the configuration files under conf/.
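To see what the flags above add up to, here is the arithmetic for the example command, worked as a small shell sketch: total slots are the TaskManager count times slots per TaskManager, and total requested memory is the JobManager heap plus one TaskManager heap per container.

```shell
#!/bin/sh
# Resources requested by the example yarn-session command above.
N=2        # -n: number of TaskManagers
S=4        # -s: slots per TaskManager
JM=1024    # -jm: JobManager memory (MB)
TM=1024    # -tm: memory per TaskManager (MB)

TOTAL_SLOTS=$((N * S))        # 2 TaskManagers x 4 slots = 8 slots
TOTAL_MEM=$((JM + N * TM))    # 1024 + 2 x 1024 = 3072 MB

echo "task slots: $TOTAL_SLOTS, total memory requested: ${TOTAL_MEM} MB"
```

So the example session can run up to 8 parallel subtasks and asks Yarn for roughly 3 GB of memory; Yarn must have at least that much free in its containers for the session to start.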

After the startup is complete, you can check the web page of yarn to see the session task just submitted:

http://bigdata11:8088

At the same time, you can use jps to view the corresponding process on the node that submitted the session:

YarnSessionClusterEntrypoint: this is the session process maintained by the yarn-session just submitted.

Submit the test task to the flink cluster in yarn to run

./bin/flink run ./examples/batch/WordCount.jar --input <input data path> --output <output data path>

The jobmanager address can be specified manually with -m jobManagerAddress, but the flink client can obtain the jobmanager address automatically from the flink configuration file, so it does not have to be specified.

After submitting the task, you can find the relevant task information on the web page of yarn.
