Example Analysis of the Operation principle of Flink on yarn 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

The editor will share with you an example analysis of how Flink on YARN works. I hope you find it useful; let's go through it together.

The Flink runtime consists of two types of processes:

1) The JobManager, also called the master, coordinates distributed execution: it schedules tasks, coordinates checkpoints, coordinates recovery from failures, and so on. There is always at least one JobManager. In a high-availability setup, several JobManagers can run at once; one of them is elected leader and the others stay on standby.

2) The TaskManager, also called the worker, executes the actual tasks and buffers and exchanges the data streams between them. There is always at least one TaskManager.
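The high-availability mode mentioned in item 1 is configured in conf/flink-conf.yaml. The sketch below assumes ZooKeeper-based HA and uses the keys high-availability, high-availability.zookeeper.quorum and high-availability.storageDir as they appear in the Flink configuration docs; the write_ha_conf helper, the ZooKeeper hosts and the storage path are hypothetical. Check the docs for your Flink version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: append ZooKeeper-based HA settings to a Flink config file.
# The helper name and the example hosts/paths are hypothetical.
write_ha_conf() {
  local conf_file=$1 quorum=$2 storage_dir=$3
  cat >> "$conf_file" <<EOF
high-availability: zookeeper
high-availability.zookeeper.quorum: ${quorum}
high-availability.storageDir: ${storage_dir}
EOF
}

# Example (hypothetical hosts and path):
# write_ha_conf conf/flink-conf.yaml zk1:2181,zk2:2181,zk3:2181 hdfs:///flink/ha/
```

Appending with >> instead of overwriting keeps the rest of the existing configuration intact.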

JobManagers and TaskManagers can be started in several ways: directly as a standalone cluster, or managed by a resource framework such as YARN or Mesos. Each TaskManager connects to the JobManager, announces itself as available, and is assigned work.
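To make the registration step concrete, here is a toy model in bash, not Flink code: each TaskManager announces a number of slots, and the JobManager hands free slots to subtasks in arrival order. The function names and the first-come-first-served policy are assumptions chosen for illustration; Flink's real scheduler is considerably more involved.

```shell
#!/usr/bin/env bash
# Toy model, NOT Flink code: TaskManagers announce their slots to a
# JobManager, which hands out free slots to subtasks in arrival order.
FREE_SLOTS=()
LAST_ASSIGNMENT=""

# register_taskmanager NAME NUM_SLOTS: a TaskManager announces it is available
register_taskmanager() {
  local name=$1 num_slots=$2 i
  for ((i = 0; i < num_slots; i++)); do
    FREE_SLOTS+=("${name}/slot${i}")
  done
}

# assign_subtask TASK: give TASK the oldest free slot; fail if none is left
assign_subtask() {
  if ((${#FREE_SLOTS[@]} == 0)); then
    echo "no free slot for $1" >&2
    return 1
  fi
  LAST_ASSIGNMENT="$1 -> ${FREE_SLOTS[0]}"
  FREE_SLOTS=("${FREE_SLOTS[@]:1}")   # drop the slot that was just handed out
  echo "$LAST_ASSIGNMENT"
}
```

For example, after register_taskmanager tm1 2 and register_taskmanager tm2 1, three calls to assign_subtask succeed and a fourth fails because every slot is taken.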

The client is not part of the runtime or of program execution; it is used to prepare the dataflow and send it to the JobManager.

After that, the client can disconnect, or stay connected to receive progress reports. The client runs either as part of the Java/Scala program that triggers the execution, or in the command-line process started by ./bin/flink.

Flink on YARN

Given the explanation above, deploying Flink on YARN is easy to understand.

First, a cluster consisting of a JobManager and a set of TaskManagers is started on YARN.

We can then submit our application to this cluster, where it runs on the JobManager and TaskManagers started in the previous step.
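The two steps can be scripted roughly as follows. This is a sketch that assumes it runs from the unpacked Flink directory and that ./bin/flink run picks up the running session on its own; the run, start_session and submit_job helpers are hypothetical, and setting DRY_RUN=1 only prints the commands. The -d flag starts the session detached, so the shell is free for the second step.

```shell
#!/usr/bin/env bash
# Sketch of the two steps: (1) start a detached YARN session, (2) submit a job.
# Assumes the current directory is the unpacked Flink distribution.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# step 1: JobManager + N TaskManagers on YARN (1 GB JobManager, 4 GB per TaskManager)
start_session() {
  run ./bin/yarn-session.sh -d -n "$1" -jm 1024 -tm 4096
}

# step 2: hand the program (a jar path) to the running session
submit_job() {
  run ./bin/flink run "$1"
}
```

For instance, DRY_RUN=1 start_session 4 prints ./bin/yarn-session.sh -d -n 4 -jm 1024 -tm 4096 without contacting YARN.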

Putting these two steps together, the interaction between Flink and YARN works as follows:

When a new Flink YARN session is started, the client first checks whether the requested resources (containers and memory) are available. It then uploads a jar containing Flink, together with the configuration, to HDFS (step 1).
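The resource check before step 1 boils down to simple arithmetic. The following is a toy version, not Flink's actual logic (which also considers per-node limits): it asks whether the free memory and the free container count cover one JobManager container plus N TaskManager containers. fits_in_cluster and its argument order are made up for this example.

```shell
#!/usr/bin/env bash
# Toy pre-flight check (illustration only, not Flink's implementation).
# Arguments: JM_MB TM_MB NUM_TM FREE_MB FREE_CONTAINERS (memory in MB).
# Succeeds (exit status 0) if one JobManager container plus NUM_TM
# TaskManager containers fit into the free resources.
fits_in_cluster() {
  local jm_mb=$1 tm_mb=$2 num_tm=$3 free_mb=$4 free_containers=$5
  local needed_mb=$(( jm_mb + num_tm * tm_mb ))
  local needed_containers=$(( num_tm + 1 ))
  (( needed_mb <= free_mb && needed_containers <= free_containers ))
}
```

With the numbers used in the test below (-jm 1024, -tm 4096, four TaskManagers) the session needs 1024 + 4 * 4096 = 17408 MB and five containers.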

Next, the client requests a YARN container (step 2) to start the ApplicationMaster (step 3). Because the client has registered the configuration and jar files as resources for that container, the YARN NodeManager running on the chosen machine prepares the container (for example, by downloading the files). Once this is done, the ApplicationMaster (AM) starts.

The JobManager and the AM run in the same container. Once they have started, the AM knows the JobManager's address (its own host). It generates a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager) and uploads this file to HDFS as well. The AM container also serves Flink's web interface. All ports allocated by the YARN code are ephemeral, which lets users run multiple Flink YARN sessions in parallel.
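The configuration handoff can be sketched as follows. jobmanager.rpc.address and jobmanager.rpc.port are Flink configuration keys; the helper name, the host and the port are placeholders, and a real generated file carries the rest of the Flink configuration as well. The port is passed in rather than hard-coded because the YARN code uses ephemeral ports.

```shell
#!/usr/bin/env bash
# Sketch: write the entries TaskManagers need to find the JobManager that
# runs in the AM's own container. Only these two keys are written here;
# the real generated file also contains the rest of the Flink configuration.
write_taskmanager_conf() {
  local out_file=$1 jm_host=$2 jm_port=$3
  {
    echo "jobmanager.rpc.address: ${jm_host}"
    echo "jobmanager.rpc.port: ${jm_port}"
  } > "$out_file"
}

# Example (port is a placeholder for whatever ephemeral port was assigned):
# write_taskmanager_conf flink-conf.yaml "$(hostname)" 43210
```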

After that, the AM starts allocating containers for Flink's TaskManagers, which download the jar file and the modified configuration from HDFS. Once these steps are complete, Flink is set up and ready to accept jobs.
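The TaskManager start-up can be modelled with plain files. In this toy bash sketch a local staging directory stands in for HDFS, and each TaskManager container gets its own copy of the jar and the modified configuration, mimicking how YARN localizes container resources. The file names flink.jar and flink-conf.yaml and the taskmanager_<i> directory layout are assumptions, not what YARN actually produces on disk.

```shell
#!/usr/bin/env bash
# Toy model: STAGING_DIR plays the role of HDFS; each TaskManager
# "container" is a directory that receives its own copy of the jar and the
# config generated by the AM. Names and layout are hypothetical.
localize_taskmanagers() {
  local staging_dir=$1 work_root=$2 num_tm=$3 i dir
  for ((i = 1; i <= num_tm; i++)); do
    dir="${work_root}/taskmanager_${i}"
    mkdir -p "$dir"
    # "download" the shared resources into the container's working dir
    cp "${staging_dir}/flink.jar" "${staging_dir}/flink-conf.yaml" "$dir/" || return 1
  done
}
```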

Test Flink on YARN

First, start a YARN session with four TaskManagers of 4 GB each (and a 1 GB JobManager):

# get the hadoop2 package from the Flink download page at
# http://flink.apache.org/downloads.html
curl -O <flink_hadoop2_download_url>
tar xvzf flink-1.5.0-bin-hadoop2.tgz
cd flink-1.5.0/
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096

-jm sets the JobManager memory and -tm the memory of each TaskManager (both in MB), and -n sets the number of TaskManagers. If you are wondering how to configure slots: the -s parameter sets the number of slots in each TaskManager. Slot sizing is covered in detail later; the recommended practice is to give each TaskManager as many slots as it has processors.
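The slot rule of thumb can be applied mechanically. This sketch reads the processor count with getconf _NPROCESSORS_ONLN and assumes the TaskManager hosts have as many processors as the machine running the script; recommended_slots and total_slots are hypothetical helpers. The total N * S is the number of slots the session offers, which caps the parallelism available to jobs in that session.

```shell
#!/usr/bin/env bash
# Sketch: one slot per processor (assumes TaskManager hosts match this machine).
recommended_slots() {
  getconf _NPROCESSORS_ONLN
}

# total_slots NUM_TM SLOTS_PER_TM: slots offered by the whole session
total_slots() {
  local num_tm=$1 slots_per_tm=$2
  echo $(( num_tm * slots_per_tm ))
}

# Example (from the Flink directory):
# ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096 -s "$(recommended_slots)"
```

With four TaskManagers of four slots each, total_slots 4 4 prints 16.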

Then, run a Flink job on YARN:

./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar

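Note that -m yarn-cluster does not point at the session started above: it starts a separate Flink cluster on YARN just for this job, and -yn, -yjm and -ytm play the roles of -n, -jm and -tm. To submit to the running session instead, no master needs to be specified: ./bin/flink run ./examples/batch/WordCount.jar finds the session's JobManager automatically and submits the job.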

This requires Hadoop's YARN configuration to be set up in advance.

Flink can read the YARN configuration files if any one of the environment variables YARN_CONF_DIR, HADOOP_CONF_DIR or HADOOP_CONF_PATH is set.
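The lookup can be sketched as a small bash function. It returns the first of the three variables that is set and non-empty; checking them in the order listed is an assumption here, and find_yarn_conf_dir is a hypothetical helper, not part of Flink.

```shell
#!/usr/bin/env bash
# Sketch: print the first of YARN_CONF_DIR, HADOOP_CONF_DIR, HADOOP_CONF_PATH
# that is set and non-empty (checking order is an assumption).
find_yarn_conf_dir() {
  local var
  for var in YARN_CONF_DIR HADOOP_CONF_DIR HADOOP_CONF_PATH; do
    if [[ -n ${!var:-} ]]; then   # indirect expansion: value of the variable named $var
      echo "${!var}"
      return 0
    fi
  done
  echo "none of YARN_CONF_DIR, HADOOP_CONF_DIR, HADOOP_CONF_PATH is set" >&2
  return 1
}
```

In practice it is usually enough to export one of them, for example export HADOOP_CONF_DIR=/etc/hadoop/conf, before running yarn-session.sh or flink run; the path is only an example and depends on your Hadoop installation.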

After reading this article, I believe you have a basic understanding of how Flink on YARN works. If you want to learn more, please follow the industry information channel. Thank you for reading!

