
How to set parallelism in Apache Flink

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31

This article explains in detail how to set parallelism in Apache Flink. It is shared here as a practical reference.

When processing data with Apache Flink, you usually need to set the parallelism. Parallelism is an important concept in Flink: a reasonable setting speeds up data processing, while an unreasonable one reduces efficiency and can even cause tasks to fail.

A Flink program consists of multiple tasks (sources, transformations/operators, sinks). Each task is executed by several parallel instances, and the number of parallel instances of a task is called its parallelism.

How to set parallelism

Apache Flink supports setting parallelism at different levels: configuration file, env level, client level, and operator level.

1. Configuration file level

If no parallelism is specified when a Job is submitted, Flink uses the default parallelism from its configuration file. We can check that value with the following command.

$ cat flink-conf.yaml | grep "parallelism.default"

parallelism.default: 1

In this example the configured value is 1. That is, a Job that does not set its own parallelism runs with the default parallelism of 1 from the configuration file.

2. Env level

The env level is the Environment level. That is, the parallelism of the whole Job is set through the Execution Environment.

val env = Stream...

env.setParallelism(5)

3. Client level

If the parallelism is not set in the code and you do not want to modify the configuration file, you can set the Job's parallelism from the client when submitting it.

./bin/flink run -p 5 ../wordCount-java*.jar

-p sets the Job parallelism of WordCount to 5.

4. Operator level

When we write a Flink project, we may want different parallelism for different operators. For example, to read from Kafka as efficiently as possible, the source parallelism should be chosen with reference to the number of Kafka partitions, and the sink parallelism should be chosen to suit the storage system it writes to. A single Job then needs several different parallelism values, which is what operator-level parallelism settings are for.

val env = Stream...

val text = ...

text.keyBy(XXX)

  .flatMap(XXX).setParallelism(5) // set to 5 when calculating

  .addSink(XXXXX).setParallelism(1) // set to 1 when writing to the database
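
To illustrate why the source parallelism is usually chosen with the Kafka partition count in mind, here is a minimal sketch in plain Scala (no Flink dependency). It assumes a simplified round-robin rule in which partition p is read by subtask p % parallelism; Flink's real assignment logic differs in detail, and the names `PartitionAssignment`, `assign` and `idleSubtasks` are hypothetical.

```scala
object PartitionAssignment {
  // Simplified model (an assumption, not Flink's actual algorithm):
  // partition p is read by source subtask p % parallelism.
  def assign(partitions: Int, parallelism: Int): Map[Int, Seq[Int]] = {
    require(parallelism > 0, "parallelism must be positive")
    (0 until parallelism).map { subtask =>
      val owned: Seq[Int] = (0 until partitions).filter(_ % parallelism == subtask)
      subtask -> owned
    }.toMap
  }

  // Number of source subtasks that receive no partition at all.
  def idleSubtasks(partitions: Int, parallelism: Int): Int =
    assign(partitions, parallelism).count { case (_, owned) => owned.isEmpty }

  def main(args: Array[String]): Unit = {
    println(idleSubtasks(partitions = 4, parallelism = 4)) // 0
    println(idleSubtasks(partitions = 4, parallelism = 6)) // 2
  }
}
```

Under this model, a source parallelism above the partition count only adds subtasks with nothing to read, while a parallelism below it makes some subtasks read several partitions.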

A setting at a higher-priority level overrides one at a lower-priority level. For example, the parallelism set on an operator overrides the parallelism in the configuration file.

In terms of priority: operator level > env level > client level > system default level

In practice, we need to set a reasonable parallelism to keep data processing efficient. Sources and sinks in particular often need a different parallelism from the rest of the Job so that reads and writes keep up with the load.
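
The priority order can be modelled as a chain of fallbacks. The following is a small sketch in plain Scala, not Flink code: `ParallelismResolution`, `Settings` and `effective` are hypothetical names used only to illustrate the rule, and each level is an `Option` that is empty when nothing was set there.

```scala
object ParallelismResolution {
  // Hypothetical model of the four levels; None means "not set at this level".
  final case class Settings(
      operator: Option[Int] = None, // .setParallelism(n) on an operator
      env: Option[Int] = None,      // env.setParallelism(n)
      client: Option[Int] = None,   // flink run -p n
      configDefault: Int = 1        // parallelism.default in the configuration file
  )

  // Take the first level that is set, from highest to lowest priority.
  def effective(s: Settings): Int =
    s.operator.orElse(s.env).orElse(s.client).getOrElse(s.configDefault)

  def main(args: Array[String]): Unit = {
    println(effective(Settings(operator = Some(5), env = Some(3), client = Some(2)))) // 5
    println(effective(Settings(client = Some(2))))                                    // 2
    println(effective(Settings()))                                                    // 1
  }
}
```

In this model, an operator without its own setting falls back to the env value, then to the client value, and finally to the configuration file default.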

How much parallelism to set

A larger parallelism is not automatically better, and it does not guarantee more efficient data processing. Instead, the parallelism needs to be reasonable. So what is reasonable?

The parallelism a Flink Job can reach depends on the number of slots on each TaskManager. Flink's JobManager divides tasks into subtasks and submits them to slots for execution. Slots on the same TaskManager share that TaskManager's JVM, and the TaskManager maintains a heartbeat with the JobManager to report its status.

A slot represents a TaskManager's capacity for concurrent execution. As a rule of thumb, a TaskManager is given as many slots as it has CPU cores. From this point of view, the parallelism we set is bounded by the total number of slots across the TaskManagers.
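
The relationship between parallelism and slots can be checked with simple arithmetic. The sketch below is plain Scala with hypothetical names (`SlotCapacity`, `totalSlots`, `requiredSlots`, `fits`). It assumes Flink's default slot sharing, under which a Job needs as many slots as its highest operator parallelism, and it takes the number of slots per TaskManager (configured through `taskmanager.numberOfTaskSlots`) as an input.

```scala
object SlotCapacity {
  // Total slots offered by the cluster.
  def totalSlots(taskManagers: Int, slotsPerTaskManager: Int): Int =
    taskManagers * slotsPerTaskManager

  // Assumption: default slot sharing, so the Job needs as many slots
  // as its highest operator parallelism.
  def requiredSlots(operatorParallelism: Seq[Int]): Int =
    if (operatorParallelism.isEmpty) 0 else operatorParallelism.max

  // True when the cluster has enough slots to schedule the Job.
  def fits(operatorParallelism: Seq[Int], taskManagers: Int, slotsPerTaskManager: Int): Boolean =
    requiredSlots(operatorParallelism) <= totalSlots(taskManagers, slotsPerTaskManager)

  def main(args: Array[String]): Unit = {
    // 3 TaskManagers with 4 slots each give 12 slots in total.
    println(totalSlots(3, 4))         // 12
    println(fits(Seq(4, 5, 1), 3, 4)) // true: the Job needs 5 slots
    println(fits(Seq(16, 1), 3, 4))   // false: the Job needs 16 slots
  }
}
```

If the check fails, either the parallelism has to be lowered or more slots have to be added to the cluster.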
