What is the architecture and running architecture of Flink

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains the architecture and the running architecture of Flink: first the processes that make up a Flink cluster, then the abstract stages a program goes through once it has been submitted.

Flink architecture and its main components

Like most big data frameworks, Flink is a classic Master/Slave implementation: the JobManager is the Master and the TaskManagers are the Slaves.

JobManager process (Master)

The JobManager coordinates distributed execution: it schedules tasks, coordinates checkpoints (CheckPoint), and coordinates recovery on failure. A Flink runtime has at least one master process. If high availability mode is configured there are several master processes, one of which is the leader while the others are on standby. What the JobManager receives for an application includes the jar and the JobGraph.

TaskManager process (Slave)

The TaskManager, also known as the Worker, is mainly responsible for receiving tasks from the JobManager, deploying and starting them, and receiving and processing upstream data. A TaskManager is a worker node that executes tasks in one or more threads inside a JVM. When it starts, the TaskManager registers its resource information (number of slots, etc.) with the ResourceManager.

ResourceManager

Flink provides different ResourceManagers for different environments and resource providers, such as YARN, Mesos, Kubernetes, or standalone deployment. The ResourceManager is responsible for managing Flink's unit of resource allocation, the slot, which can be understood as a share of CPU and memory resources.

Dispatcher

The Dispatcher provides a REST interface through which we submit applications for execution. Once an application is submitted, the Dispatcher launches a JobManager and hands the application over to it. The Dispatcher also launches a web UI that provides information about job execution.

Note: depending on how an application is submitted for execution, the Dispatcher may not be used.
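The way these four components cooperate can be sketched as a small model. This is plain Python rather than Flink code: the class names follow the components above, but every method name (`register`, `grant_slots`, `acquire_slots`, `submit`) and the job name `wordcount` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    name: str
    num_slots: int


@dataclass
class ResourceManager:
    # Free slots, stored as (task_manager_name, slot_index) pairs.
    free_slots: list = field(default_factory=list)

    def register(self, tm: TaskManager) -> None:
        # A TaskManager reports its slots when it starts.
        for i in range(tm.num_slots):
            self.free_slots.append((tm.name, i))

    def grant_slots(self, count: int) -> list:
        # Hand out free slots, or fail if the cluster is too small.
        if count > len(self.free_slots):
            raise RuntimeError(f"need {count} slots, only {len(self.free_slots)} free")
        granted = self.free_slots[:count]
        self.free_slots = self.free_slots[count:]
        return granted


@dataclass
class JobManager:
    job_name: str
    slots: list = field(default_factory=list)

    def acquire_slots(self, rm: ResourceManager, count: int) -> None:
        # The JobManager asks the ResourceManager for the slots its job needs.
        self.slots = rm.grant_slots(count)


class Dispatcher:
    def __init__(self, rm: ResourceManager):
        self.rm = rm
        self.jobs = {}

    def submit(self, job_name: str, required_slots: int) -> JobManager:
        # Launch one JobManager per submitted application and hand the job to it.
        jm = JobManager(job_name)
        jm.acquire_slots(self.rm, required_slots)
        self.jobs[job_name] = jm
        return jm


rm = ResourceManager()
rm.register(TaskManager("tm-1", 3))
rm.register(TaskManager("tm-2", 3))
dispatcher = Dispatcher(rm)
jm = dispatcher.submit("wordcount", required_slots=4)
print(jm.slots)  # [('tm-1', 0), ('tm-1', 1), ('tm-1', 2), ('tm-2', 0)]
```

The model leaves out everything that makes the real system hard (failure handling, leader election, slot release); it only shows who talks to whom.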

For the relationship of the above components, please refer to the following figure:

Flink running architecture

The architecture describes how Flink is laid out on physical machines: which processes exist and how the process system as a whole fits together. The running architecture, by contrast, describes the abstract stages a program goes through after it is submitted.

Flink program structure

The basic building blocks of Flink programs are streams and transformations (note that the DataSet used in Flink's DataSet API is also a stream internally). Conceptually, a stream is a (possibly endless) flow of data records, while a transformation takes one or more streams as input and produces one or more output streams.

The above figure shows the structure of a Flink application. It has three important components: Source, Transformation, and Sink.

Source (data source): defines where Flink loads data from. In streaming and batch processing Flink has roughly four kinds of source: sources based on local collections, file-based sources, sources based on network sockets, and custom sources. Common custom sources include Apache Kafka, RabbitMQ, and so on.

Transformation (data conversion): the various operations, also known as operators, such as Map / FlatMap / Filter / KeyBy / Reduce / Window, that convert the data into the form you want.

Sink (receiver): where Flink sends the computed data; it defines the output destination of the results. The common sink types are: write to file, print, write to socket, and custom sink. Common custom sinks include Apache Kafka, RabbitMQ, MySQL, ElasticSearch, Apache Cassandra, HDFS, and so on.
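To make the Source, Transformation, Sink structure concrete, here is a minimal sketch in plain Python generators. It is not the Flink API: the function names `source`, `flat_map`, `keyed_count`, and `sink` are hypothetical, and the keyed count only imitates what a KeyBy followed by a Reduce would produce on a stream.

```python
from collections import defaultdict


def source(lines):
    # Source: emit records one at a time from a local collection.
    for line in lines:
        yield line


def flat_map(stream):
    # Transformation: split each line into lowercase words.
    for line in stream:
        for word in line.lower().split():
            yield word


def keyed_count(stream):
    # Transformation: group by word and keep a running count per key,
    # emitting an updated (word, count) record for every input record.
    counts = defaultdict(int)
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def sink(stream, out):
    # Sink: write each result record to the output collection.
    for record in stream:
        out.append(record)


results = []
sink(keyed_count(flat_map(source(["to be", "or not to be"]))), results)
print(results)
# [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 2), ('be', 2)]
```

Because the stages are generators, each record flows through the whole chain before the next one is read, which is the record-at-a-time behaviour a stream implies.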

Task and SubTask

A Task is the collection of SubTasks that perform the same function within one stage, similar to a TaskSet in Spark.

A SubTask (subtask) is the smallest unit of task execution in Flink and is an instance of a Java class. The class has the properties and methods needed to carry out specific computing logic, such as a map operation. In a distributed scenario the same logic runs in multiple threads at once, and each of these threads is called a SubTask.
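The relationship between a Task and its SubTasks can be illustrated with a small thread-based sketch. It is a model in plain Python, assuming a hypothetical `map_subtask` that doubles each record and a round-robin split of the input; Flink's own partitioning and scheduling are more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def map_subtask(index, records):
    # One subtask: the same map logic applied to its own share of the data.
    return index, [r * 2 for r in records]


def run_task(records, parallelism):
    # Split the input round-robin into one partition per subtask.
    partitions = [records[i::parallelism] for i in range(parallelism)]
    # The task is the set of all subtasks; each subtask runs in its own thread.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(map_subtask, i, p) for i, p in enumerate(partitions)]
        return dict(f.result() for f in futures)


out = run_task([1, 2, 3, 4, 5, 6], parallelism=3)
print(out)  # {0: [2, 8], 1: [4, 10], 2: [6, 12]}
```

Here `run_task` plays the role of the Task, and the three calls to `map_subtask` are its SubTasks.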

Operator chain (operator chain)

Every operation in Flink is called an Operator. When a job is submitted, the client optimizes the Operators: those that can be merged are merged into a single Operator, and the merged Operator is called an Operator chain. An Operator chain is in effect an execution chain, and each execution chain runs in a separate thread on the TaskManager.
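The merging step can be sketched as follows. This is a simplified model, not Flink's actual chaining logic: it assumes two operators can be chained only when records are forwarded one-to-one between them and both have the same parallelism, and it ignores the other conditions Flink checks. The `Operator` class and `build_chains` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int
    # How this operator receives data from the previous one: "forward" keeps
    # records on the same subtask, anything else redistributes them.
    input_exchange: str = "forward"


def build_chains(operators):
    chains = []
    for op in operators:
        prev = chains[-1][-1] if chains else None
        chainable = (
            prev is not None
            and op.input_exchange == "forward"
            and op.parallelism == prev.parallelism
        )
        if chainable:
            chains[-1].append(op)  # merge into the current chain
        else:
            chains.append([op])    # start a new chain
    return [[op.name for op in chain] for chain in chains]


job = [
    Operator("source", 2),
    Operator("map", 2),
    Operator("keyBy/window", 2, input_exchange="hash"),
    Operator("sink", 1, input_exchange="rebalance"),
]
print(build_chains(job))  # [['source', 'map'], ['keyBy/window'], ['sink']]
```

In this example the source and map are merged and run in one thread per subtask, while the keyed operator and the sink each start a new chain because the data has to be redistributed before them.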

Data Transmission in Flink

While an application is running, its tasks continuously exchange data. To use network resources efficiently and improve throughput, Flink uses a buffering mechanism when transferring data between tasks.
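The idea behind buffering is to ship records in batches rather than one by one. The sketch below is a plain-Python illustration under simplifying assumptions: the buffer capacity is counted in records (Flink sizes its network buffers in bytes), and the explicit `flush()` call stands in for the timeout that would otherwise send a partially filled buffer. `BufferedChannel` is a hypothetical name.

```python
class BufferedChannel:
    def __init__(self, capacity, send):
        self.capacity = capacity  # records per buffer (an assumption of this model)
        self.send = send          # callback that ships one buffer downstream
        self.buffer = []

    def emit(self, record):
        # Collect records and ship the buffer only once it is full.
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Ship whatever has been collected so far, if anything.
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []


shipped = []
channel = BufferedChannel(capacity=3, send=shipped.append)
for record in range(7):
    channel.emit(record)
channel.flush()  # stands in for a timeout-triggered flush
print(shipped)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Seven records cost three transfers instead of seven. The trade-off is latency: a record may wait in a buffer until the buffer fills or is flushed.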

Task slot and slot sharing

A task slot is also written task-slot. Slot sharing means that several subtasks occupy the same slot.

Each TaskManager is a JVM process that can execute one or more subtasks in different threads. To control how many tasks a worker accepts, each worker is divided into task slots (a worker has at least one task slot).

Task slot

Each task slot represents a fixed-size subset of the TaskManager's resources. Generally speaking, the number of slots is set equal to the number of CPU cores; for example, with 6 cores, allocate 6 slots. Flink divides the memory of the process among the slots. Suppose a TaskManager has three slots; then each slot gets 1/3 of the memory (split equally).

Dividing the memory among slots brings the following benefits. The maximum number of tasks a TaskManager can execute concurrently is bounded: three in this example, because it cannot exceed the number of slots. Each slot has its own exclusive memory space, so several different jobs can run in one TaskManager without interfering with each other.
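The arithmetic behind the three-slot example is simple enough to write down. This is a small illustrative helper, not a Flink API; the function name `slot_layout` and the 3072 MB figure are assumptions chosen to make the numbers round.

```python
def slot_layout(total_memory_mb, num_slots):
    # A TaskManager always has at least one slot.
    if num_slots < 1:
        raise ValueError("a TaskManager needs at least one slot")
    return {
        # Memory is split equally among the slots.
        "memory_per_slot_mb": total_memory_mb / num_slots,
        # No more slots can be in use at once than exist.
        "max_slots_in_use": num_slots,
    }


layout = slot_layout(total_memory_mb=3072, num_slots=3)
print(layout)  # {'memory_per_slot_mb': 1024.0, 'max_slots_in_use': 3}
```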

Slot sharing

By default, Flink allows subtasks (for example map[1], map[2], keyBy[1], keyBy[2]) to share slots, even if they are subtasks of different tasks, as long as they come from the same job. The result is that one slot can hold an entire pipeline of the job.
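One practical consequence of slot sharing is the number of slots a job needs. With sharing, one slot can hold one subtask of every task, so the job needs as many slots as its highest parallelism; without sharing, every subtask would need a slot of its own. The helper below is a sketch of that calculation; `slots_needed` and the example job are hypothetical.

```python
def slots_needed(task_parallelism, slot_sharing=True):
    if slot_sharing:
        # One slot can hold a subtask of each task, so the widest task decides.
        return max(task_parallelism.values())
    # Without sharing, each subtask occupies its own slot.
    return sum(task_parallelism.values())


job = {"source/map": 6, "keyBy/window": 6, "sink": 1}
print(slots_needed(job))                      # 6
print(slots_needed(job, slot_sharing=False))  # 13
```

For this example job, slot sharing cuts the requirement from 13 slots to 6 and spreads the heavy and light subtasks more evenly across slots.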

Thank you for reading. That concludes "what is the architecture and running architecture of Flink". After studying this article you should have a deeper understanding of Flink's architecture and running architecture; the specifics still need to be verified in practice.

© 2024 shulou.com SLNews company. All rights reserved.