Flink: architecture, operation, and scheduling principles

2025-01-18 Update From: SLTechnology News&Howtos

I. Overview of Flink

1.1 Stream processing semantics

At most once: each record is processed at most once. The implication is that data may be lost (i.e. never processed at all).

At least once: each record is processed at least once. This is stronger than at-most-once in that it guarantees no data is lost; the only drawback is that a record may be processed more than once.

Exactly once: each record is processed exactly once. No data is lost and no record is processed twice. This is the strictest of the three semantics.
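
These guarantees can be made concrete with a toy producer/consumer sketch in Python (illustrative only, not Flink code; the function names are invented for this example). A "crash" happens after a record is delivered but before its acknowledgement is recorded:

```python
# Toy illustration of delivery semantics. A "crash" happens after a record
# is delivered but before the acknowledgement is recorded.

def at_least_once(records, crash_after):
    """Redeliver unacknowledged records: nothing is lost, duplicates possible."""
    delivered, acked = [], set()
    # First attempt: crash before acking record `crash_after`.
    for i, r in enumerate(records):
        delivered.append(r)
        if i == crash_after:
            break            # crash: record delivered but not acked
        acked.add(i)
    # Retry: resend everything that was never acknowledged.
    for i, r in enumerate(records):
        if i not in acked:
            delivered.append(r)
            acked.add(i)
    return delivered

def exactly_once(records, crash_after):
    """Deduplicate on replay: each record contributes exactly once."""
    seen, delivered = set(), []
    for attempt in range(2):                 # original run + one replay
        for i, r in enumerate(records):
            if i in seen:
                continue                     # duplicate suppressed
            seen.add(i)
            delivered.append(r)
            if attempt == 0 and i == crash_after:
                break                        # crash on the first attempt
    return delivered

print(at_least_once(["a", "b", "c"], crash_after=1))  # ['a', 'b', 'b', 'c'] -- 'b' duplicated
print(exactly_once(["a", "b", "c"], crash_after=1))   # ['a', 'b', 'c'] -- each record once
```

Exactly-once is the hard case: the deduplication state (`seen` here) must itself survive the crash, which is precisely what checkpointed state provides in a real system.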

1.2 What is Flink

The Flink home page states the project's positioning at the top: "Apache Flink is an open source stream processing framework for distributed, high-performance, always-available, and accurate streaming applications." Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments, performing computations at in-memory speed and at any scale.

1.3 Basic framework of Flink

Batch workloads are bounded, persistent, and large. Batch processing suits computations that need access to a full set of records, and is generally used for offline statistics. Stream processing, by contrast, is unbounded and real-time: instead of operating on an entire data set, it operates on each item as it moves through the system, and is generally used for real-time statistics.

In the Spark ecosystem, different frameworks handle batch and stream processing: batch processing is implemented by Spark SQL and stream processing by Spark Streaming. This is the strategy adopted by most frameworks, which use separate processors for batch and stream workloads, whereas Flink implements both with a single engine.

How does Flink implement both batch and stream processing? The answer is that Flink treats batch processing (that is, processing of finite, static data) as a special case of stream processing.
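
The idea that batch is a special case of streaming can be sketched in a few lines of Python (a toy illustration, not the Flink engine): the same processing loop serves both modes, and a bounded source simply runs out of elements.

```python
# Sketch of "batch is a special case of streaming": the same processing loop
# consumes any iterator; a bounded source simply runs out of elements.

def word_count(source):
    """One engine for both modes: works on any iterable of lines."""
    counts = {}
    for line in source:               # a bounded list ends; an unbounded
        for word in line.split():     # generator would keep yielding
            counts[word] = counts.get(word, 0) + 1
    return counts

bounded = ["flink flink", "spark"]    # a finite "batch" input
print(word_count(bounded))            # {'flink': 2, 'spark': 1}
```

An unbounded source would be a generator that never returns; the loop body is unchanged.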

The core computing architecture of Flink is the Flink Runtime execution engine shown in the figure below: a distributed system that accepts dataflow programs and executes them in a fault-tolerant manner on one or more machines.

The Flink Runtime execution engine can run on a cluster as a YARN (Yet Another Resource Negotiator) application, on a Mesos cluster, or on a standalone machine (which is useful for debugging Flink applications).

Figure 1.1 Flink basic architecture

The figure above shows the core components of the Flink technology stack. It is worth noting that Flink provides a stream-oriented interface (DataStream API) and a batch-oriented interface (DataSet API), so it can handle both stream and batch processing. Flink's extension libraries include machine learning (FlinkML), complex event processing (CEP), and graph computation (Gelly), as well as a Table API for both stream and batch processing.

Programs accepted directly by the Flink Runtime execution engine are very powerful, but such programs are verbose and hard to write. For this reason, Flink provides APIs layered on top of the runtime that help users generate streaming programs easily: DataStream API for stream processing and DataSet API for batch processing. Notably, although the Flink runtime itself is stream-based, DataSet API was developed before DataStream API, because industrial demand for unbounded stream processing was small in Flink's early days.

DataStream API is used to analyze unbounded data streams and can be used from Java or Scala. Developers program against a data structure called DataStream, which represents a never-ending distributed stream of data.

Flink's distributed nature means it can run on hundreds of machines. It splits a large computation into many small parts, each machine executing one part. Flink automatically ensures that the computation continues in the event of machine failure or other errors, and it can also be re-run in a planned way after a bug fix or version upgrade. This frees developers from worrying about run-time failures. Flink essentially uses fault-tolerant data streams, which lets developers analyze data that is continuously generated and never ends (that is, stream processing).

1.4 Infinite data flow and finite data flow

Infinite data set: an endless, continuously growing collection of data.

Finite data set: a limited collection of data that does not change.

Common infinite data sets are:

Real-time interactive data between user and client

Log generated in real time by the application

Real-time transactions in financial markets

1.5 Comparison of Flink and Storm

State management: Storm is stateless, so users must manage window state on their own; Flink is stateful, and the framework manages state for the user.

Window support: Storm's support for event windows is weak; it caches all data of the entire window and computes the window only at its end. Flink ships with a number of window aggregation methods and manages window state automatically.

Message delivery semantics: Storm supports At Most Once and At Least Once; Flink supports At Most Once, At Least Once, and Exactly Once.

Fault tolerance: Storm uses an ACK mechanism, tracking every message across the full link and retransmitting on failure or timeout. Flink uses a checkpoint mechanism: a distributed consistent-snapshot protocol saves the data flow and operator state, so the system can roll back when an error occurs.

Application status: Storm is relatively mature in Meituan-Dianping's real-time computing business, with a management platform, commonly used APIs, corresponding documentation, and a large number of real-time jobs built on it. Flink has some applications in Meituan-Dianping's real-time computing business, but its management platform, APIs, and documentation still need further improvement.

1.6 Flink features

1. High throughput and low latency

2. Support Event Time and out-of-order events

Flink supports the window mechanism of stream processing and Event Time semantics.

Event time makes it easy to compute correct results over events that arrive out of order or are delayed.
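
How event time and watermarks handle out-of-order arrival can be sketched as follows (a simplified toy in Python, not Flink's implementation; the window size and watermark lag are made-up parameters):

```python
# Sketch of event-time tumbling windows with a watermark (illustrative only;
# not Flink's implementation). Events carry their own timestamps and may
# arrive out of order; the watermark says "no event older than this is coming".

WINDOW = 10  # window size in event-time units

def tumbling_event_time_windows(events, max_out_of_orderness=5):
    """events: (timestamp, value) pairs in *arrival* order."""
    windows, watermark = {}, float("-inf")
    results = []
    for ts, value in events:
        start = (ts // WINDOW) * WINDOW           # assign to its window
        windows.setdefault(start, []).append(value)
        watermark = max(watermark, ts - max_out_of_orderness)
        # Fire every window whose end is at or below the watermark.
        for s in sorted(windows):
            if s + WINDOW <= watermark:
                results.append((s, sum(windows.pop(s))))
    # End of (bounded) input: fire whatever is left.
    for s in sorted(windows):
        results.append((s, sum(windows.pop(s))))
    return results

# Timestamp 7 arrives after 12, but the watermark lag means window [0, 10)
# has not fired yet, so the late event is still counted correctly.
events = [(1, 1), (5, 1), (12, 1), (7, 1), (15, 1), (21, 1)]
print(tumbling_event_time_windows(events))   # [(0, 3), (10, 2), (20, 1)]
```

The watermark lags the highest timestamp seen, trading a little latency for tolerance of out-of-order arrival.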

3. Exactly-once semantics of state computing.

When a failure occurs, the computing task must be restarted, and already-processed data must not be processed again.

A streaming program can maintain custom state during computation.

Flink's checkpointing mechanism guarantees exactly-once semantics for that state even when failures occur.
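
A minimal sketch of checkpoint-based recovery (illustrative; Flink's real mechanism takes Chandy-Lamport-style distributed snapshots, with barriers flowing through the dataflow):

```python
# Sketch of checkpoint-based recovery: snapshot (state, input position)
# periodically; on failure, roll back to the last snapshot and resume.

import copy

def run_with_checkpoints(records, checkpoint_every, crash_at=None):
    """Sum a stream, snapshotting (state, position) periodically.
    On a crash, restore the last snapshot and resume from its position --
    each record is reflected in the state exactly once."""
    state, pos = 0, 0
    snapshot = (copy.deepcopy(state), pos)
    while pos < len(records):
        if crash_at is not None and pos == crash_at:
            state, pos = snapshot          # recover: roll back to snapshot
            crash_at = None                # (crash only once)
            continue
        state += records[pos]
        pos += 1
        if pos % checkpoint_every == 0:
            snapshot = (copy.deepcopy(state), pos)
    return state

data = [1, 2, 3, 4, 5]
print(run_with_checkpoints(data, checkpoint_every=2))              # 15
print(run_with_checkpoints(data, checkpoint_every=2, crash_at=3))  # still 15
```

Because the snapshot captures state and input position together, replay after the crash neither loses nor double-counts any record.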

4. Highly flexible streaming window

Flink supports time windows, count windows, session windows, and data-driven windows.

Windows can be customized with flexible trigger conditions to support complex stream-computing patterns.
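
A session window, for example, closes when the gap between consecutive events exceeds a threshold. A minimal sketch (not Flink code; `session_windows` is an invented helper):

```python
# Sketch of session windows: a session closes when the gap between
# consecutive event timestamps exceeds `gap` (illustrative, not Flink code).

def session_windows(timestamps, gap):
    """Group sorted event timestamps into sessions separated by > gap."""
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > gap:
            sessions.append(current)      # gap exceeded: close the session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

print(session_windows([1, 2, 3, 10, 11, 30], gap=5))
# [[1, 2, 3], [10, 11], [30]]
```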

5. Continuous streaming model with backpressure

Data-stream applications run as uninterrupted (resident) operators.

Flink streaming has natural flow control at run time: slow data sinks backpressure faster sources.
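
The backpressure idea can be sketched with a bounded buffer between two operators (a toy Python illustration, not Flink's credit-based network stack): the producer blocks when the buffer is full, so it can never outrun the consumer.

```python
# Sketch of backpressure with a bounded buffer: a fast producer blocks when
# the queue is full, so it can never outrun a slow consumer (illustrative).

import queue
import threading

buf = queue.Queue(maxsize=2)   # small bounded channel between operators
consumed = []

def producer():
    for i in range(6):
        buf.put(i)             # blocks while the buffer is full: backpressure
    buf.put(None)              # end-of-stream marker

def consumer():
    while True:
        item = buf.get()       # the slow operator drains at its own pace
        if item is None:
            break
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(consumed)                # [0, 1, 2, 3, 4, 5] -- nothing dropped
```

Flow control falls out of the bounded buffer itself: no records are dropped and no unbounded queue builds up.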

6. Fault tolerance

The fault tolerance mechanism of Flink is based on Chandy-Lamport distributed snapshots.

This mechanism is very lightweight, allowing the system to have high throughput while providing strong consistency.

7. One engine for both batch and stream processing

Flink shares a common engine for streaming and batch applications. Batch applications can run efficiently as a special streaming application.

8. Memory management

Flink implements its own memory management inside the JVM.

Applications can scale beyond main-memory limits while incurring less garbage-collection overhead.

9. Iteration and incremental iteration

Flink has special support for iterative computing (for example, in machine learning and graph computing).

Incremental iterations can exploit delta computations to converge faster.
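
The delta-iteration idea (each round touches only the elements that changed in the previous round, and the loop stops when the workset is empty) can be sketched with connected components; an illustrative toy, not Flink's iteration API:

```python
# Sketch of incremental (delta) iteration: each round only processes vertices
# whose label changed last round, converging when the workset is empty.

def connected_components(vertices, edges):
    """Propagate the minimum vertex id through each component."""
    label = {v: v for v in vertices}
    neighbors = {v: [] for v in vertices}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    workset = set(vertices)                 # vertices changed last round
    while workset:
        changed = set()
        for v in workset:                   # only touch changed vertices
            for n in neighbors[v]:
                if label[v] < label[n]:
                    label[n] = label[v]
                    changed.add(n)
        workset = changed
    return label

print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

As labels stabilize, the workset shrinks, so later rounds do far less work than a full recomputation would.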

10. Program tuning

The batch optimizer automatically optimizes programs, for example by avoiding expensive operations such as shuffles and sorts, and by caching some intermediate data.

1.7 flink application scenarios

Apache Flink is powerful and supports the development and running of many different kinds of applications. Its main features include unified batch and stream processing, precise state management, event-time support, and exactly-once state consistency guarantees. Flink can run on a variety of resource-management frameworks, including YARN, Mesos, and Kubernetes, and also supports independent deployment on bare-metal clusters. With the high-availability option enabled it has no single point of failure. In practice, Flink has been scaled to thousands of cores with state reaching the terabyte level while still maintaining high throughput and low latency, and many demanding stream-processing applications run on Flink around the world.

1.7.1 Event-driven applications

Anti-fraud

Anomaly detection

Rule-based alarm

Business process monitoring

Web application

1.7.2 Data analysis applications

Telecom network quality monitoring

Product update analysis and experiment evaluation in mobile applications

Large-scale graph analysis

1.7.3 Data pipeline applications

Real-time search index building in e-commerce

Continuous ETL in e-commerce

II. Flink basic architecture

2.1 Roles in Flink

The Flink runtime includes two types of processors:

JobManager processor: also known as the Master, it coordinates distributed execution: scheduling tasks, coordinating checkpoints, coordinating recovery on failure, and so on. The Flink runtime has at least one master processor; in high-availability mode there are several, one of which is the leader and the others standby.

TaskManager processor: also known as a Worker, it executes the tasks (or, more precisely, subtasks) of a dataflow, and buffers and exchanges data streams. The Flink runtime has at least one worker processor.

Figure 2.1 Flink JobManager and TaskManager

Master and Worker processors can be started directly on physical machines or through a resource-scheduling framework such as YARN. Workers connect to the Master, announce their availability, and receive task assignments.

2.2 Unbounded and bounded data streams

Unbounded data streams:

Unbounded data streams have a beginning but no end. They do not terminate, and they deliver data as it is generated. Unbounded streams must be processed continuously; that is, each event must be handled as soon as it is received, since we cannot wait for all the data to arrive: the input is unbounded and will never be complete at any point in time. Processing unbounded data usually requires consuming events in a specific order, such as the order in which they occurred, so that the completeness of the results can be reasoned about.

Bounded data flow:

Bounded data streams have a clearly defined start and end; a bounded stream can be processed by reading all of the data before performing any computation. Bounded streams need not be consumed in order, because a bounded data set can always be sorted. Bounded stream processing is also known as batch processing.

Apache Flink is an open source computing platform for distributed stream and batch data processing. It supports both stream- and batch-processing applications on the same runtime (Flink Runtime). Existing open source computing solutions treat stream and batch processing as two different application types, because their goals differ completely: stream processing generally needs low latency and exactly-once guarantees, while batch processing needs high throughput and efficient processing. Implementations therefore usually provide two separate code paths, or address each case with a separate open source framework: for example, MapReduce, Tez, Crunch, and Spark for batch processing, and Samza and Storm for stream processing.

Flink's approach to stream and batch processing is completely different from some traditional schemes: it looks at the two from another angle and unifies them. Flink fully supports stream processing, that is, the input data stream is unbounded when viewed as a stream; batch processing is treated as a special kind of stream processing whose input stream is defined as bounded. On top of the same runtime (Flink Runtime), streaming and batch APIs are provided respectively, and these two APIs are the basis for the stream-oriented and batch-oriented application frameworks built above them.

2.3 Flink programming interface abstractions

Flink provides different levels of abstraction to develop a stream or batch job, as shown in the following figure:

Figure 2.3 Flink programming interface abstractions

The lowest-level abstraction offers only stateful streams, embedded in DataStream API via process functions (Process Function). The process function is integrated with DataStream API to make low-level operations available, letting users freely process events from one or more data streams with consistent, fault-tolerant state. In addition, users can register event-time and processing-time callbacks so that programs can carry out complex computations.

In fact, most applications do not need the low-level abstraction described above, and instead program against the core APIs (Core APIs): DataStream API (for bounded or unbounded streams) and DataSet API (for bounded data sets). These APIs provide common building blocks for data processing, such as user-defined transformations, joins, aggregations, windows, and so on. DataSet API additionally supports bounded data sets with loops and iterations. The data types processed by these APIs are represented as classes in the respective programming languages.

Table API is declarative programming centered on tables, which may change dynamically (when they express streaming data). Table API follows the (extended) relational model: a table has a schema (similar to a table in a relational database), and the API offers comparable operations such as select, project, join, group-by, aggregate, and so on. A Table API program declaratively defines what logical operation should be performed rather than spelling out exactly what the operation's code looks like. Although Table API can be extended with many kinds of user-defined functions (UDFs), it is less expressive than the core APIs, but simpler to use (less code). In addition, Table API programs pass through a built-in optimizer before execution.

You can seamlessly switch between tables and DataStream/DataSet to allow programs to mix Table API with DataStream and DataSet.

The highest level of abstraction provided by ​ Flink is SQL. This layer of abstraction is similar to Table API in syntax and expressiveness, but represents the program in the form of SQL query expressions. The SQL abstraction interacts closely with Table API, while SQL queries can be executed directly on tables defined by Table API.

III. Flink runtime architecture

3.1 Submitting a task to YARN

In production, Flink generally uses YARN as the resource-scheduling platform; standalone mode is rarely used. So here we take YARN as an example to illustrate how Flink submits a task to YARN.

Figure 3.1 Flink submitting a task to YARN

After a Flink task is submitted:

1. The Client uploads Flink's Jar package and configuration to HDFS, then submits the task to the YARN ResourceManager.

2. The ResourceManager allocates Container resources and notifies the corresponding NodeManager to start the ApplicationMaster.

3. The ApplicationMaster loads Flink's Jar package and configuration, builds the environment, and starts the JobManager.

4. The ApplicationMaster applies to the ResourceManager for resources to start TaskManagers; the ResourceManager allocates Container resources, and the ApplicationMaster notifies the NodeManager on each allocated node to start a TaskManager.

5. Each NodeManager loads Flink's Jar package and configuration, builds the environment, and starts its TaskManager.

6. Once started, each TaskManager sends heartbeats to the JobManager and waits for the JobManager to assign it tasks.

3.2 Task scheduling component

Figure 3.2 Flink task scheduling

1. Program Code: the Flink application code we write.

2. Job Client: the Job Client is not an internal part of Flink program execution, but it is the starting point of task execution. It accepts the user's program code, creates the data flow, and submits the data flow to the JobManager for further execution. Once execution completes, the Job Client returns the results to the user.

3. JobManager: the main process (also known as the job manager) coordinates and manages the execution of the program. Its main responsibilities include scheduling tasks, managing checkpoints, and failure recovery. A cluster has at least one master, which is responsible for scheduling tasks, coordinating checkpoints, and disaster recovery. With a high-availability setup there can be multiple masters, of which one must be the leader and the others standby. The JobManager contains three important components: the Actor system (communication), the Scheduler (scheduling), and Checkpointing.

4. TaskManager: receives the tasks to be deployed from the JobManager. A TaskManager is a worker node that executes tasks in one or more threads inside a JVM. The parallelism of task execution is determined by the task slots available on each TaskManager. Each task slot represents a fixed share of the TaskManager's resources: for example, a TaskManager with four slots allocates 25% of its managed memory to each slot. One or more threads may run in a task slot; threads in the same slot share the same JVM, and tasks in the same JVM share TCP connections and heartbeat messages. A TaskManager slot represents an available thread with a fixed amount of memory. Note that slots isolate only memory, not CPU. By default, Flink allows subtasks to share slots, even subtasks of different tasks, as long as they come from the same job; this sharing yields better resource utilization.

3.3 TaskManager and slots principles

Each worker (TaskManager) is a JVM process that may execute one or more subtasks in separate threads. The number of tasks a worker can accept is controlled by its task slots (a worker has at least one task slot).

Each task slot represents a fixed-size subset of the TaskManager's resources. If a TaskManager has three slots, it divides its managed memory equally among them. Slotting the resources means a subtask does not need to compete with subtasks of other jobs for managed memory; instead it has a guaranteed memory reserve. Note that no CPU isolation is involved here; slots currently isolate only a task's managed memory.

By adjusting the number of task slots, users can define how subtasks are isolated from one another. One slot per TaskManager means each task group runs in a separate JVM (which may be launched inside a dedicated container); multiple slots per TaskManager means more subtasks share the same JVM. Tasks in the same JVM process share TCP connections (via multiplexing) and heartbeat messages, and may also share data sets and data structures, which reduces per-task overhead.

Figure 3.3 TaskManagers and slots

A task slot is a static concept referring to a TaskManager's capacity for concurrent execution, configured via the parameter taskmanager.numberOfTaskSlots. Parallelism is a dynamic concept: the concurrency actually used by a TaskManager when running a program, configured via the parameter parallelism.default.

For example, suppose there are 3 TaskManagers, each configured with 3 task slots; each TaskManager can then accept 3 tasks, for 9 task slots in total. If we set parallelism.default=1, the program runs with a default parallelism of 1, so only 1 of the 9 slots is used and 8 sit idle. Setting an appropriate parallelism therefore improves efficiency. In effect, the slot count limits how many tasks the TaskManagers can run in parallel across the whole cluster, while parallelism.default limits how many slots a single job uses by default; since multiple jobs may run at the same time, it is really a concurrency limit on a single job.
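
The arithmetic above can be checked in a couple of lines (the helper names are invented for illustration; they are not Flink APIs):

```python
# Quick arithmetic for the slots-vs-parallelism example above (a sketch;
# the helper names are invented for illustration, not Flink APIs).

def cluster_slots(task_managers, slots_per_tm):
    """Total task slots = TaskManagers x slots per TaskManager."""
    return task_managers * slots_per_tm

def idle_slots(total_slots, job_parallelism):
    """Slots left unused when one job runs at the given parallelism."""
    return total_slots - min(job_parallelism, total_slots)

total = cluster_slots(task_managers=3, slots_per_tm=3)
print(total)                       # 9
print(idle_slots(total, 1))        # 8 -- parallelism.default=1 leaves 8 idle
print(idle_slots(total, 9))        # 0 -- parallelism 9 saturates the cluster
```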

3.4 programs and data streams

The basic building blocks of a Flink program are streams and transformations (note that the DataSets used by Flink's DataSet API are also streams internally). A stream can be seen as an intermediate result, while a transformation is an operation that takes one or more streams as input and computes one or more result streams from them.

At run time, programs on Flink are mapped to streaming dataflows containing streams and transformation operators. Each dataflow begins with one or more sources and ends with one or more sinks. A dataflow resembles an arbitrary directed acyclic graph (DAG), although certain forms of cycles can be built through iteration. In most cases there is a one-to-one correspondence between the transformations in a program and the operators in the dataflow, but sometimes one transformation may correspond to multiple operators.

Figure 3.4 Program and data flow

3.5 parallel data streams (operator parallel)

Flink programs execute in a parallel and distributed fashion. During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks, which execute independently of one another in different threads, on different physical machines, or in different containers.

The number of subtasks of a particular operator is called its parallelism. The parallelism of a stream is always equal to that of the operator producing it. Different operators in the same program may have different degrees of parallelism.

Figure 3.5 Parallel data flow

Streams can transfer data between operators in one-to-one (forwarding) mode or in redistributing mode, depending on the type of operator.

One-to-one: the stream (for example, between a source and a map operator) preserves partitioning and element order. That means a map operator's subtask sees the same elements, in the same order, as produced by the source operator's subtask. Operators such as map, filter, and flatMap use one-to-one mode, which is only possible when the partitioning does not change.

Redistributing: the stream's partitioning changes (for example, between map() and keyBy/window, or between keyBy/window and sink). Each operator subtask sends data to different target subtasks according to the selected transformation. For example, keyBy() repartitions by hashCode, while broadcast and rebalance repartition randomly. All of these operators trigger a redistribute process, which is similar to the shuffle process in Spark.
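
The hash repartitioning behind keyBy() can be sketched as follows (illustrative; `key_by` is an invented helper, and the exact partition contents depend on Python's string hashing, so only the routing invariant is fixed):

```python
# Sketch of keyBy-style hash partitioning: every record with the same key is
# routed to the same downstream subtask (illustrative, not Flink's internals).

def key_by(records, key_fn, parallelism):
    """Assign each record to a partition by hashing its key."""
    partitions = [[] for _ in range(parallelism)]
    for r in records:
        p = hash(key_fn(r)) % parallelism   # same key -> same partition
        partitions[p].append(r)
    return partitions

words = ["a", "b", "a", "c", "b", "a"]
parts = key_by(words, key_fn=lambda w: w, parallelism=2)
# All copies of a given key land in one partition, so per-key state stays
# local to one subtask and needs no cross-partition coordination.
for p in parts:
    print(p)
```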

3.6 task and operator chains

For distributed execution, Flink chains subtasks of compatible operators together into tasks; each task executes in one thread. Chaining operators into tasks is a very effective optimization: it reduces thread-to-thread handover and buffer-based data exchange, cutting latency while improving throughput. Chaining behavior can be configured in the programming API. This task chain is built in much the same way as Spark divides stages and then builds tasks; if that is unfamiliar, see the earlier Spark article.

The following figure shows five subtasks executing in five parallel threads:

Figure 3.6 Flink operator chains

In the figure above, the keyBy operator causes repartitioning, so it serves as the boundary for dividing stages: the upstream source and map can be chained into one task, the downstream keyBy and window build their own chain, and the final sink operation stands alone. At the given parallelism this yields five tasks, which then execute in sequence. This mechanism is the same as Spark's stage division.
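
Chaining consecutive one-to-one operators into a single task can be sketched like this (a toy illustration of the idea, not Flink's planner; `chain` is an invented helper):

```python
# Sketch of operator chaining: consecutive one-to-one operators (source ->
# map -> filter) are fused into a single function that runs in one thread,
# avoiding a hand-off between operators.

def chain(*operators):
    """Fuse one-to-one operators into a single callable."""
    def chained(record):
        for op in operators:
            record = op(record)
            if record is None:        # a filter dropped the record
                return None
        return record
    return chained

parse = lambda line: int(line)                     # "map": parse the record
keep_even = lambda n: n if n % 2 == 0 else None    # "filter"
task = chain(parse, keep_even)                     # one task, one thread

out = [r for r in map(task, ["1", "2", "3", "4"]) if r is not None]
print(out)                # [2, 4]
```

Each record passes through the whole chain in one call, which is exactly why chaining removes per-operator thread switching and buffering.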
