2025-01-18 Update From: SLTechnology News&Howtos
Today I will talk to you about how to analyze the Apache Flink framework. Many people may not know much about it, so the editor has summarized the following for you, in the hope that you can take something away from this article.
First: Flink history, basic architecture and distributed deployment
History
The Flink project grew out of "Stratosphere: Information Management on the Cloud", a research project started in 2010 by the Technical University of Berlin, Humboldt University of Berlin, and the Hasso Plattner Institute. Flink began as a fork of that project's distributed execution engine; it entered the Apache Foundation in 2014 and became a top-level Apache project by the end of that year. The annual Flink Forward conference is the largest event dedicated to Apache Flink.
Basic architecture
Flink is a native stream-processing system that provides high-level APIs. Like Spark, Flink also provides an API for batch processing, but the two systems start from opposite premises: Flink treats batch processing as a special case of stream processing. In Flink, all data is treated as a stream, which is a good abstraction because it is closer to the real world.
Basic architecture diagram of Flink
Flink's overall architecture is similar to Spark's: a master-slave (Master-Slave) design. In order of execution:
1: Start the cluster: launch the JobManager and multiple TaskManagers.
2: The Flink program submits its code; an optimizer / task-graph generator turns it into the actual Job to be executed and hands it to the Client.
3: The Client submits the task to the JobManager (essentially it sends a data stream containing the task information).
4: The JobManager distributes the tasks to the Workers that actually perform the computation, the TaskManagers.
5: The TaskManagers execute the computing tasks and periodically report heartbeats and statistics to the JobManager; TaskManagers exchange data with each other in the form of streams.
In the steps above, the machine performing step 2 need not belong to the Flink cluster; a job can be submitted from any machine, as long as it can reach the JobManager. Once the Job is submitted, the Client process can even exit without affecting the execution of the task in the distributed cluster.
Client:
When a user submits a Flink program, a Client is created first. The Client preprocesses the submitted Flink program and submits it to the Flink cluster for execution. To do so, the Client reads the JobManager address from the configuration of the submitted program, establishes a connection to the JobManager, and submits the Flink Job to it. The Client assembles the user's Flink program into a JobGraph and submits it in that form. A JobGraph is a Flink dataflow: a DAG made up of multiple JobVertex nodes. A JobGraph therefore contains the following information about a Flink program: the JobID, the Job name, configuration information, a set of JobVertex (the actual task operators), and so on.
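The JobGraph structure described above can be sketched as a plain DAG data structure. The class and method names below are illustrative only, not Flink's actual internals:

```python
class JobVertex:
    """One operator in the dataflow graph (e.g. a source, map, or sink)."""
    def __init__(self, name):
        self.name = name
        self.inputs = []   # upstream vertices this operator consumes from

class JobGraph:
    """A Flink dataflow: a DAG of JobVertex nodes plus job-level metadata."""
    def __init__(self, job_id, job_name, config=None):
        self.job_id = job_id
        self.job_name = job_name
        self.config = config or {}
        self.vertices = []

    def add_vertex(self, vertex):
        self.vertices.append(vertex)
        return vertex

    def connect(self, upstream, downstream):
        # A directed edge: downstream consumes the output of upstream.
        downstream.inputs.append(upstream)

# A three-operator dataflow: source -> map -> sink
graph = JobGraph("job-42", "word-count")
source = graph.add_vertex(JobVertex("source"))
mapper = graph.add_vertex(JobVertex("map"))
sink = graph.add_vertex(JobVertex("sink"))
graph.connect(source, mapper)
graph.connect(mapper, sink)
```

The DAG form is what lets the JobManager schedule operators independently while respecting their data dependencies.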
JobManager:
JobManager is the coordinator of the Flink system. It receives Flink Jobs and schedules the execution of the multiple Tasks that make up each Job. The JobManager also collects Job status information and manages the slave nodes, the TaskManagers, in the Flink cluster. Its main messages include:
RegisterTaskManager -- When a TaskManager starts up in the Flink cluster, it registers with the JobManager. If registration succeeds, the JobManager replies to the TaskManager with an AcknowledgeRegistration message.
SubmitJob -- A Flink program submits its Flink Job to the JobManager through the Client; the SubmitJob message describes the basic information of the Job in the form of a JobGraph.
CancelJob -- Requests cancellation of a Flink Job. The CancelJob message contains the Job's ID; on success the message CancellationSuccess is returned, on failure CancellationFailure.
UpdateTaskExecutionState -- A TaskManager asks the JobManager to update the state of an ExecutionVertex in the ExecutionGraph, i.e. it reports an operator's concrete execution status to the JobManager. If the update succeeds, true is returned.
Others include RequestNextInputSplit and JobStatusChanged.
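The request/reply pairs listed above can be illustrated with a toy dispatcher. The message and reply names follow the text; the state handling and the SubmitJob reply name are simplifications of this sketch, not Flink's real protocol:

```python
def handle_message(job_manager_state, message):
    """Illustrative dispatch of the JobManager messages described above.
    job_manager_state: a dict with 'task_managers' and 'jobs' sets."""
    kind = message["kind"]
    if kind == "RegisterTaskManager":
        job_manager_state["task_managers"].add(message["tm_id"])
        return "AcknowledgeRegistration"
    if kind == "SubmitJob":
        job_manager_state["jobs"].add(message["job_id"])
        return "JobSubmitSuccess"          # reply name is an assumption
    if kind == "CancelJob":
        if message["job_id"] in job_manager_state["jobs"]:
            job_manager_state["jobs"].remove(message["job_id"])
            return "CancellationSuccess"
        return "CancellationFailure"
    if kind == "UpdateTaskExecutionState":
        return True    # per the text, a successful state update returns true
    raise ValueError(f"unknown message: {kind}")

state = {"task_managers": set(), "jobs": set()}
```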
TaskManager:
TaskManager is also an Actor. It is the Worker actually responsible for performing computation: a set of Tasks of a Flink Job execute on it. At startup it configures its number of slots (Slot); each slot can run one Task, and each Task runs as a thread. The TaskManager receives the Tasks to be deployed from the JobManager; once deployment starts, it establishes a Netty connection to its upstream (the upstream processing nodes the task depends on), receives data, and processes it. Each TaskManager manages the resources on its node, such as memory, disk, and network, and reports its resource status to the JobManager at startup.
The TaskManager side can be divided into two phases:
Registration phase -- The TaskManager registers with the JobManager by sending a RegisterTaskManager message and waiting for the JobManager to return AcknowledgeRegistration; only then can the TaskManager finish its initialization.
Operational phase -- In this phase the TaskManager can receive and process Task-related messages such as SubmitTask, CancelTask, and FailTask. If the TaskManager loses its connection to the JobManager, it automatically falls back into the registration phase; only after registration completes can Task-related messages be processed again.
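The two phases can be modeled as a small state machine. This is a conceptual sketch, not Flink's implementation; the message strings and return values are illustrative:

```python
class TaskManagerSketch:
    """Illustrative two-phase life cycle: 'registration' -> 'operational'."""
    def __init__(self):
        self.phase = "registration"

    def on_message(self, message):
        if self.phase == "registration":
            if message == "AcknowledgeRegistration":
                self.phase = "operational"
                return "registered"
            return "ignored"    # Task messages are not handled before registration
        # operational phase
        if message == "ConnectionToJobManagerLost":
            self.phase = "registration"    # fall back and re-register
            return "re-registering"
        if message in ("SubmitTask", "CancelTask", "FailTask"):
            return f"handled {message}"
        return "ignored"

tm = TaskManagerSketch()
```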
Architecture at the YARN level
1: The Client uploads the Flink jars and configuration to HDFS, because the YARN client needs the Hadoop configuration to connect to the YARN ResourceManager and HDFS.
2: The Client requests a YARN container from the ResourceManager in which to launch the ApplicationMaster.
3: The ResourceManager (RM) allocates the first container and runs the ApplicationMaster (AM) in it.
4: The AM takes over responsibility for supervising and managing resources.
5: The JobManager and the AM run in the same container; once both have started successfully, the AM knows the JobManager's address (its own host).
6: The JobManager generates a new Flink configuration for the TaskManagers so that they can connect to it.
7: The AM container also serves Flink's web interface. All ports YARN assigns are temporary ports, which allows users to run multiple YARN sessions in parallel.
8: The AM launches the allocated containers, which become Flink's TaskManagers; these containers download the jar and the updated configuration from HDFS. Once the cluster is running, it can accept Jobs.
HA scheme of Flink cluster:
In Flink's basic architecture diagram we can see that this Master-Slave pattern has a single-point-of-failure problem: if the JobManager goes down, the whole cluster is lost. Flink provides three deployment modes: Local, Standalone, and YARN. Except for Local, a single-machine mode, the latter two are cluster modes. For Standalone and YARN, Flink provides an HA mechanism to avoid this single point of failure and allow the cluster to recover from failures.
YARN mode:
The YARN-level mechanism was introduced in the previous section; note that Flink's JobManager runs in the same process as YARN's ApplicationMaster (AM for short). YARN's ResourceManager monitors the AM and restarts it when it fails. After the restart, all JobManager metadata is recovered from HDFS; during the recovery period, however, running jobs cannot continue and new jobs cannot be submitted. JobManager metadata kept on ZooKeeper (Apache ZooKeeper™), such as information about running Jobs, is provided to the new JobManager. TaskManager failures are detected by Akka's DeathWatch mechanism on the JobManager; when a TaskManager fails, the JobManager requests a new container from YARN and recreates the TaskManager in it.
Standalone mode:
For clusters in Standalone mode, multiple JobManagers can be started, with ZooKeeper electing one as leader to act as the actual JobManager. In this mode one master JobManager (Leader JobManager) and several standby JobManagers (Standby JobManager) are configured, ensuring that when the master fails a standby can take over its responsibilities. The figure below shows the master/standby JobManager recovery process.
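The master/standby hand-over can be illustrated with a toy election. Real Flink delegates leader election to ZooKeeper; this sketch only shows the failover idea, and all names in it are made up:

```python
def elect_leader(job_managers, failed):
    """Pick the first live JobManager as leader; the rest stand by.
    A stand-in for the ZooKeeper-based election described above."""
    live = [jm for jm in job_managers if jm not in failed]
    if not live:
        raise RuntimeError("no JobManager available")
    return live[0], live[1:]

jms = ["jm-1", "jm-2", "jm-3"]
leader, standby = elect_leader(jms, failed=set())
# The leader fails; on the next election a standby takes over.
leader2, standby2 = elect_leader(jms, failed={"jm-1"})
```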
Second: the layered stack of Flink's streaming architecture
Deployment layer:
Local, cluster, and commercial cloud modes; not discussed in detail here.
Runtime layer:
The Runtime layer provides all the core implementations that support Flink's computation, such as distributed stream processing, the JobGraph-to-ExecutionGraph mapping, and scheduling, offering basic services to the API layer above.
API layer:
The API layer implements the stream-processing API for unbounded Streams and the batch-processing API for Batch: stream processing corresponds to the DataStream API, batch processing to the DataSet API. Simply put, both DataSet and DataStream are immutable collections that may contain duplicate data; the difference is that a DataSet is finite, while a DataStream can have an unlimited number of elements. From the program's point of view, the initial collection comes from the sources in the Flink program, for example the real-time online feed behind the Double 11 payment dashboard; APIs such as filter, map, and flatMap then transform it, deriving new collections from the initial one. Note that collections are immutable: you can only derive new ones, never modify the originals.
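The derive-don't-mutate principle can be illustrated in plain Python (this is not the Flink API, just the same idea applied to ordinary collections):

```python
# Deriving new collections from an immutable source, in the spirit of the
# DataStream/DataSet transformations above.
payments = (120, 35, 990, 60)        # the "source" collection is immutable

large = tuple(p for p in payments if p >= 100)    # filter: keep big payments
discounted = tuple(p - p // 10 for p in large)    # map: apply a 10% discount
# The original collection is untouched; each step derived a new one.
```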
Libraries layer:
In the Flink application-framework layer, mirroring the division of the API layer, the application-specific computing frameworks built on top also fall into two categories: stream-oriented and batch-oriented. Stream-oriented: CEP (complex event processing) and SQL-like operations (Table-based relational operations). Batch-oriented: FlinkML (machine learning library) and Gelly (graph processing).
Third: feature analysis
High throughput & low latency
Simply put, Flink's outstanding advantage over Spark Streaming and Storm in stream computing is high throughput with low latency, as shown in the following figure:
Support for Event Time and out-of-order events
Flink supports stream-processing windows with Event Time semantics. Before discussing how to solve message disorder, we need to define time and order. In stream processing there are two notions of time:
Event time: the time at which the event occurred, usually expressed as a timestamp that travels with the data. Time-stamped data streams include web server logs, monitoring-agent logs, mobile logs, and so on.
Processing time: the server-side time at which the event data is processed, typically the clock of the server running the streaming application.
In many stream-processing scenarios, there are various delays between the moment an event occurs and the moment it reaches the message queue for processing:
Various network delays
Queue blocking and backpressure effects caused by data flow consumers
Spikes in the data stream, i.e. data-rate fluctuations
Event producer (mobile device, sensor, etc.) offline
Any of the above can cause messages in the queue to be frequently out of order. The gap between when an event occurs and when it reaches the queue for processing varies over time; this is usually called event time skew and is expressed as "processing time - event time".
For most applications, analyzing data by the time events were created is more meaningful than analyzing it by processing time. Flink allows users to define windows on event time (event time) rather than processing time.
Flink tracks event time with an event-time clock implemented as watermarks. A watermark is a special event generated by Flink's sources based on event timestamps: a watermark at time T means that no more events with timestamps less than T will arrive. All of Flink's event-time operations rely on watermarks to track event time.
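A minimal sketch of such an event-time clock, assuming a fixed allowed lateness (an assumption of this sketch; Flink actually lets each source define its own watermark strategy):

```python
class WatermarkTracker:
    """Toy event-time clock: the watermark trails the largest timestamp
    seen so far by a fixed allowed lateness."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_timestamp = float("-inf")

    def on_event(self, timestamp):
        # Events may arrive out of order; only the maximum matters here.
        self.max_timestamp = max(self.max_timestamp, timestamp)

    def current_watermark(self):
        # Watermark T promises: no more events with timestamp < T.
        return self.max_timestamp - self.allowed_lateness

    def is_late(self, timestamp):
        # An event below the watermark broke that promise: it is late.
        return timestamp < self.current_watermark()

wm = WatermarkTracker(allowed_lateness=5)
for ts in [10, 12, 11, 20]:    # an out-of-order stream of event timestamps
    wm.on_event(ts)
```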
Exactly-once stateful computation and the fault-tolerance mechanism
A streaming program can maintain custom state during its computation.
Apache Flink provides a fault-tolerance mechanism that can restore the data flow to a consistent state. It guarantees that, in the event of a failure, each record affects the state exactly once (exactly-once), though this can also be relaxed to at least once (at-least-once). The mechanism works by continuously taking snapshots of the distributed data flow. For streaming applications with a small state footprint these snapshots are very lightweight and can be taken frequently with little impact on performance. The state of a streaming application is stored in a configurable location, such as the master node or HDFS.
In the event of a program failure (machine, network, software, and so on), Flink stops the distributed data flow. The system restarts all operators and resets them to the most recent successful checkpoint; the inputs are reset to the positions recorded in the corresponding state snapshot. This guarantees that no record processed in the restarted parallel data flow was already part of the previously checkpointed state.
For the fault-tolerance mechanism to work, the data source (such as a message queue or broker) must be able to replay the data stream. Apache Kafka has this capability, and Flink's Kafka connector takes advantage of it. (The group's TT system has the same capability.)
One of the core concepts of Flink's distributed snapshots is the barrier. As shown in the figure above, barriers are injected into the data stream and flow downstream with the data, as part of the stream. Barriers do not disturb normal records, and the stream remains strictly ordered. A barrier splits the stream in two: records before it belong to the current snapshot, records after it to the next. Each barrier carries a snapshot ID, and the records preceding it are assigned to that snapshot. Because barriers do not interfere with processing, they are very lightweight. Barriers for multiple different snapshots can be in flight simultaneously, meaning several snapshots may be created concurrently.
Barriers are injected at the data sources. When the barrier for snapshot N is injected, the system records the current position Sn; in Apache Kafka, for example, this is the offset of the last record in a partition. The position Sn is reported to a module called the Checkpoint Coordinator (Flink's JobManager).
The barriers then flow downstream. When an operator has received the barrier for snapshot N from all of its input streams, it emits a barrier for snapshot N into all of its output streams. When a sink operator (an end point of the streaming DAG) has received barrier N from all of its input streams, it acknowledges snapshot N to the Checkpoint Coordinator. Once all sinks have acknowledged, the snapshot is marked complete.
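The alignment rule, "act on barrier N only after it has arrived on every input", can be sketched as follows (a conceptual model, not Flink's implementation):

```python
class BarrierAligner:
    """Sketch of checkpoint-barrier alignment at an operator with several
    input channels: snapshot N completes (and the barrier is forwarded
    downstream) only after barrier N has arrived on every input."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.arrived = {}   # snapshot id -> set of inputs whose barrier arrived

    def on_barrier(self, snapshot_id, channel):
        chans = self.arrived.setdefault(snapshot_id, set())
        chans.add(channel)
        # True means: all inputs delivered barrier N, so the operator can
        # snapshot its state and emit the barrier into every output stream.
        return len(chans) == self.num_inputs

op = BarrierAligner(num_inputs=2)
```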
Highly flexible streaming windows (Window)
Flink supports time windows, count windows, session windows, and data-driven windows. Windows (Window) can be customized with flexible trigger conditions to support complex stream-computing patterns.
As one description puts it: "In streaming applications, data is continuous, so we cannot wait until all the data has arrived. Of course we could process each message as it comes, but sometimes we need aggregations, for example: how many users clicked on our page in the past minute? In that case we must define a window, collect the data of the last minute, and compute over the data in that window."
Windows can be time-driven (Time Window, e.g. every 30 seconds) or data-driven (Count Window, e.g. every 100 elements). A classic classification distinguishes tumbling windows (Tumbling Window), sliding windows (Sliding Window), and session windows (Session Window).
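Tumbling-window assignment is simple arithmetic: an event's window is found by rounding its timestamp down to a multiple of the window size. The helpers below are a sketch, not the Flink API:

```python
def tumbling_window(timestamp, size):
    """Assign an event timestamp to its tumbling window [start, end).
    Window boundaries are aligned to multiples of the window size."""
    start = timestamp - (timestamp % size)
    return (start, start + size)

def count_per_window(timestamps, size):
    """Toy aggregation: number of events per tumbling window."""
    counts = {}
    for ts in timestamps:
        w = tumbling_window(ts, size)
        counts[w] = counts.get(w, 0) + 1
    return counts
```

For example, with 30-second windows an event at second 35 falls into the window [30, 60).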
Continuous streaming model with backpressure (BackPressure)
A dataflow application runs as a set of uninterrupted (long-resident) operators.
Flink streaming has natural flow control at runtime: slow data sinks backpressure faster sources (sources).
Backpressure typically occurs when a short-term load spike causes the system to receive data far faster than it can process it. Many everyday situations can lead to backpressure: a garbage-collection pause can make incoming data pile up quickly, or a promotion or flash sale can cause a sharp surge in traffic. If backpressure is not handled correctly, it can lead to resource exhaustion or even system collapse.
Backpressure in Flink:
If you see a backpressure warning for a task (for example, High), it means the task is producing data faster than the downstream operators can consume it. Records in your job flow downstream, for example from source to sink, while backpressure propagates in the opposite direction, upstream.
Take a simple job with only two steps, from source to sink. If you see a warning on the source side, it means the sink consumes data more slowly than the source produces it: the sink is backpressuring the upstream source.
The wonderful thing is that backpressure, a thorny problem in Spark Streaming and Storm, is a non-issue in Flink. Simply put, Flink needs no special backpressure mechanism, because the rate at which the system receives data and the rate at which it processes it are naturally matched. The system only accepts data when the receiving Task has a free Buffer, and data only continues downstream when the downstream Task also has a free Buffer. There is therefore no way for the system to take in more data than it can process. This resembles Java's blocking queues: a slower receiver slows the sender down, because once a bounded queue is full, the sender blocks.
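The blocking-queue analogy from the paragraph above can be demonstrated directly: a bounded buffer makes a slow consumer throttle a fast producer, because put blocks once the queue is full.

```python
import queue
import threading
import time

# A bounded buffer between two "tasks": the producer can never run more
# than two items ahead of the consumer, because put() blocks when full.
buf = queue.Queue(maxsize=2)
produced, consumed = [], []

def producer():
    for i in range(6):
        buf.put(i)            # blocks here whenever the consumer falls behind
        produced.append(i)
    buf.put(None)             # end-of-stream marker

def consumer():
    while True:
        item = buf.get()
        if item is None:
            break
        time.sleep(0.01)      # simulate a slow downstream operator
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

Despite the rate mismatch, nothing is dropped and nothing accumulates beyond the buffer's capacity; the producer is simply slowed to the consumer's pace.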
After reading the above, do you have a better understanding of how to analyze the Apache Flink framework? Thank you for reading and for your support.