What Are the Entry-Level Knowledge Points of Flink?

2025-02-23 Update, from SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article explains the basic knowledge points of Flink. The explanation is kept simple so that it is easy to follow and learn from.

What is Flink?

The official Flink website, whose documentation language can be set to Chinese, introduces Flink as follows:

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

A lot can be unpacked from this one sentence.

This article gives a brief overview of some basic Flink concepts, and it can serve as an introduction when you start using Flink for real. Many people have moved away from Storm, so what advantages does Flink have over Storm? Let's take a look at Flink.

What is unbounded and bounded?

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

The official site does explain these terms, but the explanation is not easy for beginners to follow, so let me put it in plain terms.

If you are learning Flink, you have surely used a message queue. How is a message queue used? The Producer produces data and sends it to the Broker, the Consumer consumes it, and that's it.

When consuming, do we need to care when the Producer sent a message? No. Whenever a message arrives, I process it. Nothing wrong with that, right?

A message stream like this, consumed with no extra conditions, is unbounded by default.

Bounded is then easy to understand: add a condition to an unbounded stream and it becomes bounded. What kind of condition? Time, for example: if I only want to consume the data from August 8 to August 9, the stream is bounded.
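The difference can be sketched in a few lines of plain Python. This is only an illustration and does not use the Flink API: the event shape, the year 2024, and the names `unbounded_events` and `bounded_events` are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from itertools import count, islice


def unbounded_events():
    """Yield one event per hour forever, like a consumer that never stops."""
    start = datetime(2024, 8, 7)  # assumed start time, for illustration only
    for i in count():
        yield {"id": i, "ts": start + timedelta(hours=i)}


def bounded_events(events, start, end):
    """Cut the bounded slice [start, end) out of a time-ordered stream."""
    for event in events:
        if event["ts"] >= end:
            break  # the boundary is reached, so the stream ends here
        if event["ts"] >= start:
            yield event


# An unbounded stream can only be sampled, never exhausted.
first_three = list(islice(unbounded_events(), 3))

# Adding a time condition turns it into a bounded stream that does finish.
august_8 = list(
    bounded_events(unbounded_events(), datetime(2024, 8, 8), datetime(2024, 8, 9))
)
```

Calling `list()` on the unbounded generator directly would never return, which is exactly the point: only the version with a time condition has an end.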

What is statefulness?

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

What is stateful and what is stateless?

Stateless: put simply, each execution does not depend on the result of the previous execution or of the previous N executions. Every execution is independent.

Stateful: put simply, an execution depends on the result of the previous execution or of the previous N executions. Processing one event depends on the results of processing earlier events.

Put simply, Flink itself provides a "storage" facility, and every execution can rely on that "storage", which is why it is stateful.
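A minimal sketch of the two styles in plain Python, not the Flink API. The names `double` and `RunningSum` are hypothetical, and the `total` attribute is only a stand-in for the "storage" that Flink provides.

```python
def double(value):
    """Stateless: the result depends only on the current input."""
    return value * 2


class RunningSum:
    """Stateful: each result depends on the events processed before it."""

    def __init__(self):
        self.total = 0  # stand-in for the state Flink would keep for us

    def process(self, value):
        self.total += value
        return self.total


events = [1, 1, 2, 3, 5]

stateless_out = [double(v) for v in events]

summer = RunningSum()
stateful_out = [summer.process(v) for v in events]
```

Reordering or replaying `events` changes nothing for `double` on a per-event basis, while every output of `RunningSum` depends on what came before it.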

Take the following example: the Source data stream contains the numbers 21, 13, 8, 5, 3, 2, 1, 1, and the job accumulates them starting from the right.

Once 2, 1 and 1 have been processed, the cumulative value is 4, and Flink stores this accumulated state of 4 (meaning the numbers 2, 1 and 1 are considered fully processed).

The program carries on and processes 5 and 3, so the cumulative value is now 12. But before Flink has had time to store 12 to the final storage medium, the system crashes.

After Flink restarts, it restores the system to the cumulative value of 4, so 5 and 3 have to be computed again, and the program continues from there.
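The walk-through can be replayed with a small simulation. This is a sketch in plain Python under simplifying assumptions, not Flink's checkpoint mechanism: `CheckpointedSum` is a hypothetical class, the "durable" snapshot is just an attribute, and the crash is simulated by calling `restore()`.

```python
class CheckpointedSum:
    """Running sum with a manually triggered snapshot and restore."""

    def __init__(self):
        self.total = 0          # in-memory state
        self.position = 0       # number of input records reflected in total
        self.snapshot = (0, 0)  # last persisted (total, position)

    def process(self, value):
        self.total += value
        self.position += 1

    def checkpoint(self):
        # Persist the state together with the input position it covers.
        self.snapshot = (self.total, self.position)

    def restore(self):
        # Roll back to the last persisted state after a failure.
        self.total, self.position = self.snapshot


# The numbers from the example, listed in processing order.
source = [1, 1, 2, 3, 5, 8, 13, 21]

job = CheckpointedSum()
for v in source[:3]:
    job.process(v)
job.checkpoint()                 # state 4 is now persisted

for v in source[3:5]:
    job.process(v)
total_before_crash = job.total   # 12, but never persisted

job.restore()                    # simulated crash and restart
total_after_restore = job.total  # back to 4

for v in source[job.position:]:  # 3 and 5 are processed a second time
    job.process(v)
final_total = job.total
```

Storing the input position alongside the total is what lets the restarted job know where to resume reading, so 3 and 5 are recomputed but counted only once in the final result.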

At this point some readers may object: doesn't exactly-once mean that a given piece of code is executed exactly once, never more and never less? Didn't you just compute the numbers 5 and 3 twice? How is that exactly once?

Clearly, it is impossible to guarantee that code executes only once. We cannot control which line of code is running when the system crashes. If the system goes down before the current method has finished, the method has to be executed again.

Therefore, exactly once in Flink means that the state is persisted exactly once to the final storage medium (local database / HDFS). The data may be computed more than once, which is unavoidable, but each result is stored in the state on the storage medium only once.

So how often does Flink store the state? We configure the interval ourselves.

Just as we do not commit an offset until the business logic has finished, a checkpoint works the same way: the actual checkpoint is not performed until the pulled data has gone through every processing step.

That raises another question: how does the checkpoint know that the pulled data has been fully processed? Flink inserts barriers into the stream, and every processing step reports when the barrier reaches it. Once the sink has reported the barrier, this checkpoint is complete.
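The barrier idea can be illustrated with a single-threaded toy pipeline in plain Python. It is a heavily simplified sketch, not how Flink is implemented: it ignores parallelism and barrier alignment, and `BARRIER`, `run_pipeline`, and the step names are all hypothetical.

```python
BARRIER = object()  # marker injected into the stream between records


def run_pipeline(records, steps):
    """Push records through the steps in order.

    Returns (sink_output, reports, covered), where reports lists the steps
    in the order they saw the barrier, and covered is the sink output that
    the completed checkpoint accounts for.
    """
    sink_output = []
    reports = []
    covered = None
    for record in records:
        if record is BARRIER:
            for name, _ in steps:
                reports.append(name)     # each step reports the barrier
            reports.append("sink")       # the sink reports last
            covered = list(sink_output)  # checkpoint is complete here
        else:
            value = record
            for _, fn in steps:
                value = fn(value)
            sink_output.append(value)
    return sink_output, reports, covered


steps = [("source", lambda v: v), ("double", lambda v: v * 2)]
records = [1, 1, 2, BARRIER, 3, 5]

sink_output, reports, covered = run_pipeline(records, steps)
```

Because the barrier travels in order with the data, everything before it has reached the sink by the time the sink reports, and nothing after it is included in that checkpoint.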

Thank you for reading. That covers the basic knowledge points of Flink. After working through this article you should have a better understanding of these entry-level concepts, though how to apply them still needs to be verified in practice.


© 2024 shulou.com SLNews company. All rights reserved.
