2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to set up a Flink source code reading environment and debug the Flink-Clients module. The content is fairly detailed; interested readers may find it a useful reference.
Outline
First, Flink's official documentation is so comprehensive, so why read the source code?
Reading the documentation and reading the source code serve different purposes. Take Apache Flink as an example: if you want to know Flink's features, design ideas, and implementation principles, reading the official documentation is enough; if you want to know the specific details, such as how a StreamGraph is generated or how exactly-once is implemented, then you need to read the source code.
The key is your purpose. If you want to understand the ideas and design experience, read the documentation, because documentation is written by people for people. If you want to know the details, read the source code, because the source code is what people write for the machine, and that is where the details live.
So what is the purpose of this article? If you have fought many battles in production and mastered the principles of Flink, then reading the source code is the best way to advance further, so I have prepared this environment-setup tutorial to save you valuable time for your family and children.
Second, Flink's source code runs to millions of lines; where do you start?
There is usually a methodology for reading source code.
1. First, make sure the prerequisites are in place.
Relevant language and technical fundamentals, such as Java, Maven, Git, and design patterns. Trying to read the Flink source code on a whim without these is unrealistic.
The function of the open source project. You need to know what problems the project is designed to solve, what features it offers, how to start it, and what configuration options it has. Get the project running first; you should at least be able to run a simple demo.
Related documentation. That is, what modules does this large project contain, and what is the approximate function of each module?
Once these prerequisites are in place, you will have an intuitive feel for the project, and reading its code becomes much easier.
When you start reading the code, do not go through the source line by line from the first module; that way it is easy to get lost in the details.
2. Second, pay attention to these key points.
Interface abstractions. Any project defines many interfaces; their inheritance relationships and methods describe the data structures, business entities, and relationships with other modules. Clarifying these relationships is very important.
Module glue layers. Many design patterns in the code exist to decouple modules. The advantage is flexible extension; the disadvantage is that code which used to be flat and straightforward is split across modules and is no longer as easy to read.
Business flow. At the beginning, do not dive into the details: it will discourage you, and you will not see the big picture anyway. Stand at a certain height, figure out what the overall flow looks like and how data is passed along. It is best to draw a flow chart or sequence diagram to aid understanding and memory.
Concrete implementation. Within the implementation, several points still need to be distinguished.
(1) Code logic. The code contains business logic, the real processing logic, and control logic, such as flow control.
(2) Error handling. Much of the code deals with error cases; it can often be skipped to eliminate distractions.
(3) Data handling. Attribute conversion, JSON parsing, XML parsing: this code is tedious and boring and can usually be skipped.
(4) Important algorithms. These are not only the core, but also the most technically interesting parts.
(5) Low-level interaction. Some code interacts with the operating system or the JVM, which requires some knowledge of those layers.
Runtime debugging. This is the most direct way to see how the code runs and what the data looks like, and the most important way to understand it.
To sum up in one sentence: take a bird's-eye view, grasp the outline, and keep your bearings.
All right, with these methods in hand, let's get hands-on!
Third, install the Git environment
I will not demonstrate each step in detail, just outline the process; there are plenty of related articles you can search for.
1. Download Git
Download the Git client for your platform (Windows, Mac) and install it.
Download address: https://git-scm.com/downloads
2. Initial configuration
$ git config --global user.name "Your Name"
$ git config --global user.email yourEmail@example.com
3. Generate an SSH key pair and upload the public key to Gitee
ssh-keygen -t rsa
Log in to Gitee and add the public key under profile picture > Settings > Security Settings > SSH public keys.
Fourth, how to work around GitHub's slow download speed
GitHub is very slow; how do you download tens of megabytes of source files?
For any GitHub project, you can import the GitHub project into Gitee:
After the import completes, you can clone it from Gitee. Of course, a project as active as Apache Flink is already mirrored on Gitee, so you can search for it directly.
https://gitee.com/apache/flink?_from=gitee_search
Then open Git Bash and clone the project:
git clone git@gitee.com:apache/flink.git
Fetch all tags:
git fetch --tags
Switch to the 1.12.0 release:
git checkout release-1.12.0
Now the source code of the 1.12.0 release is available locally.
Fifth, configure the Aliyun Maven mirror
Before importing the project into IDEA, configure Maven to use the Aliyun mirror so that jar packages download faster.
In the settings.xml file in the conf directory of your Maven installation, add the following configuration inside the mirrors tag:
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
Sixth, import into IDEA
Open the project in IDEA and wait for it to download all the dependencies.
After the import you will see many modules, but the function of each is quite clear, so I will not introduce them one by one here. Let's debug the Flink-Clients module directly.
Seventh, start debugging Flink-Clients
First, why debug this module? Because it is the entry module for submitting Flink jobs, and its code flow is relatively clear. After debugging it, you will know how Flink jobs are submitted.
1. Which class should we debug?
Recall: what is the Hello World of big data? WordCount, of course, and the Flink distribution ships with a WordCount example program.
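To make it concrete, here is what the WordCount example computes, as a minimal plain-Java sketch. This is only an illustration of the logic; the real Flink example operates on a DataStream rather than an in-memory list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Count occurrences of each word across the given lines.
    public static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or not to be")));
    }
}
```

The Flink version does the same grouping and counting, only distributed and incrementally over a stream.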
I downloaded the Flink 1.12 distribution from the official website and put it on my virtual machine.
How do you get it running?
First, start a local Flink cluster. After unpacking the archive, no configuration is needed; just start it:
cd /my2/flink/bin
./start-cluster.sh
Then submit the WordCount program to the cluster:
./flink run ../examples/streaming/WordCount.jar
That submits the WordCount program straight to the cluster. How does the command do it? Let's look at what is inside the flink script:
Vi flink
Scroll to the end and you will find:
# Add HADOOP_CLASSPATH to allow the usage of Hadoop file systems
exec $JAVA_RUN $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"
So the script simply starts a JVM with java -classpath and a main class.
That class is
org.apache.flink.client.cli.CliFrontend
This is the class we are going to run.
2. Start debugging
CliFrontend has a main method. Without further ado, just debug it; if it reports an error, we will deal with it then.
Sure enough, it errors out as follows:
It says the FLINK_CONF_DIR environment variable was not set, that is, the Flink configuration file, flink-conf.yaml, could not be found.
This file actually lives under the flink-dist module:
So configure it. Add this environment variable to the run configuration:
FLINK_CONF_DIR=D:\Code\flink\flink\flink-dist\src\main\resources
Run it again, and the next error appears:
When we ran the shell command, it was followed by a list of arguments; now we are not passing any arguments to the main method, so of course it fails.
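The failure can be reproduced in miniature: the client entry point refuses to run without an action argument. Below is a simplified, hypothetical sketch of that kind of check (illustrative only, not Flink's actual code).

```java
// Hypothetical simplified sketch of an entry point that, like CliFrontend,
// requires an action ("run", "list", ...) as the first program argument.
public class ArgsCheck {
    // Return the requested action, or fail if none was given.
    public static String firstAction(String[] args) {
        if (args == null || args.length < 1) {
            throw new IllegalArgumentException(
                    "Please specify an action (e.g. run <jar>).");
        }
        return args[0];
    }

    public static void main(String[] args) {
        System.out.println("action = " + firstAction(args));
    }
}
```

This is why the IDE run configuration must supply the same arguments the shell script would have passed.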
We also need a WordCount.jar. Since we have the source, we can simply build it from the source.
Package the flink-examples-streaming module directly.
After packaging, a WordCount.jar appears in the target directory.
Fill it in here as the program arguments:
run D:\Code\flink\flink\flink-examples\flink-examples-streaming\target\WordCount.jar
Then debug again. You will find it hangs for a long time until it times out (the WARNING can be ignored).
This is normal: after the JobGraph is finally generated, it is submitted to the cluster through the JobClient. Remember our configuration file? It contains the cluster's JobManager address and port, but we did not start a cluster locally on Windows.
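The configuration file mentioned here is a flat file of `key: value` lines. As a minimal sketch of how a client could look up the JobManager address from such a file, here is a hypothetical helper (not Flink's actual configuration loader):

```java
// Hypothetical sketch: parse flink-conf.yaml-style "key: value" lines,
// e.g. to look up jobmanager.rpc.address and jobmanager.rpc.port.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfReader {
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> conf = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            int sep = trimmed.indexOf(':');
            if (sep > 0) {
                conf.put(trimmed.substring(0, sep).trim(),
                         trimmed.substring(sep + 1).trim());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = parse(List.of(
                "# sample flink-conf",
                "jobmanager.rpc.address: localhost",
                "jobmanager.rpc.port: 6123"));
        System.out.println(conf.get("jobmanager.rpc.address") + ":"
                + conf.get("jobmanager.rpc.port"));
    }
}
```

If no cluster is listening at that address and port, the submission hangs and eventually times out, which is exactly the behavior seen above.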
That covers setting up a Flink source code reading environment and debugging the Flink-Clients module. I hope the above content helps you learn more. If you found the article useful, feel free to share it with others.