
Getting started tutorial | Build your first Flink application from scratch in 5 minutes

2025-04-13 Update From: SLTechnology News&Howtos

This article is reproduced from Jark's Blog. The author, Wu Jie (Yunxie), is an Apache Flink Committer and a senior development engineer at Alibaba.

This article describes how to quickly build your first Flink application, covering development environment preparation, creating a Flink project, writing a Flink program, and running the program.

In this article, we will show you how to build your first Flink application from scratch.

Development environment preparation

Flink can run on Linux, Mac OS X, or Windows. To develop Flink applications, you need a Java 8.x and Maven environment on your local machine.

If you have a Java 8 environment, running the following command outputs the following version information:

```
$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
```

If you have a Maven environment, running the following command will output version information like this:

```
$ mvn -version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /Users/wuchong/dev/maven
Java version: 1.8.0_65, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac"
```

In addition, we recommend using IntelliJ IDEA (the free Community Edition is enough) as the development IDE for Flink applications. Eclipse also works, but Eclipse has known problems with mixed Scala and Java projects, so Eclipse is not recommended. In the next section, we will show how to create a Flink project and import it into IntelliJ IDEA.

Create a Maven project

We will use Flink Maven Archetype to create our project structure and some initial default dependencies. In your working directory, run the following command to create the project:

```shell
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.6.1 \
    -DgroupId=my-flink-project \
    -DartifactId=my-flink-project \
    -Dversion=0.1 \
    -Dpackage=myflink \
    -DinteractiveMode=false
```

You can edit the groupId, artifactId, and package above to use whatever names you like. With the parameters above, Maven will automatically create a project structure for you as follows:

```
$ tree my-flink-project
my-flink-project
├── pom.xml
└── src
    └── main
        ├── java
        │   └── myflink
        │       ├── BatchJob.java
        │       └── StreamingJob.java
        └── resources
            └── log4j.properties
```

Our pom.xml file already contains the required Flink dependencies, and there are several sample program frameworks under src/main/java. Next we will begin to write the first Flink program.
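For reference, the dependency section of the generated pom.xml looks roughly like the following. This is a sketch assuming Flink 1.6.1 with the default Scala 2.11 build; the file actually generated by the archetype is authoritative:

```xml
<!-- Sketch of the Flink dependencies the quickstart archetype generates
     (assuming Flink 1.6.1 / Scala 2.11; check your generated pom.xml). -->
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.6.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.6.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```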

Write Flink programs

Start IntelliJ IDEA, select "Import Project", and choose the pom.xml under the root directory of my-flink-project. Follow the guide to complete the project import.

Create a SocketWindowWordCount.java file under src/main/java/myflink:

```java
package myflink;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

    }
}
```

This program is still very basic; we will fill in the code step by step. Note that we will not write out the import statements below, because the IDE will add them automatically. The complete code, including imports, is shown at the end of this section. If you want to skip the intermediate steps, you can paste the final complete code directly into your editor.

The first step in a Flink program is to create a StreamExecutionEnvironment. This is an entry class that can be used to set parameters, create data sources, and submit tasks. Let's add it to the main function:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
```

Next we create a data source that reads from a socket on local port 9000:

```java
DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");
```

This creates a DataStream of type String. DataStream is the core API for stream processing in Flink, and it defines many common operations (such as filtering, transformation, aggregation, windowing, association, etc.). In this example, we are interested in the number of times each word appears in a particular time window, such as a 5-second window. To do this, we first parse the string data into words and counts (represented by a Tuple2<String, Integer>). The first field is the word and the second field is the count, with the initial count set to 1. We implement a flatMap because a single line of data may contain multiple words.

```java
DataStream<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            for (String word : value.split("\\s")) {
                out.collect(Tuple2.of(word, 1));
            }
        }
    });
```
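As a side note, the splitting-and-pairing logic inside flatMap can be illustrated in plain Java, with no Flink dependency. The class and method names below are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the tokenization step inside the flatMap above:
// split a line on whitespace and pair each word with an initial count of 1.
// (The pair is rendered as "word=1" here to keep the sketch dependency-free.)
public class TokenizeSketch {

    static List<String> tokenize(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : line.split("\\s")) {
            pairs.add(word + "=1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("to be or not to be"));
    }
}
```

Note that, as in the Flink snippet, split("\\s") splits on every single whitespace character, so consecutive spaces would produce empty tokens.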

Then we group the data stream by the word field (that is, index field 0). Here we can simply use the keyBy(int index) method to get a stream of Tuple2<String, Integer> keyed by the word. We can then specify the desired window on the stream and compute the result based on the data in the window. In our example, we want to aggregate the word counts every 5 seconds, with each window counting from scratch.

```java
DataStream<Tuple2<String, Integer>> windowCounts = wordCounts
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .sum(1);
```

The second call, .timeWindow(), specifies a 5-second tumbling window. The third call, .sum(1), specifies the sum aggregation function for each key and each window, which in our example adds up the count field (that is, index field 1). The resulting data stream will output the count of each word every 5 seconds.
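What sum(1) computes per key within a single window can likewise be sketched in plain Java (class and method names are made up for illustration; this is not Flink API):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of what keyBy(0).timeWindow(...).sum(1) produces for the
// words that arrive within one 5-second window: an independent count per word,
// starting from scratch in every window.
public class WindowSumSketch {

    static Map<String, Integer> countWindow(String[] wordsInWindow) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : wordsInWindow) {
            // The "sum(1)" step: accumulate the count field (initially 1) per key.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWindow(new String[]{"to", "be", "or", "not", "to", "be"}));
    }
}
```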

The last thing to do is print the data stream to the console and start execution:

```java
windowCounts.print().setParallelism(1);

env.execute("Socket Window WordCount");
```

The final env.execute call is required to start the actual Flink job. All operator calls (such as creating sources, aggregating, printing) only build an internal graph of operators. Only when execute() is called is this graph submitted to a cluster or executed on the local machine.

Here is the complete code, slightly simplified (the code is also available on GitHub):

```java
package myflink;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {

        // Create the execution environment.
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Obtain the input data by connecting to the socket on local port 9000.
        // If port 9000 is already occupied, please change to another port.
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        // Parse the data, group by word, window, and aggregate.
        DataStream<Tuple2<String, Integer>> windowCounts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
                    for (String word : value.split("\\s")) {
                        out.collect(Tuple2.of(word, 1));
                    }
                }
            })
            .keyBy(0)
            .timeWindow(Time.seconds(5))
            .sum(1);

        // Print the results to the console; note that single-threaded printing
        // is used here instead of multi-threaded printing.
        windowCounts.print().setParallelism(1);

        env.execute("Socket Window WordCount");
    }
}
```

Run the program

To run the sample program, first start netcat in a terminal to provide the input stream:

```shell
nc -lk 9000
```

On Windows, you can install ncat from https://nmap.org/ncat/ and run:

```shell
ncat -lk 9000
```

Then run the main method of SocketWindowWordCount directly.

Now simply type words in the netcat console, and you will see the per-word frequency statistics in the output console of SocketWindowWordCount. If you want to see counts greater than 1, type the same word repeatedly within 5 seconds.

Cheers!
