2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to build a Flink cluster and how it runs, with examples written in Java.
I. Overview of Flink
1. Basic introduction
Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. It is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. Its main features include unified batch and stream processing, precise state management, event-time support, and exactly-once state consistency guarantees. Flink can run on a variety of resource management frameworks, including YARN, Mesos, and Kubernetes, and it also supports standalone deployment on bare-metal clusters. With the high-availability option enabled, it has no single point of failure.
Two concepts are explained here:
Boundary: an unbounded stream has a start but no defined end, while a bounded stream has both. The boundary can be understood as the policy or condition that decides which data is aggregated together.
State: whether there is a dependency between executions, that is, whether processing the next element depends on the result of the previous one.
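The difference between stateless and stateful processing can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical and are not Flink APIs, and the small list stands in for a bounded input. With an unbounded stream the loop would never end, which is why the stateful version emits an updated result for every element instead of one final total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Flink APIs.
class StateSketch {

    // Stateless: each output depends only on the current element.
    static List<String> toUpperCase(List<String> events) {
        List<String> out = new ArrayList<>();
        for (String event : events) {
            out.add(event.toUpperCase());
        }
        return out;
    }

    // Stateful: each output depends on the current element and on the
    // counts accumulated from all earlier elements.
    static List<String> runningCount(List<String> events) {
        Map<String, Integer> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String event : events) {
            int updated = state.merge(event, 1, Integer::sum);
            out.add(event + "=" + updated);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("java", "flink", "java");
        System.out.println(toUpperCase(events));  // [JAVA, FLINK, JAVA]
        System.out.println(runningCount(events)); // [java=1, flink=1, java=2]
    }
}
```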
2. Application scenarios
Data Driven
Event-driven applications do not need to query remote databases; local data access gives them higher throughput and lower latency. Taking anti-fraud as an example, the rule model is written with the DataStream API and the whole logic is then handed to the Flink engine. As events or data flow in, the corresponding rule model is evaluated, and once the conditions in a rule are met, the application processes the event quickly and notifies the business application.
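The anti-fraud idea can be sketched without Flink as a rule that keeps its own local state and fires when a condition is met. The rule itself (flag an account once its running total exceeds a limit), the class name, and the alert list standing in for the notification are all assumptions made for illustration; in a real job this logic would be expressed with the DataStream API and Flink-managed keyed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rule model; not part of Flink.
class FraudRuleSketch {
    private final long limit;
    // Local per-account state, so no remote database lookup is needed.
    private final Map<String, Long> totals = new HashMap<>();
    // Stand-in for notifying the business application.
    private final List<String> alerts = new ArrayList<>();

    FraudRuleSketch(long limit) {
        this.limit = limit;
    }

    // Called once for every incoming event.
    void onEvent(String account, long amount) {
        long total = totals.merge(account, amount, Long::sum);
        if (total > limit) {
            alerts.add(account);
        }
    }

    List<String> alerts() {
        return alerts;
    }

    public static void main(String[] args) {
        FraudRuleSketch rule = new FraudRuleSketch(100);
        rule.onEvent("acct-1", 60);
        rule.onEvent("acct-2", 50);
        rule.onEvent("acct-1", 50); // acct-1 total is now 110, above the limit
        System.out.println(rule.alerts()); // [acct-1]
    }
}
```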
Data Analytics
Compared with batch analytics, streaming analytics skips the periodic data import and query cycle, so the latency from event to metric is lower. In addition, batch queries must deal with the artificial data boundaries caused by periodic imports and bounded input, whereas streaming queries do not. Flink supports both continuous streaming analytics and batch analytics well. Typical scenarios include real-time processing of analytics data, real-time dashboards, and real-time reports.
Data Pipeline
Compared with periodic ETL jobs, a continuous data pipeline significantly reduces the delay in moving data to its destination. For example, a streaming ETL stage upstream cleans or enriches data in real time, and a real-time data warehouse is built downstream, which keeps data queries timely and forms a low-latency query path. This scenario is common in media stream recommendation and search engines.
II. Environment deployment
1. Installation package management
[root@hop01 opt]# tar -zxvf flink-1.7.0-bin-hadoop27-scala_2.11.tgz
[root@hop02 opt]# mv flink-1.7.0 flink1.7
2. Cluster configuration
Management node
[root@hop01 opt]# cd /opt/flink1.7/conf
[root@hop01 conf]# vim flink-conf.yaml
jobmanager.rpc.address: hop01
Distribution node
[root@hop01 conf]# vim slaves
hop02
hop03
Synchronize both configuration files to all cluster nodes.
3. Start and stop
/opt/flink1.7/bin/start-cluster.sh
/opt/flink1.7/bin/stop-cluster.sh
Startup log:
[root@hop01 conf]# /opt/flink1.7/bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host hop01.
Starting taskexecutor daemon on host hop02.
Starting taskexecutor daemon on host hop03.
4. Web interface
Visit: http://hop01:8081/
III. Getting-started development case
1. Data script
Distribute a data script to each node:
/var/flink/test/word.txt
2. Introduce basic dependencies
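This basic case is written in Java.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.7.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.7.0</version>
</dependency>
3. Read file data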
Here the program reads the data in the file directly and counts how many times each word appears.
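import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Read file data
        readFile();
    }

    public static void readFile() throws Exception {
        // 1. Create the execution environment
        ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
        // 2. Read the data file
        String filePath = "/var/flink/test/word.txt";
        DataSet<String> inputFile = environment.readTextFile(filePath);
        // 3. Group by word (field 0) and sum the counts (field 1)
        DataSet<Tuple2<String, Integer>> wordDataSet = inputFile
                .flatMap(new WordFlatMapFunction())
                .groupBy(0)
                .sum(1);
        // 4. Print the processing result
        wordDataSet.print();
    }

    // Splits each line on commas and emits a (word, 1) pair per word
    static class WordFlatMapFunction implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
            String[] wordArr = input.split(",");
            for (String word : wordArr) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}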
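To make clear what the flatMap, groupBy(0), sum(1) chain computes, here is a rough plain-Java equivalent that runs without a cluster. The sample lines are assumptions, since the contents of word.txt are not shown in this article, and WordCountSketch is a hypothetical name. A TreeMap is used only to make the printed order predictable; Flink itself does not promise any particular output order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word count logic; not a Flink program.
class WordCountSketch {

    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // flatMap step: one line becomes several (word, 1) pairs
            for (String word : line.split(",")) {
                // groupBy(0).sum(1) step: add 1 to the total for this word
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample content, comma-separated like the input the job splits
        List<String> lines = Arrays.asList("java,flink", "flink,c++,java", "flink");
        System.out.println(count(lines)); // {c++=1, flink=3, java=2}
    }
}
```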
4. Read port data
Open a port on the hop01 host and send some sample data to it:
[root@hop01 ~]# nc -lk 5566
cm2
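Read and analyze the data arriving on the port with a Flink program:
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Read port data
        readPort();
    }

    public static void readPort() throws Exception {
        // 1. Create the execution environment
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Read from the socket port
        DataStreamSource<String> inputStream = environment.socketTextStream("hop01", 5566);
        // 3. Split each line into words, key by word, and sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultDataStream = inputStream
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
                        String[] wordArr = input.split(",");
                        for (String word : wordArr) {
                            collector.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .keyBy(0)
                .sum(1);
        // 4. Print the analysis result
        resultDataStream.print();
        // 5. Start the job
        environment.execute();
    }
}
IV. Operation mechanism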
FlinkClient
The client prepares the dataflow and submits it to the JobManager node. Depending on the requirements, the client can then disconnect immediately, or stay connected and wait for the result of the job.
JobManager
A Flink cluster starts one JobManager node and at least one TaskManager node. After the JobManager receives a job submitted by the client, it coordinates the job and dispatches its tasks to specific TaskManager nodes for execution. The TaskManager nodes send heartbeats and processing information back to the JobManager.
TaskManager
The task slot is the smallest unit of resource scheduling in a TaskManager. The number of slots is set at startup. Each slot can run a task: it receives the tasks deployed by the JobManager node and carries out the actual analysis and processing.
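The relationship between the JobManager, TaskManagers, and slots can be pictured with a deliberately simplified model. This is not Flink's scheduler, which also takes things like slot sharing into account. The class name, the first-free-slot policy, and the slot counts per host are all assumptions chosen for illustration; only the host names hop02 and hop03 come from the cluster above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of slot-based placement; not Flink's real scheduler.
class SlotSketch {

    // Places each task on the first TaskManager that still has a free slot.
    static Map<String, List<String>> assign(Map<String, Integer> slotsPerTaskManager,
                                            List<String> tasks) {
        Map<String, List<String>> placement = new LinkedHashMap<>();
        for (String taskManager : slotsPerTaskManager.keySet()) {
            placement.put(taskManager, new ArrayList<>());
        }
        for (String task : tasks) {
            boolean placed = false;
            for (Map.Entry<String, Integer> tm : slotsPerTaskManager.entrySet()) {
                List<String> running = placement.get(tm.getKey());
                if (running.size() < tm.getValue()) { // a slot is still free
                    running.add(task);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                throw new IllegalStateException("No free slot for " + task);
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // Assumed: two slots on each TaskManager host
        Map<String, Integer> slots = new LinkedHashMap<>();
        slots.put("hop02", 2);
        slots.put("hop03", 2);
        System.out.println(assign(slots, Arrays.asList("task-1", "task-2", "task-3")));
        // {hop02=[task-1, task-2], hop03=[task-3]}
    }
}
```

In this model a fifth task would fail with "No free slot", which mirrors why the total number of slots limits how many tasks a cluster can run at the same time.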
V. Source code address
GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent