2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to configure the Apache Beam Java SDK. Many people have questions about this in day-to-day work, so the editor consulted a range of materials and put together a simple, easy-to-follow procedure. I hope it helps answer your questions about configuring the Apache Beam Java SDK. Let's get started!
Set up the development environment
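Download and install the Java Development Kit (JDK) version 1.7 or later. Newer Beam releases require a newer JDK, so check the requirements for the Beam version you build against. Verify that the JAVA_HOME environment variable is set and points to your JDK installation directory.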
Download and install Apache Maven by following the Maven installation guide for your operating system.
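To confirm the setup from Java itself, the small program below prints the version of the JDK it runs on and checks that JAVA_HOME names an existing directory. It is a minimal sketch. The class name JdkCheck and the MIN_MAJOR threshold are illustrative assumptions. The program also reports on the JDK that launched it, which is not necessarily the one JAVA_HOME points to.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of Beam or the JDK): reports the running JDK
// version and whether JAVA_HOME points to an existing directory.
class JdkCheck {

    // Minimum major version taken from this article (JDK 1.7). Newer Beam
    // releases need a newer JDK, so adjust this for the Beam version you use.
    static final int MIN_MAJOR = 7;

    // Handles the legacy "1.x" scheme ("1.7.0_80" -> 7, "1.8.0_292" -> 8)
    // and the modern scheme ("11.0.2" -> 11, "17-ea" -> 17, "21" -> 21).
    static int parseMajorVersion(String version) {
        String[] parts = version.split("[.\\-+_]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True only if JAVA_HOME is set, non-empty, and names an existing directory.
    static boolean javaHomeLooksValid(String javaHome) {
        return javaHome != null && !javaHome.isEmpty() && Files.isDirectory(Paths.get(javaHome));
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String javaHome = System.getenv("JAVA_HOME");
        int major = parseMajorVersion(version);

        System.out.println("java.version = " + version + " (major " + major + ")");
        System.out.println("JAVA_HOME    = " + (javaHome == null ? "(not set)" : javaHome));

        if (major < MIN_MAJOR) {
            System.out.println("Warning: JDK is older than the minimum " + MIN_MAJOR);
        }
        if (!javaHomeLooksValid(javaHome)) {
            System.out.println("Warning: JAVA_HOME is not set or is not a directory");
        }
    }
}
```

Parsing the version string instead of comparing it as text matters because "1.8" sorts after "11" lexicographically even though Java 11 is newer.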
Get the WordCount code
The easiest way to get a copy of the WordCount pipeline is to run the following command. It generates a simple Maven project that contains Beam's WordCount examples and builds against the latest Beam version:
$ mvn archetype:generate \
-DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=LATEST \
-DgroupId=org.example \
-DartifactId=word-count-beam \
-Dversion="0.1" \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
Maven creates a directory named word-count-beam that contains a simple pom.xml and a set of example pipelines that count words in text files.
$ cd word-count-beam/
$ ls
pom.xml src
$ ls src/main/java/org/apache/beam/examples/
DebuggingWordCount.java WindowedWordCount.java common
MinimalWordCount.java WordCount.java
For a detailed introduction to the Beam concepts used in these examples, see the WordCount example walkthrough. Here, we focus only on running WordCount.java.
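To make the rest of this walkthrough easier to follow, here is a plain-Java sketch of what WordCount.java computes. It splits each line into words, counts each distinct word, and formats each result as "word: count". The sketch does not use the Beam API, and the class and method names are hypothetical. The tokenizer regex is assumed to resemble the one in Beam's example utilities, so check the generated sources for the exact pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Beam dependency) of the computation WordCount.java performs.
class WordCountSketch {

    // Splits on runs of non-letter characters (assumed to resemble Beam's example tokenizer).
    private static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

    // "Extract words" and "count per distinct word" collapsed into one loop.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(TOKENIZER_PATTERN)) {
                // A line that starts with a separator yields a leading empty token; skip it.
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Formats each (word, count) pair as "word: count", the shape of the output files.
    static List<String> format(Map<String, Long> counts) {
        List<String> out = new ArrayList<>();
        counts.forEach((word, n) -> out.add(word + ": " + n));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Apache Beam is a unified model",
                "Beam runs on many runners");
        format(countWords(lines)).forEach(System.out::println);
    }
}
```

A HashMap is used deliberately, so the printed order is arbitrary. This mirrors the real pipeline, whose output order is not guaranteed either.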
Run WordCount
A single Beam pipeline can run on multiple Beam runners, including ApexRunner, FlinkRunner, SparkRunner, and DataflowRunner. The DirectRunner is a common choice for getting started because it runs locally on your machine and requires no special setup.
After you have chosen a runner:
1. Make sure any runner-specific setup has been completed.
2. Build your command line:
   a. Specify the runner with --runner=<runner> (the default is DirectRunner).
   b. Add any options the runner requires.
   c. Choose input files and an output location that the runner can access. (For example, if you run the pipeline on an external cluster, it cannot read files on your local machine.)
3. Run your first WordCount pipeline.
Taking Spark as an example (for the other runners, see the official Beam documentation):
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
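The --name=value arguments passed through -Dexec.args become pipeline options. In WordCount.java, Beam's PipelineOptionsFactory handles this conversion. The class below is only a hypothetical, simplified stand-in written in plain Java. It shows how --runner, --inputFile and --output map to named options, and how DirectRunner becomes the default when --runner is omitted. It performs none of Beam's validation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Beam's options parsing (not the Beam API).
class RunnerArgs {

    // Default runner when --runner is omitted, as described in this article.
    static final String DEFAULT_RUNNER = "DirectRunner";

    // Turns ["--runner=SparkRunner", "--output=counts"] into {runner=SparkRunner, output=counts}.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            int eq = arg.indexOf('=');
            if (eq < 0) {
                // A bare flag such as "--help" is recorded as "true".
                options.put(arg.substring(2), "true");
            } else {
                options.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        // Only fills in the default if the user did not choose a runner.
        options.putIfAbsent("runner", DEFAULT_RUNNER);
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> spark = parse(new String[] {
                "--runner=SparkRunner", "--inputFile=pom.xml", "--output=counts"});
        System.out.println("runner = " + spark.get("runner"));

        Map<String, String> local = parse(new String[] {"--inputFile=pom.xml", "--output=counts"});
        System.out.println("runner = " + local.get("runner"));
    }
}
```

The real factory goes further. It matches option names against a typed options interface and, with validation enabled, rejects unknown or missing required options. A plain map like this one cannot catch those errors.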
Inspect the results
Once the pipeline has completed, you can view the output. You may notice multiple output files prefixed with counts. The runner decides the exact number of these files, which gives it the flexibility to perform efficient, distributed execution.
$ ls counts*
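The ls counts* step can also be done from Java, for example when post-processing results in a test. The sketch below uses a hypothetical ShardFinder class to collect every regular file in a directory whose name starts with a given prefix. The example shard names in the comment assume Beam's usual -SSSSS-of-NNNNN naming. The exact names may depend on the runner and the Beam version.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: the Java equivalent of "ls counts*".
class ShardFinder {

    // Returns the regular files in dir whose names start with prefix, sorted by path.
    // The prefix is used inside a glob, so it should not contain glob metacharacters.
    static List<Path> findShards(Path dir, String prefix) throws IOException {
        List<Path> shards = new ArrayList<>();
        // "counts*" matches e.g. counts-00000-of-00003, counts-00001-of-00003, ...
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    shards.add(p);
                }
            }
        }
        // Directory listing order is unspecified, so sort for stable output.
        Collections.sort(shards);
        return shards;
    }

    public static void main(String[] args) throws IOException {
        for (Path shard : findShards(Paths.get("."), "counts")) {
            System.out.println(shard.getFileName());
        }
    }
}
```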
When you look at the contents of these files, you will see that they contain unique words and the number of occurrences of each word. The order of elements within the files may differ between runs, because the Beam model does not generally guarantee ordering. Again, this allows runners to optimize for efficiency.
$ more counts*
Beam: 27
SF: 1
Fat: 1
Job: 1
Limitations: 1
Require: 1
Of: 11
Profile: 10
...
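Because Beam does not guarantee ordering, any deterministic view of the results has to be produced after the pipeline finishes. The sketch below uses a hypothetical CountsMerger class. It reads the "word: count" lines from any number of shard files, parses them, and sorts them alphabetically with a TreeMap. It assumes each line has the "word: count" shape shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: merges "word: count" lines from several shards into one sorted map.
class CountsMerger {

    static Map<String, Long> merge(List<String> lines) {
        Map<String, Long> byWord = new TreeMap<>(); // keys kept in natural (alphabetical) order
        for (String line : lines) {
            int sep = line.lastIndexOf(": ");
            if (sep < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, sep);
            // Throws NumberFormatException if the count is not a number.
            long count = Long.parseLong(line.substring(sep + 2).trim());
            // Each word should appear once across all shards; summing is a harmless safeguard.
            byWord.merge(word, count, Long::sum);
        }
        return byWord;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String file : args) {
            lines.addAll(Files.readAllLines(Paths.get(file)));
        }
        merge(lines).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

With Java 11 or later, it can be run directly from source, for example java CountsMerger.java counts*, letting the shell expand the glob.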
That concludes this look at how to configure the Apache Beam Java SDK. I hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it yourself!
© 2024 shulou.com SLNews company. All rights reserved.