I. Build the environment
1. Build the Hadoop running environment
1.1 Install the virtual machine
(1) Download and install the VMware virtual machine software.
(2) Create a virtual machine; the configuration of the virtual machine used in the experimental environment is shown in the following figure.
(3) Install the Ubuntu system; the installation result is shown below.
1.2 Configure the JDK environment
Download and install the JDK. After installation, the Java environment needs to be configured; a successful configuration looks as follows.
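The concrete settings appear only as a figure in the original. As a minimal sketch, assuming the JDK was unpacked to /usr/lib/jvm/jdk1.8.0 (a hypothetical path), the Java environment can be configured by appending the following to ~/.bashrc and then checking the version:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0    # hypothetical JDK path; adjust to the real installation directory
export PATH=$JAVA_HOME/bin:$PATH
java -version    # prints the Java version if the configuration succeeded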
2. Hadoop installation and deployment
(1) Create a Hadoop installation folder and change into that path.
(2) Download the Hadoop installation file from hadoop.apache.org and copy it into the Hadoop installation folder. The download address is shown in the figure below; version 2.7.3 is selected for this experiment.
(3) Decompress the Hadoop file.
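Steps (1) to (3) are shown above only in prose and figures. A minimal command sketch, assuming the installation folder is ~/hadoop and the archive hadoop-2.7.3.tar.gz has already been downloaded into it, is:
mkdir -p ~/hadoop
cd ~/hadoop
tar -zxvf hadoop-2.7.3.tar.gz    # extracts to ~/hadoop/hadoop-2.7.3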
(4) Modify the configuration files
Next, edit the configuration files core-site.xml, hdfs-site.xml, and hadoop-env.sh. All three files are located under ~/hadoop/hadoop-2.7.3/etc/hadoop/. Write the following content into the first two files respectively.
The first file, core-site.xml
Note that /home/ealon/hadoop/tmp must be replaced with a tmp folder under the current user's home directory on your machine; create this folder if it does not exist.
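The file contents appear only as a figure in the original. A typical pseudo-distributed core-site.xml that is consistent with the tmp folder above and with the hdfs://localhost:9000 address used later in this tutorial would look roughly like this (a sketch, not a copy of the original figure):
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ealon/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>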
The second file, hdfs-site.xml
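Its contents are likewise shown only as a figure. A common pseudo-distributed hdfs-site.xml is sketched below; the name/data directory values are illustrative assumptions:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/ealon/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/ealon/hadoop/tmp/dfs/data</value>
    </property>
</configuration>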
In the third file, hadoop-env.sh, find the line shown in the original figure and edit it accordingly.
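That line is normally the JAVA_HOME setting; assuming the JDK path used earlier (an assumption, since the original shows the value only in a figure), the edit would be:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0    # replace with the actual JDK installation path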
Next, add the Hadoop paths to the system environment variables. Open the file with gedit /etc/environment and append the following inside the quotes of the PATH="" entry at the end of the file:
:/home/ealon/hadoop/hadoop-2.7.3/bin
:/home/ealon/hadoop/hadoop-2.7.3/sbin
(5) Restart the virtual machine
First restart the system. After booting, run the command hadoop version to verify whether the Hadoop environment was installed successfully. If the Hadoop version number is displayed on the screen, stand-alone mode has been configured, as shown in the following figure.
(6) Start HDFS
Start the HDFS service in pseudo-distributed mode. First, HDFS needs to be formatted.
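The format command itself appears only as a figure in the original; with the PATH configured above it would typically be:
hdfs namenode -format    # or use the full path ~/hadoop/hadoop-2.7.3/bin/hdfs namenode -format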
The following content is displayed for successful formatting.
(7) Start HDFS
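The start command is not reproduced in the text; with the sbin directory on the PATH as configured above, HDFS is typically started with:
start-dfs.sh    # starts the NameNode, DataNode and SecondaryNameNode daemons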
(8) View the process
Run the jps command. If you see the following on the screen, HDFS has been started successfully.
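The screenshot is not reproduced here, but for a pseudo-distributed HDFS started with start-dfs.sh, jps is normally expected to list at least the following processes (each preceded by its process ID):
NameNode
DataNode
SecondaryNameNode
Jps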
(9) Stop HDFS
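As with starting, the stop command appears only as a figure in the original; it is typically:
stop-dfs.sh    # stops the HDFS daemons started above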
Through the above operations, the Hadoop environment is basically built.
3. Hadoop-Eclipse-Plugin installation and configuration
3.1 Plug-in installation
To compile and run MapReduce programs in Eclipse, you need to install hadoop-eclipse-plugin; download hadoop2x-eclipse-plugin from GitHub.
After downloading, copy hadoop-eclipse-kepler-plugin-2.6.0.jar from the release folder to the plugins folder of the Eclipse installation directory, and run eclipse -clean to restart Eclipse.
unzip -qo ~/download/hadoop2x-eclipse-plugin-master.zip -d ~/download    # extract to ~/download
sudo cp ~/download/hadoop2x-eclipse-plugin-master/release/hadoop-eclipse-plugin-2.6.0.jar /usr/lib/eclipse/plugins/    # copy to the plugins directory of the Eclipse installation directory
/usr/lib/eclipse/eclipse -clean    # after adding the plug-in, Eclipse must be started this way once for the plug-in to take effect
3.2 Plug-in configuration
Make sure Hadoop is running before continuing with the configuration.
After starting Eclipse, you can see DFS Locations in the Project Explorer on the left (if you see the welcome screen, click the x in the upper left corner to close it).
The plug-in requires further configuration.
Step 1: select Preference under the Window menu.
Open Preference.
A dialog will pop up with a new Hadoop Map/Reduce option on its left side. Click this option and select the Hadoop installation directory (for example /usr/local/hadoop; since picking a directory in the Ubuntu file dialog is awkward, you can simply type the path).
Select the installation directory for Hadoop
Step 2: switch to the Map/Reduce development view. Select Open Perspective -> Other under the Window menu (on CentOS this is Window -> Perspective -> Open Perspective -> Other); in the dialog that pops up, select the Map/Reduce option to switch.
Switch to the Map/Reduce development view
Step 3: establish a connection to the Hadoop cluster. Click the Map/Reduce Locations panel in the lower right corner of Eclipse, right-click inside the panel, and select New Hadoop Location.
Establish a connection to the Hadoop cluster
In the pop-up General options panel, the General settings should be consistent with the Hadoop configuration. Generally the two Host values are the same; for a pseudo-distributed setup, just enter localhost. In addition, since this tutorial uses the Hadoop pseudo-distributed configuration with fs.defaultFS set to hdfs://localhost:9000, the Port of DFS Master should be changed to 9000. The Port of Map/Reduce (V2) Master can be left at its default, and Location Name can be filled in freely.
The final settings are shown in the following figure:
Settings for Hadoop Location
The Advanced parameters tab is for configuring Hadoop parameters; filling it in essentially restates the Hadoop configuration items (the configuration files in /usr/local/hadoop/etc/hadoop). For example, if hadoop.tmp.dir was configured, it has to be modified here accordingly. Doing so can be tedious, and it can be avoided by copying the configuration files instead (as described below).
In short, we only have to configure the General tab; click Finish and the Map/Reduce Location is created.
3.3 Manipulate files in HDFS from Eclipse
After configuration, click the MapReduce Location in the Project Explorer on the left (click the triangle to expand it) to browse the list of files in HDFS directly (HDFS must already contain files; the figure below, for example, shows the WordCount output). Double-click a file to view its contents, and right-click to upload, download, or delete files in HDFS, without going through tedious commands such as hdfs dfs -ls.
Use Eclipse to view the contents of files in HDFS
If you cannot view the files, right-click the Location and try Reconnect, or restart Eclipse.
3.4 Create a MapReduce project in Eclipse
Click the File menu and select New -> Project...:
Create Project
Select Map/Reduce Project and click Next.
Create a MapReduce project
Enter the Project name as WordCount and click Finish to create the project.
Fill in the project name
At this point, in the Project Explorer on the left, you can see the project you just created.
Project creation completed
Then right-click the WordCount project you just created and select New -> Class.
Create a new Class
There are two fields to fill in: enter org.apache.hadoop.examples for Package and WordCount for Name.
Fill in the Class information
After the Class has been created, you can see the file WordCount.java under src in the project. Copy the WordCount code below into this file.
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

3.5 Run MapReduce through Eclipse
Before running the MapReduce program, one more important step is needed (this is the approach of solving the parameter-setting problem by copying configuration files, mentioned above): copy the modified configuration files from /usr/local/hadoop/etc/hadoop (for a pseudo-distributed setup, core-site.xml and hdfs-site.xml), together with log4j.properties, into the src folder of the WordCount project (~/workspace/WordCount/src):
cp /usr/local/hadoop/etc/hadoop/core-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/log4j.properties ~/workspace/WordCount/src
Without copying these files, the program will not run correctly; the reason these files need to be copied is explained at the end of the tutorial.
After copying, be sure to right-click WordCount and select Refresh (the project is not refreshed automatically; it must be done manually). The file structure then looks as follows:
WordCount project file structure
Click the Run icon in the toolbar, or right-click WordCount.java in the Project Explorer and select Run As -> Run on Hadoop, to run the MapReduce program. However, since no arguments have been specified, the run only prints the usage message "Usage: wordcount <in> <out>"; the arguments need to be set through Eclipse.
Right-click the WordCount.java you just created and select Run As -> Run Configurations, where the runtime arguments can be set (if there is no WordCount entry under Java Application, double-click Java Application first). Switch to the "Arguments" tab and fill in "input output" under Program arguments.
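Note that this run assumes an input directory containing some text files already exists in HDFS for the current user; that preparation step is not shown above. A typical way to create it from the command line, using the /usr/local/hadoop layout assumed in this section, is:
/usr/local/hadoop/bin/hdfs dfs -mkdir -p input
/usr/local/hadoop/bin/hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input    # upload some sample text files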
Set program running parameters
Alternatively, you can set the input arguments directly in the code by changing the line String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); in the main() function to:
// String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
String[] otherArgs = new String[] {"input", "output"}; /* set the input arguments directly */
After setting the arguments, run the program again; you should see a message that it ran successfully, and after refreshing DFS Location you will also see the generated output folder.
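The result can also be inspected from the command line, again assuming the /usr/local/hadoop layout used in this section:
/usr/local/hadoop/bin/hdfs dfs -cat output/*    # print the WordCount results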
Results of WordCount operation
At this point, you can use Eclipse to easily develop MapReduce programs.
II. Content of the experiment
Download the Weibo user data file userdata.csv from the network; it contains 14,388,385 user records with the fields user id, province, gender, and whether the account is verified. A screenshot of the data is shown in the following figure.
The project code structure is shown in the following figure:
1. Gender distribution of users
Mapper function:
Reducer function:
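The original Mapper and Reducer appear only as screenshots. The sketch below is a plausible reconstruction in the style of the WordCount example above, assuming the gender field is the third comma-separated column of userdata.csv; the column index, class names, and field handling are assumptions, not taken from the original code:
package org.apache.hadoop.examples;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reconstruction; the driver (main) would mirror the WordCount main above.
public class GenderCount {

  // Assumption: each line of userdata.csv looks like "user id,province,gender,verified".
  public static class GenderMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text gender = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length >= 3) {        // skip malformed lines
        gender.set(fields[2]);         // assumed index of the gender column
        context.write(gender, one);    // emit (gender, 1)
      }
    }
  }

  public static class GenderReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();              // total number of users of this gender
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}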
The distribution of the calculation results is shown in the following figure:
It can be observed from the figure that female users make up a large proportion of Weibo users, outnumbering male users.
2. Distribution of users in provinces
The Mapper and Reducer functions for this part are essentially the same as in the gender statistics algorithm, so they are not repeated here. The calculated results are as follows:
As can be seen from the picture, Guangdong, Beijing, Shanghai, Jiangsu, Zhejiang and other places have a large number of Weibo users.
3. Distribution of users' real name verification
The Mapper and Reducer functions for this part are likewise the same as in the gender statistics algorithm and are not repeated here. The calculated results are as follows:
As can be seen from the picture, non-real-name users account for the vast majority of Weibo users.
4. Distribution of male to female ratio in major provinces and cities
The Mapper and Reducer functions for this part are basically the same as in the gender statistics algorithm and are not repeated; the calculation results are shown in the following figure:
In every province and city listed, female users account for more than half of that province's or city's total users.
III. Summary
Browsing the console output and the Web management interface shows no obvious anomalies or errors during execution of the algorithms. The console output is shown in the following figure:
The statistics panel of the Web interface is shown below:
The statistics of task execution results are shown in the following figure:
Considering the practical needs of the water transportation industry, building intelligent ports is an important path for the transformation and upgrading of China's ports, and the key technologies involved include port data analysis and processing. Hadoop technology has been widely applied in the Internet industry, but it has not yet played a key, core role in the construction of automated and intelligent ports. The in-depth application of big data analysis and mining technology in the port field is therefore an advanced stage of port development. For China's ports, building intelligent ports, optimizing and upgrading port infrastructure and management models, and realizing innovation in port functions, technology, and services have become important ways to improve international competitiveness and complete transformation and upgrading. Research on the application of big data technology in intelligent ports is thus of great significance for making full use of the massive data accumulated through port informatization and for providing decision-making support to China's port management departments and port enterprises.