Building Hadoop and Analyzing a WordCount Example Run

2025-07-04 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 Report

This article covers setting up Hadoop on Windows and running the wordcount example on it. The methods described are simple, quick, and practical.

Prerequisites: Hadoop is built for Linux, so on Windows you need to simulate a Linux environment. Several tools are worth considering: Cygwin, hadoop4win, HDP, and VMware.

Cygwin: Cygwin is a UNIX-like emulation environment that runs on Windows. It provides a UNIX emulation DLL and, built on top of it, many of the software packages found on a Linux system. It is well supported on Windows XP SP3 and later. Put simply, it gives you a UNIX command line on Windows, in the same way that cmd gives you the native one.

Hadoop4win: an integration package that bundles Cygwin, Hadoop, the JDK, and HBase. These are everything Hadoop needs, and the hadoop4win installer sets them all up, so you can run Hadoop directly afterwards. More packages are still being added to it. One drawback is that the bundled Hadoop version is 0.20.2 (the version of the examples jar used later in this article), which is rather old; the latest at the time of writing is 2.6.0.

HDP: Hortonworks Data Platform (HDP) is designed, developed, and built entirely in the open. It provides an enterprise-ready data platform that lets organizations adopt a modern data architecture.

With YARN at its architectural center, HDP is a multi-workload data processing platform that supports a range of processing methods, from batch through interactive to real-time, with the key capabilities an enterprise data platform requires: comprehensive governance, security, and operations.

HDP is distributed as a ready-made virtual machine that opens directly under VMware, with nothing to install. I suspect this is where things are heading. I am still studying it, and we can discuss it together.

Main text: I tried three methods. The first failed; the second and third succeeded. It took a lot of time, but I learned a great deal.

1: First, installing Cygwin under Windows. It is troublesome to install: it took me 3 days and in the end I did not succeed. It is still an important step, because although it failed I learned a lot that helped with the other installations. Download the latest Cygwin installer from the official website; with an old one you will get an error at the installation step that asks you to select a mirror. Alternatively, enter http://www.cygwin.com/setup-x86.exe directly in the address bar for the 32-bit installer; for 64-bit, change x86 to x86_64.

During installation you will be asked to select packages. Two must be selected: openssh and openssl, both in the Net category (or just use the search box at the top). After installation, Cygwin needs to be configured: run Cygwin and enter ssh-host-config. I will not go through the prompts that follow, since they are well covered online. Partway through, a permission denied error for /var may appear, which means permissions on that directory need to be granted. Enter the following commands: chmod 777 /var and chown :Users /var. You may need to try these two several times; for me they also sometimes did not take effect. 777 grants read, write, and execute to everyone, which is the most permissive mode. Other numbers are suggested online, but this one works.
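To make the mode numbers concrete, here is a minimal sketch that applies two modes to a scratch directory and prints the resulting permission strings. The scratch directory and the show_mode helper are my own illustration, used so that nothing under /var is touched.

```shell
# Scratch directory standing in for /var (an assumption for this demo).
demo_dir=$(mktemp -d)

# Print the ten-character permission string of a path, e.g. drwxr-xr-x.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

chmod 755 "$demo_dir"   # owner: rwx, group: r-x, others: r-x
show_mode "$demo_dir"

chmod 777 "$demo_dir"   # owner, group and others all get rwx
show_mode "$demo_dir"
```

Each digit covers one class of user (owner, group, others), and 7 is read (4) plus write (2) plus execute (1).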

The most important step, and the one where I got stuck: start the ssh service with the command net start sshd, then generate the key pair with the command ssh-keygen, pressing Enter at every prompt. Then enter cd ~/.ssh followed by cp id_rsa.pub authorized_keys.

Finally, verify the setup: with the service started (net start sshd), enter the command ssh localhost. If it logs in without prompting, the configuration is correct. If you see Connection closed by ::1, something is wrong.
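The key authorization step above can be sketched as a small function. This is only a sketch under assumptions: the function name authorize_key is hypothetical, it appends to authorized_keys instead of using cp so that existing entries survive, and the chmod lines assume an sshd that checks file modes (the StrictModes behaviour of OpenSSH), which may matter less under Cygwin.

```shell
# Authorize a public key for password-less login.
# $1: public key file (default ~/.ssh/id_rsa.pub, as written by ssh-keygen)
# $2: ssh directory   (default ~/.ssh)
authorize_key() {
    pub="${1:-$HOME/.ssh/id_rsa.pub}"
    ssh_dir="${2:-$HOME/.ssh}"

    mkdir -p "$ssh_dir" || return 1
    # Append rather than cp, so keys that are already authorized are kept.
    cat "$pub" >> "$ssh_dir/authorized_keys" || return 1

    # Restrict access to the owner; sshd with StrictModes checks these modes.
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Typical use once ssh-keygen has created ~/.ssh/id_rsa.pub:
#   authorize_key
#   ssh localhost
```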

I could not get past this point. Hadoop connects to its nodes over ssh, so without a working ssh login it cannot run. There are many Cygwin configuration guides online that you can try. What follows are the setups that did work for me.

Note: the failure may have something to do with the operating system. I originally tried on Windows 8.1, where it would not install; I then switched to Windows 7 and the installation went through. I did not investigate why, but you can try changing the system.

2: Download hadoop4win from http://sourceforge.net/projects/hadoop4win/files/0.1.4/hadoop4win-setup-net_0.1.4.zip/download and install it directly. (Note: you must run the installer as an administrator, otherwise the installation will be incomplete and the shortcuts will be missing.) Then run hadoop4win. In the installation directory there is an opt/hadoop/bin folder; enter it and run the command ls to list the scripts it contains.

Among those scripts, use hadoop-daemon.sh to start a daemon: hadoop-daemon.sh start namenode

Use the jps command to view the running processes.

If NameNode is listed, the start succeeded. Five processes need to be started in total: namenode, datanode, secondarynamenode, jobtracker, and tasktracker, in that order. You can google what each of the five does.
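The start order and the jps check can be sketched as two small helpers. The function names and the DAEMON_CMD variable are hypothetical, introduced here only for illustration; the class names matched against jps output are the ones the Hadoop 0.20.x daemons run under.

```shell
# The five daemons, in the start order given above.
DAEMONS="namenode datanode secondarynamenode jobtracker tasktracker"

# Start each daemon in order. DAEMON_CMD defaults to hadoop-daemon.sh;
# setting DAEMON_CMD=echo prints the commands instead of running them.
start_daemons() {
    for d in $DAEMONS; do
        "${DAEMON_CMD:-hadoop-daemon.sh}" start "$d" || return 1
    done
}

# Read jps output on stdin and print the expected daemons that are missing.
# jps lines look like "<pid> <ClassName>", so the second field is compared
# exactly (NameNode must not match SecondaryNameNode).
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode JobTracker TaskTracker", want, " ")
            out = ""
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen))
                    out = out (out == "" ? "" : " ") want[i]
            print out
        }'
}
```

With Hadoop on the PATH, start_daemons followed by jps | missing_daemons should print an empty line once all five are up.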

Next, open localhost:50030 (the jobtracker page) and localhost:50070 (the namenode page) in a browser and check that both load.

If both pages load, the installation succeeded. Next, run the wordcount example; hadoop4win ships with the jar for it. There are two ways to run it: from the command line or from Eclipse. Beginners should start with the command line to understand the process, and move to Eclipse once they are familiar with it.

First create a txt file and type whatever you like into it.

Upload this file to the HDFS file system: in the hadoop4win shell, change to the local directory that holds the file (for example cd d:), then use the command: hadoop fs -put hello.txt /

Next, open localhost:50070 and click Browse the filesystem to check that hello.txt is there.

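If the file is listed, the upload succeeded. Next, run the hadoop-0.20.2-examples.jar package with the command: hadoop jar hadoop-0.20.2-examples.jar wordcount hello.txt /sum.txt, where sum.txt is an output name you choose yourself. Despite the name, the job creates sum.txt as a directory in HDFS and writes its result files inside it. If the job cannot find the input, give the full path /hello.txt, because a relative HDFS path is resolved against your HDFS home directory rather than the root.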
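The upload and run steps can be put together in one function, sketched below under assumptions: the function name run_wordcount and the HADOOP and EXAMPLES_JAR variables are hypothetical, the input is addressed by its full HDFS path because the file was uploaded to the root, and the result files are read with a part-* glob rather than a fixed name.

```shell
# Upload a local file to the HDFS root, run wordcount on it, print the result.
# $1: local input file, $2: HDFS output directory (must not exist yet)
# HADOOP defaults to the hadoop launcher; HADOOP=echo gives a dry run that
# only prints the commands.
run_wordcount() {
    hadoop_cmd="${HADOOP:-hadoop}"
    jar="${EXAMPLES_JAR:-hadoop-0.20.2-examples.jar}"
    input=$1
    output=$2

    "$hadoop_cmd" fs -put "$input" / || return 1
    "$hadoop_cmd" jar "$jar" wordcount "/$(basename "$input")" "$output" || return 1
    # The job writes a directory; the counts are in its part-* files.
    # The glob is quoted so that hadoop, not the local shell, expands it.
    "$hadoop_cmd" fs -cat "$output/part-*"
}
```

For example, HADOOP=echo run_wordcount hello.txt /sum.txt prints the three commands without touching the cluster, which is a cheap way to check the paths before running for real.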

Then open localhost:50070 again; sum.txt will now appear. Click into it to view the results.
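To check that the result is plausible, the same count can be reproduced locally on the original text file. This is a sketch, not part of Hadoop: local_wordcount is a hypothetical helper that mimics what the example job does, namely splitting on whitespace and emitting one word, a tab, and its count per line, sorted by word.

```shell
# Count words in a local file in the style of the wordcount example:
# split on whitespace, then print "word<TAB>count" sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        sed '/^$/d' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Usage:
#   local_wordcount hello.txt
```

For a small input, the output of local_wordcount hello.txt should match what the HDFS browser shows inside sum.txt.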

This write-up is rough and brief, and my ability is limited. Many of the Linux commands involved are not explained. If anything is unclear, this video is very detailed: http://www.ppvke.com/10354.html

That completes the second method. Working through it may take some time and some searching: think more, do more, google more! The third method is one I am still studying, but I will share it here as well:

First install the virtual machine software, VMware, which I will not cover here. After installation, use it to import and open the downloaded virtual machine file. Download address: http://zh.hortonworks.com/hdp/downloads/

Just open it directly. Once it is running, an address of the form 192.168.xxx.xxx will be displayed.

At this point you should have a better understanding of building Hadoop and running the wordcount example. The best next step is to try it in practice.
