This article explains how to deploy hadoop-0.20.1. It is quite detailed and should be a useful reference; interested readers are encouraged to read it through to the end!
Hadoop-0.20.1 deployment
Because I need to analyze a large volume of access logs and the existing single-machine analysis method can no longer meet the requirements, I had to deploy hadoop to solve the problem. Back when I was working on the distributed file system, I had already deployed and tested hadoop, using version hadoop-0.19.0, and I remember the configuration went very smoothly at the time. For this test I used the latest version, hadoop-0.20.1, and it took quite a lot of time to get the deployment working (2 days). I am recording the process here as a memo.
Compared with older versions, hadoop-0.20.1 changes a number of files, mainly under the conf directory.
In the existing hadoop articles on the Internet, the file to modify is hadoop-site.xml, but hadoop-0.20.1 no longer has this file; it has been replaced by core-site.xml (with hdfs-site.xml and mapred-site.xml covering the HDFS and MapReduce settings).
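For reference, a minimal core-site.xml sketch; the host hadoopm matches the job tracker configured later in this article, while the port 9000 is an assumption (the conventional HDFS port), not from the original:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopm:9000</value>
    <!-- URI of the name node; host/port assumed for illustration -->
  </property>
</configuration>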
I. Deploying hadoop
Hadoop deployment consists of two steps: setting up password-free access from the name node (namenode) to each data node (datanode), and configuring hadoop itself. My test environment is one name node and two data nodes; unfortunately, one of the data nodes is broken, so the test had to be done with a single data node. In this test, the namenode's IP is 192.168.199.135 and the datanode's IP is 192.168.199.136.
(1) Password-free access from the name node (namenode) to each data node (datanode)
1. Create a hadoop user on the name node and on each data node, using the same password.
2. Log in to the name node as the hadoop user and run ssh-keygen, pressing Enter at every prompt, to generate the file .ssh/id_rsa.pub. Copy that file in place under the name authorized_keys, then run ssh 127.0.0.1. If you can log in without a password, the requirement is met; otherwise, check that the permissions of authorized_keys are 644 (-rw-r--r--). Next, log in to the data node server, also as the hadoop user, create a .ssh directory and give it 700 permissions (chmod 700 .ssh); then copy the authorized_keys from the name node into that .ssh directory, making sure the permissions and directory structure match the name node. Finally, ssh from the name node to the data node; if you can log in without a password, the ssh configuration is done. Next, let's look at the hadoop configuration part of the hadoop-0.20.1 deployment.
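A condensed sketch of the steps above (the datanode IP 192.168.199.136 is the one given earlier; scp will still prompt for a password this one time):

# on the name node, as user hadoop
$ ssh-keygen -t rsa                       # press Enter at every prompt
$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
$ ssh 127.0.0.1                           # should log in without a password

# on the data node, as user hadoop
$ mkdir ~/.ssh && chmod 700 ~/.ssh

# back on the name node, copy the key over
$ scp ~/.ssh/authorized_keys 192.168.199.136:~/.ssh/
$ ssh 192.168.199.136                     # should now log in without a password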
(2) hadoop configuration
1. Download jdk and set it up. My JAVA_HOME=/usr/local/jdk1.6.0_06
2. Download hadoop, unpack it, and copy it to /usr/local/hadoop, so that HADOOP_HOME=/usr/local/hadoop.
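A sketch of this step (assuming the hadoop-0.20.1 release tarball is in the current directory):

$ tar zxf hadoop-0.20.1.tar.gz
$ mv hadoop-0.20.1 /usr/local/hadoop
# matching the variables above
$ export JAVA_HOME=/usr/local/jdk1.6.0_06
$ export HADOOP_HOME=/usr/local/hadoop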
3. Separate the data storage directories, i.e., the actual data is not stored under HADOOP_HOME (countless articles on the Internet put it in the hadoop installation directory). I use two separate 1TB hard drives to store the actual data blocks, mounted as /disk2 and /disk3. Format the two drives, create file systems on them, and mount them at these two directories; those steps are omitted here.
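The original does not show how these directories are handed to hadoop; a hedged hdfs-site.xml sketch (dfs.data.dir is hadoop-0.20.1's property for datanode storage, but the hdfs subdirectory names are my assumption):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk2/hdfs,/disk3/hdfs</value>
    <!-- comma-separated datanode storage dirs; subdirectory names assumed -->
  </property>
</configuration>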
4. Set the owner of the directories and files: chown -R hadoop:hadoop /disk2 /disk3 /usr/local/hadoop
5. Add the following to the file /usr/local/hadoop/conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopm:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
6. Modify the files /usr/local/hadoop/conf/slaves and /usr/local/hadoop/conf/masters: add the data node hostnames to slaves and the name node hostname to masters. More than one can be added, one per line. Note that the hostnames must be mapped in /etc/hosts on every server, as in the example below.
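An example layout (hadoopm is the job tracker host from the config above; the data node hostname hadoops1 is an assumption for illustration):

# /usr/local/hadoop/conf/masters
hadoopm

# /usr/local/hadoop/conf/slaves
hadoops1

# /etc/hosts on every server
192.168.199.135  hadoopm
192.168.199.136  hadoops1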
7. Modify the file /usr/local/hadoop/conf/hadoop-env.sh and add the line export JAVA_HOME=/usr/local/jdk1.6.0_06.
Repeat steps 1-6 on each data node.
II. Initialize and start the hadoop cluster
(1) Most hadoop operations are performed on the name node. Log in as the hadoop user and run hadoop namenode -format; this process usually completes smoothly. During initialization the data nodes do nothing (the main thing it does is generate a set of directories).
(2) Start the hadoop services. On the name node, run start-all.sh as the hadoop user. Check the processes: if everything is normal, you should see 2-3 java processes. Once startup succeeds, the data node begins to generate its relevant directories. The processes compare roughly as in the illustration below.
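As an illustration of what to expect (using the JDK's jps tool; the PIDs are arbitrary, and the daemon set is the standard hadoop-0.20 one):

# on the name node
$ jps
12001 NameNode
12187 SecondaryNameNode
12290 JobTracker

# on the data node
$ jps
8450 DataNode
8562 TaskTracker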
III. Testing
1. To check the hadoop status, use the command $ hadoop dfsadmin -report.
2. Create a directory: $ hadoop dfs -mkdir sery, then upload a few large files: $ hadoop dfs -put 7.* sery. About 900MB of data (2 iso files) went through quickly.
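To verify the upload (a hedged sketch; sery is the directory created above, and -ls/-dus are hadoop-0.20 shell commands):

$ hadoop dfs -ls sery     # list the uploaded files
$ hadoop dfs -dus sery    # total size of the directory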
3. Fault testing. This requires two data nodes: first shut down one data node, then create a directory on hadoop and copy data into it over the network. When the copy finishes, start the stopped data node's service again and observe how it behaves; a sketch follows below.
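A sketch of failing and recovering a single data node (hadoop-daemon.sh ships in hadoop-0.20.1's bin directory; run these on the data node itself):

$ /usr/local/hadoop/bin/hadoop-daemon.sh stop datanode
# ... run the mkdir/put test from the name node ...
$ /usr/local/hadoop/bin/hadoop-daemon.sh start datanode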
That is all of the content of "how to deploy hadoop-0.20.1". Thank you for reading, and I hope the shared content helps you!