The basic construction of Hadoop


Hadoop

server1.example.com  172.25.23.1  master

server2.example.com  172.25.23.2  slave

server3.example.com  172.25.23.3  slave

server4.example.com  172.25.23.4  slave

SELinux and iptables are disabled, hostname resolution is configured (all nodes can ping each other by name), and sshd is enabled.

Hadoop 1.2.1

I. Stand-alone mode (storage and computation on the master only)

useradd -u 900 hadoop

echo westos | passwd --stdin hadoop

su - hadoop

1. Install Java (uninstall any Java already on the machine first)

(1)

sh jdk-6u32-linux-x64.bin

mv jdk1.6.0_32 /home/hadoop

ln -s jdk1.6.0_32 java

(2) add the environment variables

vim ~/.bash_profile

export JAVA_HOME=/home/hadoop/java

export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$PATH:$JAVA_HOME/bin

source ~/.bash_profile

2. Set up Hadoop

tar zxf hadoop-1.2.1.tar.gz

ln -s hadoop-1.2.1 hadoop

3. Passwordless SSH setup

ssh-keygen

ssh-copy-id 172.25.23.1

ssh 172.25.23.1 (test to make sure no password is required)

4. Modify the configuration files

(1) set the slave nodes

vim hadoop/conf/slaves

172.25.23.1

(2) set the master node

vim hadoop/conf/masters

172.25.23.1

(3) set the Java home path

vim hadoop/conf/hadoop-env.sh
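The line to set, matching the Java path configured earlier:

export JAVA_HOME=/home/hadoop/java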

(4) modify the Hadoop core configuration file core-site.xml to set the HDFS address and port, i.e. to specify the NameNode

vim hadoop/conf/core-site.xml
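The file's contents appeared as a screenshot in the original; a minimal sketch of what it would hold, assuming the NameNode listens on port 9000 (the same port used in the 2.6.4 section below):

<configuration>
    <property>
        <!-- URI of the NameNode; the host is the master, port 9000 is an assumption -->
        <name>fs.default.name</name>
        <value>hdfs://172.25.23.1:9000</value>
    </property>
</configuration>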

(5) specify the number of replicas kept for each file

vim hadoop/conf/hdfs-site.xml
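Also a screenshot in the original; a sketch, assuming a single replica since this deployment is stand-alone:

<configuration>
    <property>
        <!-- number of block replicas; 1 for the single-node setup -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>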

(6) specify the JobTracker

vim hadoop/conf/mapred-site.xml
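A sketch of the likely contents; the conventional JobTracker port 9001 is an assumption, the original showed this as an image:

<configuration>
    <property>
        <!-- JobTracker address; port 9001 is assumed -->
        <name>mapred.job.tracker</name>
        <value>172.25.23.1:9001</value>
    </property>
</configuration>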

5. Start the service

(1) format a new distributed file system

bin/hadoop namenode -format

(2) start the Hadoop daemon

bin/start-all.sh (= bin/start-dfs.sh + bin/start-mapred.sh)

(3) View the processes

a) jps

b) bin/hadoop dfsadmin -report

6. Some common Hadoop commands (like their Linux counterparts, only prefixed with the Hadoop-specific bin/hadoop fs)

bin/hadoop fs -ls

mkdir input

cp conf/*.xml input

bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'

bin/hadoop fs -cat output/*

bin/hadoop fs -put conf/ input

bin/hadoop fs -get output output

7. Browse the web interfaces of the NameNode and the JobTracker; their addresses are:

NameNode - http://172.25.23.1:50070/

Click /user/hadoop/ under "Browse the filesystem" to see the uploaded files

JobTracker - http://172.25.23.1:50030/

II. Distributed deployment

First stop the related services on the master (bin/stop-all.sh), then delete /tmp/*

Slave

1. Directory setup

useradd -u 900 hadoop

2. Make sure the master can connect to the slaves without a password

yum install -y rpcbind

/etc/init.d/rpcbind start (an intermediary service for NFS, used to notify clients)

3. Synchronize the data (NFS)

(1) on the master side (as root on the sharing node)

/etc/init.d/nfs start

vim /etc/exports
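The export entry itself was an image in the original; a plausible line, assuming the hadoop home directory is shared read-write with anonymous IDs mapped to the hadoop user (UID/GID 900) created above:

/home/hadoop    *(rw,anonuid=900,anongid=900)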

exportfs -rv

(2) Mount it on the slave side

yum install -y nfs-utils

showmount -e 172.25.23.1

mount 172.25.23.1:/home/hadoop /home/hadoop

Master

1. Modify the configuration files

(1) vim hadoop/conf/slaves

172.25.23.2

172.25.23.3

(2) vim hadoop/conf/hdfs-site.xml

Keep 2 copies on the DataNodes (see the sketch below)
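A sketch of the change, assuming the replication factor is simply raised to match the two DataNodes:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>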

2. Check whether the master can connect to the slaves without a password.

ssh 172.25.23.2

If a password is still required, check the following:

(1) log in as the hadoop user and check the ownership of its files (they should be shown as hadoop)

(2) whether hostname resolution is correct

(3) whether rpcbind is running

(4) if all of the above are correct, run:

chkconfig rpcbind on

chkconfig rpcgssd on

chkconfig rpcidmapd on

chkconfig rpcsvcgssd on

reboot

You can connect without a password.

3. Start the service

(1) format a new distributed file system

bin/hadoop namenode -format

(2) start the Hadoop daemon

bin/start-all.sh

(3) View the processes with jps, on the master and on each slave

(4) upload files

bin/hadoop fs -put conf/ input

(5) visit 172.25.23.1:50030

You can see there are 2 nodes

172.25.23.1:50070

There are files uploaded

III. Add a slave node (172.25.23.4) and migrate the data

1. Give the new node the same setup as the existing slave nodes:

yum install -y nfs-utils rpcbind

useradd -u 900 hadoop

/etc/init.d/rpcbind start

vim /etc/hosts

showmount -e 172.25.23.1

mount 172.25.23.1:/home/hadoop /home/hadoop

2. Modify the slaves file on the master side:

add 172.25.23.4

3. Start the service on the new slave node to join the cluster

bin/hadoop-daemon.sh start datanode

bin/hadoop-daemon.sh start tasktracker

4. View on master

bin/hadoop dfsadmin -report

...

You can see the new node

5. Balance the data:

bin/start-balancer.sh

1) If balancing is not performed, the cluster will store all new data on the new DataNode, which lowers MapReduce efficiency.

2) Set the balancing threshold. The default is 10%; the lower the value, the more evenly balanced the nodes, but the longer balancing takes: bin/start-balancer.sh -threshold 5

6. Migrate data off a node and remove it

(1) vim hadoop/conf/mapred-site.xml

Add the following (a sketch is given below):
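The added property was an image in the original; a sketch, assuming the standard Hadoop 1.x exclude-file key pointed at the file created in the next step:

<property>
    <!-- file listing hosts to decommission -->
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/hadoop/conf/hostexclude</value>
</property>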

(2) add the hostname to be removed

vim /home/hadoop/hadoop/conf/hostexclude

172.25.23.3

(3) bin/hadoop dfsadmin -refreshNodes

This operation migrates the data in the background; when the node's status is displayed as Decommissioned, it can be safely shut down. You can check the DataNode status with bin/hadoop dfsadmin -report.

While the data is being migrated, this node should not be running TaskTracker tasks, or an exception will occur.

(4) to remove the TaskTracker, simply stop it directly on 172.25.23.3 (no data blocks remain on it)

IV. Recovering files from the trash

1. Edit the trash retention time

vim hadoop/conf/core-site.xml
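The setting was shown as a screenshot; a sketch using the standard fs.trash.interval property, whose value is the retention time in minutes (1440, i.e. one day, is an assumed value):

<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>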

2. Test

Delete a file and you will find an extra .Trash directory. Enter it layer by layer until you find the deleted file, then mv the file back to its original directory; you will then find that .Trash no longer contains it.

Hadoop 2.6.4

I. Distributed deployment (switch all master and slave nodes to the hadoop user with su - hadoop)

The environment and directories are the same as for 1.2.1; NFS and so on remain unchanged.

This configuration reuses the 1.2.1 machines (their hosts, rpcbind, and NFS setup is kept). Before reconfiguring for 2.6.4, stop all 1.2.1 Hadoop services, delete links such as java, and delete the files under /tmp/.

II. Java configuration

Version 2.6.4 requires java version 6 or 7

1. Download the Java installation package (into hadoop's home directory)

jdk-7u79-linux-x64.tar.gz

tar zxf jdk-7u79-linux-x64.tar.gz

ln -s jdk1.7.0_79/ java

2. Configure the Java path (same as for 1.2.1)

3. Check the version number: java -version

III. Hadoop configuration

cd hadoop/etc/hadoop

1. vim core-site.xml (the contents of these files are sketched after this list)

2. vim hdfs-site.xml

3. cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

4. vim yarn-site.xml

5. vim yarn-env.sh

6. vim etc/hadoop/hadoop-env.sh

7. vim slaves

172.25.23.2

172.25.23.3

172.25.23.4
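The contents of these files were screenshots in the original. A minimal sketch of the key properties, assuming a standard 2.6 YARN setup: the core-site.xml values match the parameters listed in problem 4 at the end, while the mapred-site.xml and yarn-site.xml values are conventional assumptions.

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.23.1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <!-- three DataNodes in this cluster; the value is an assumption -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <property>
        <!-- run MapReduce on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>172.25.23.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

In yarn-env.sh and hadoop-env.sh, set export JAVA_HOME=/home/hadoop/java as before.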

IV. Start the service

1. Formatting

Before formatting, switch the library files to 64-bit with tar xf hadoop-native-64-2.6.0.tar -C hadoop/lib/native (it is best to back up or delete the originals in lib first).

bin/hdfs namenode -format

2. Start the services

sbin/start-dfs.sh

sbin/start-yarn.sh

3. View the processes with jps, on the master and on the slaves

V. Upload files

1. Create the storage directory (version 1.x created it automatically)

bin/hdfs dfs -mkdir /user

bin/hdfs dfs -mkdir /user/hadoop

2. Upload files

mkdir input

cp etc/hadoop/*.xml input

bin/hdfs dfs -put input

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount input output

(Under 2.6.4 the bundled examples jar normally lives at share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar.)

3. Visit

172.25.23.1:8088

172.25.23.1:50070

If a page URL ends in .jsp, change .jsp to .html to access the page

Problems

1. DataNode did not start

"No datanode to stop" is reported when shutting down the node.

Each format creates a new namespaceID, and /tmp still contains the previous one: namenode -format clears the data under the NameNode but does not clear the data under the DataNodes, so they fail to start. Therefore, after every format, all the data under /tmp/* on the master and slave nodes must be erased.

2. NameNode in safe mode

Just execute bin/hadoop dfsadmin -safemode leave

3. Exceeded MAX_FAILED_UNIQUE_FETCHES

This happens because the program has too many files open; by default, an ordinary user on most systems is limited to 1024 open files.

Switch to root and modify /etc/security/limits.conf, adding:

hadoop - nproc 4096

hadoop - nofile 65535

Here '-' stands for both the soft and the hard limit. Then switch back to the hadoop user to verify.

4. vim hadoop/conf/core-site.xml

(configure the hadoop.tmp.dir parameter under 2.6.4)

fs.defaultFS             hdfs://172.25.23.1:9000

hadoop.tmp.dir           /home/hadoop/tmp

dfs.namenode.name.dir    /home/hadoop/tmp/namedir

dfs.datanode.data.dir    /home/hadoop/tmp/datadir

If these parameters are not configured, the temporary data defaults to /tmp/, and since /tmp/ is emptied on every reboot, the file system would have to be re-formatted each time.
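Expressed as property elements (names and values taken from the list above):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.23.1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/tmp/namedir</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/tmp/datadir</value>
    </property>
</configuration>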
