Hadoop~ big data

Hadoop is a software framework capable of distributed processing of large amounts of data; its storage layer is the Hadoop Distributed File System (HDFS). Hadoop processes data in a reliable, efficient, and scalable manner. It is reliable because it assumes that computing elements and storage will fail, so it maintains multiple copies of working data and can redistribute processing away from failed nodes. The Hadoop framework is written in Java.

The master node of Hadoop runs the namenode, secondary namenode, and jobtracker daemons, along with the utilities used to manage the cluster. Slave nodes run the tasktracker and datanode daemons. In other words, the master node hosts the daemons that provide Hadoop cluster management and coordination, while the slave nodes host the daemons that implement HDFS storage and the MapReduce data-processing functions.

The NameNode is the master server in Hadoop. It usually runs on a separate machine in an HDFS instance and manages the file system namespace and access to the files stored in the cluster. Each Hadoop cluster has one NameNode and one secondary NameNode. When an external client sends a request to create a file, the NameNode responds with the block ID and the IP address of the DataNode that will hold the first copy of the block; the NameNode also notifies the other DataNodes that will receive copies of that block.

DataNode: a Hadoop cluster contains one NameNode and a large number of DataNodes. DataNodes are usually organized in racks, with all systems connected through a switch. DataNodes respond to read and write requests from HDFS clients, and they also respond to commands from the NameNode to create, delete, and replicate blocks.

JobTracker is a master service. After the software starts, the JobTracker receives jobs, schedules each subtask of a job to run on a TaskTracker, monitors them, and reruns any task that fails.

TaskTracker is a slave service that runs on multiple nodes. The TaskTracker actively communicates with the JobTracker, receives tasks, and is responsible for executing each task directly. A TaskTracker needs to run on an HDFS DataNode.

The NameNode, secondary NameNode, and JobTracker run on the master node, while each slave node runs a DataNode and a TaskTracker, so that the data-processing daemons on the slave servers can work on local data as directly as possible.
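
As a rough illustration of this layout (a hedged sketch for the 1.x cluster built below; the process IDs are made up and the exact names can differ between versions), jps would typically show the following on the master and on a slave:

jps    ## on the master (server2)
1340 NameNode
1468 SecondaryNameNode
1591 JobTracker
1720 Jps

jps    ## on a slave (server3)
1122 DataNode
1247 TaskTracker
1370 Jps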

server2.example.com 172.25.45.2 (master)

server3.example.com 172.25.45.3 (slave)

server4.example.com 172.25.45.4 (slave)

server5.example.com 172.25.45.5 (slave)

1. Configuration of the traditional version of hadoop:

Add the hadoop user on server2, server3, server4, and server5:

useradd -u 900 hadoop

echo westos | passwd --stdin hadoop
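
Since the NFS export used later maps anonymous access to uid/gid 900, it is worth confirming that the hadoop user actually received that uid on every node (an extra check, not in the original steps):

id hadoop    ## should report uid=900(hadoop) on all four servers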

Server2:

sh jdk-6u32-linux-x64.bin  ## install JDK

mv jdk1.6.0_32/ /home/hadoop/java

mv hadoop-1.2.1.tar.gz /home/hadoop/

su - hadoop

vim .bash_profile

export JAVA_HOME=/home/hadoop/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

source .bash_profile
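
A quick check that the shell now picks up the freshly installed JDK (an extra verification step):

which java    ## should print /home/hadoop/java/bin/java
java -version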

tar zxf hadoop-1.2.1.tar.gz  ## set up a single-node hadoop

ln -s hadoop-1.2.1 hadoop

cd /home/hadoop/hadoop/conf

vim hadoop-env.sh

export JAVA_HOME=/home/hadoop/java

cd ..

mkdir input

cp conf/*.xml input/

bin/hadoop jar hadoop-examples-1.2.1.jar

bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'

cd output/

cat *

1 dfsadmin

Set up passwordless login from the master to the slaves:

Server2:

su - hadoop

ssh-keygen

ssh-copy-id localhost

ssh-copy-id 172.25.45.3

ssh-copy-id 172.25.45.4
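
If the keys were copied correctly, logging in to a slave no longer asks for a password (a quick verification, not part of the original steps):

ssh 172.25.45.3 hostname    ## should print server3.example.com without a password prompt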

cd /home/hadoop/hadoop/conf

vim core-site.xml  ## specify the namenode

<property>
    <name>fs.default.name</name>
    <value>hdfs://172.25.45.2:9000</value>
</property>

vim mapred-site.xml  ## specify the jobtracker

<property>
    <name>mapred.job.tracker</name>
    <value>172.25.45.2:9001</value>
</property>

vim hdfs-site.xml  ## specify the number of copies kept of each file block

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
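
Each of the property snippets above goes inside the <configuration> element of its file; the surrounding boilerplate is assumed here rather than shown in the original. As a rough illustration, core-site.xml for this setup would look like:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.25.45.2:9000</value>
    </property>
</configuration>

mapred-site.xml and hdfs-site.xml follow the same pattern with their respective properties.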

cd ..

bin/hadoop namenode -format  ## format a new file system

ls /tmp

hadoop-hadoop  hsperfdata_hadoop  hsperfdata_root  yum.log

bin/start-dfs.sh  ## start the hadoop daemons

jps

bin/start-mapred.sh

jps

Open 172.25.45.2:50030 in the browser (the JobTracker web UI).

Open 172.25.45.2:50070 in the browser (the NameNode web UI).

bin/hadoop fs -put input test  ## upload the input directory into the distributed file system as test
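
To confirm the upload landed, the directory can be listed on HDFS (an extra check using the same relative path):

bin/hadoop fs -ls test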

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount test output  ## run wordcount on the uploaded test directory

At the same time, the uploaded files can be viewed on the web page. To download the output locally:

bin/hadoop fs -get output test

cat test/*

rm -fr test/  ## delete the downloaded files

2. Shared file system:

Server2:

su - root

yum install nfs-utils -y

/etc/init.d/rpcbind start

/etc/init.d/nfs start

vim /etc/exports

/home/hadoop *(rw,anonuid=900,anongid=900)

exportfs -rv

exportfs -v

Server3 and server4:

yum install nfs-utils -y

/etc/init.d/rpcbind start

showmount -e 172.25.45.2

Export list for 172.25.45.2:

/home/hadoop *

mount 172.25.45.2:/home/hadoop/ /home/hadoop/

df

Server2:

su - hadoop

cd hadoop/conf

vim hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

vim slaves  ## IP addresses of the slave nodes

172.25.45.3
172.25.45.4

vim masters  ## IP address of the master node

172.25.45.2

Tip: if any process from a previous run is still open, it must be stopped before formatting again; make sure jps shows no Hadoop processes running.

To close the processes:

bin/stop-all.sh  ## after this runs, tasktracker and datanode are sometimes still up, so stop them explicitly

bin/hadoop-daemon.sh stop tasktracker

bin/hadoop-daemon.sh stop datanode

As the hadoop user, delete the files under /tmp; files that the hadoop user has no permission to remove can be left in place.
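
A simple way to confirm nothing is left running before the format (an extra check):

jps    ## only the Jps process itself should appear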

su - hadoop

bin/hadoop namenode -format

bin/start-dfs.sh

bin/start-mapred.sh

bin/hadoop fs -put input test

bin/hadoop jar hadoop-examples-1.2.1.jar grep test output 'dfs[a-z.]+'

If you open 172.25.45.2 in the browser while the upload runs, you will see the files being uploaded.

su - hadoop

bin/hadoop dfsadmin -report

dd if=/dev/zero of=bigfile bs=1M count=200

bin/hadoop fs -put bigfile test

Open 172.25.45.2:50070 in the browser.
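
To see how the 200 MB file was split into blocks and which datanodes hold each replica, fsck can be pointed at it (an optional check; the path assumes the default /user/hadoop home directory on HDFS):

bin/hadoop fsck /user/hadoop/test/bigfile -files -blocks -locations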

3. Add server5.example.com (172.25.45.5) as a new slave node:

su - hadoop

yum install nfs-utils -y

/etc/init.d/rpcbind start

useradd -u 900 hadoop

echo westos | passwd --stdin hadoop

mount 172.25.45.2:/home/hadoop/ /home/hadoop/

su - hadoop

vim hadoop/conf/slaves

172.25.45.3
172.25.45.4
172.25.45.5

cd /home/hadoop/hadoop

bin/hadoop-daemon.sh start datanode

bin/hadoop-daemon.sh start tasktracker

jps

Remove a slave node:

Server2:

su - hadoop

cd /home/hadoop/hadoop/conf

vim mapred-site.xml

<property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/hadoop/conf/datanode-excludes</value>
</property>

vim /home/hadoop/hadoop/conf/datanode-excludes

172.25.45.3  ## exclude 172.25.45.3 so it no longer acts as a slave

cd /home/hadoop/hadoop

bin/hadoop dfsadmin -refreshNodes  ## refresh the nodes

bin/hadoop dfsadmin -report  ## check the status of the nodes; the data on server3 is transferred to server5
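
Each datanode entry in the report carries a Decommission Status line; filtering on it is a convenient way to watch server3 drain (a hedged one-liner assuming the stock report format):

bin/hadoop dfsadmin -report | grep 'Decommission Status'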

On server3:

su - hadoop

bin/stop-all.sh

cd /home/hadoop/hadoop

bin/hadoop-daemon.sh stop tasktracker

bin/hadoop-daemon.sh stop datanode

Server2:

vim /home/hadoop/hadoop/conf/slaves

172.25.45.4

172.25.45.5

4. Configure the new version of hadoop:

Server2:

su - hadoop

cd /home/hadoop

tar zxf jdk-7u79-linux-x64.tar.gz

ln -s jdk1.7.0_79/ java

tar zxf hadoop-2.6.4.tar.gz

ln -s hadoop-2.6.4 hadoop

cd /home/hadoop/hadoop/etc/hadoop

vim hadoop-env.sh

export JAVA_HOME=/home/hadoop/java
export HADOOP_PREFIX=/home/hadoop/hadoop

cd /home/hadoop/hadoop

mkdir input

cp etc/hadoop/*.xml input

tar -tf hadoop-native-64-2.6.0.tar

tar -xf hadoop-native-64-2.6.0.tar -C hadoop/lib/native/
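
Once the native libraries are unpacked, Hadoop 2.x can report whether it actually loads them (an optional check, run from the /home/hadoop/hadoop directory):

bin/hadoop checknative -a    ## the hadoop line should show true together with the library path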

cd /home/hadoop/hadoop

rm -fr output/

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep input output 'dfs[a-z.]+'

cd /home/hadoop/hadoop/etc/hadoop/

vim slaves

172.25.45.3
172.25.45.4

vim core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.25.45.2:9000</value>
</property>

vim mapred-site.xml

<property>
    <name>mapred.job.tracker</name>
    <value>172.25.45.2:9001</value>
</property>

vim hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

cd /home/hadoop/hadoop

bin/hdfs namenode -format

sbin/start-dfs.sh

jps

bin/hdfs dfs -mkdir /user/hadoop  ## a user directory must be created on HDFS before files can be uploaded

bin/hdfs dfs -put input/ test
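
Relative HDFS paths such as test and output resolve under the /user/hadoop directory created above; a quick listing shows where the upload ended up (an extra illustrative step):

bin/hdfs dfs -ls /user/hadoop/test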

rm -fr input/

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar grep test output 'dfs[a-z.]+'

bin/hdfs dfs -cat output/*

1 dfsadmin

Open 172.25.45.2:50070 in the browser.
