
Example Analysis of hadoop/hdfs

2025-01-19 Update From: SLTechnology News&Howtos

This article shares a sample analysis of hadoop/hdfs. The editor thinks it is quite practical, so it is shared here as a reference; follow along and have a look.

Hadoop/hdfs

First of all, Hadoop is an open-source software framework from the Apache Foundation, implemented in Java, that performs distributed computing over massive amounts of data on a cluster built from a large number of machines.

Hadoop/HDFS and MFS (MooseFS) are both distributed file systems. The comparison is as follows.

Similarities:

1. Both HDFS and MFS follow the GoogleFS design: a cluster made up of one master plus multiple chunkservers.

2. Both have a single point of failure on the master.

3. Both support online expansion.

4. Both are designed for large amounts of data, but both are weak at handling very large numbers of small files.

Differences:

1. Hadoop is implemented in Java, MFS in C++.

2. MFS provides a snapshot function.

3 、

Therefore, first make sure the Java platform runs properly: download the JDK package, extract it (usually into /usr/local), and then create a soft link.
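For example, a minimal sketch of the extraction step; the archive name jdk-7u79-linux-x64.tar.gz is an assumption and depends on the exact JDK download:

[root@server1 ~]# tar -xzf jdk-7u79-linux-x64.tar.gz -C /usr/local/    # hypothetical archive name; produces /usr/local/jdk1.7.0_79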

Hadoop stand-alone mode:

[root@server1 local]# ln -s jdk1.7.0_79/ java

Edit /etc/profile and set the Java paths (JAVA_HOME, CLASSPATH, PATH):

export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

Run source to make the configuration file take effect:

[root@server1 java]# source /etc/profile

Check whether the configuration is correct

[root@server1 java]# echo $JAVA_HOME
/usr/local/java
[root@server1 java]# echo $CLASSPATH
.:/usr/local/java/lib:/usr/local/java/jre/lib
[root@server1 java]# echo $PATH
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/java/bin:/usr/local/java/bin

* If the system commands stop working because of an error in the configuration file, invoke the commands by their absolute paths to recover.
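For instance, a small sketch of recovering from a broken PATH, assuming the standard binary locations:

[root@server1 java]# /usr/bin/vim /etc/profile                          # fix the bad line using an absolute path
[root@server1 java]# export PATH=/bin:/usr/bin:/sbin:/usr/sbin          # restore a usable PATH for the current shell
[root@server1 java]# source /etc/profile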

Write a small Java program to verify that the Java environment is correct:

[root@server1 java]# cat test.java
class test
{
    public static void main(String args[]) {
        System.out.println("hello world");
    }
}
[root@server1 java]# javac test.java
[root@server1 java]# java test
hello world

At this point, the java environment configuration is complete.

Hadoop/hdfs installation.

[root@server1 hadoop]# tar -xf hadoop-2.7.3.tar.gz -C /usr/local/

Create a soft link:

[root@server1 local]# ln -s hadoop-2.7.3/ hadoop

By default, Hadoop runs in non-distributed (standalone) mode, which makes it convenient to debug.

Try running one of the example jobs over some local data:

[root@server1 hadoop]# pwd
/usr/local/hadoop

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*

[root@server1 hadoop]# cat output/*
1       dfsadmin
1       dfs.replication
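The examples jar ships other small jobs that can be run the same way. For instance, a quick word-count check (a sketch; wc_output is just an example name, and the output directory must not already exist or the job will fail):

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input wc_output
$ cat wc_output/* | head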

HDFS pseudo-distributed mode

Many Hadoop configuration templates are available under share/doc, so copy them to /var/www/html and browse them in a web page while modifying the configuration files:

[root@server1 doc]# cp -r hadoop/ /var/www/html/

[root@server1 hadoop]# pwd
/usr/local/hadoop/etc/hadoop

[root@server1 hadoop]# vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.33.1:9000</value>
    </property>
</configuration>

The NameNode address can also be a hostname, but then name resolution must be in place.
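To double-check that the setting is picked up, a small sketch using the stock getconf tool (it should echo back the configured value):

[root@server1 hadoop]# /usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS
hdfs://172.25.33.1:9000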

[root@server1 hadoop]# vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

dfs.replication is the number of replicas kept for each block; it is set to 1 here because there is only one node.

[root@server1 hadoop] # vim hadoop-env.sh

24 # The java implementation to use.

25 export JAVA_HOME=/usr/local/java

Starting HDFS logs in over ssh several times, so set up passwordless ssh login:

[root@server1 hadoop]# ssh-keygen                  # press Enter all the way through
[root@server1 hadoop]# ssh-copy-id 172.25.33.1     # distribute the key to 172.25.33.1, the IP just set in core-site.xml

If ssh 172.25.33.1 now logs in without asking for a password, the setup succeeded.
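A quick way to confirm (a sketch):

[root@server1 hadoop]# ssh 172.25.33.1 hostname    # should print the remote hostname without any password prompt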

Format the NameNode:

[root@server1 hadoop]# bin/hdfs namenode -format

Start hdfs

[root@server1 hadoop] # sbin/start-dfs.sh

Use jps to view the Java processes running on this node:

[root@server1 hadoop] # jps

2918 NameNode

3011 DataNode

3293 Jps

3184 SecondaryNameNode

At this point, HDFS information can be viewed through the web UI on port 50070.

* If the page cannot be accessed and reports not found, check whether port 50070 is listening locally, clean up the temporary files under /tmp/, clear the browser cache, and try again.
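For example, a sketch of that check-and-reset cycle, assuming the default hadoop.tmp.dir under /tmp:

[root@server1 hadoop]# netstat -antlp | grep 50070      # confirm the NameNode web UI is listening
[root@server1 hadoop]# sbin/stop-dfs.sh
[root@server1 hadoop]# rm -rf /tmp/hadoop-*             # clear the default temporary data
[root@server1 hadoop]# bin/hdfs namenode -format
[root@server1 hadoop]# sbin/start-dfs.sh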

Create an HDFS home directory so that MapReduce jobs can run:

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root

* If this HDFS directory is not created, later jobs may fail with an error that the directory does not exist.

Put something into the distributed file system:

bin/hdfs dfs -put etc/hadoop/ input

Run one of the MapReduce test jobs:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

Check the output of the job:

[root@server1 hadoop]# bin/hdfs dfs -get output output
[root@server1 hadoop]# cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.replication
1       dfs.file

Or execute directly:

[root@server1 hadoop]# bin/hdfs dfs -cat output/*

You can also check in the browser whether the result was produced successfully.

Running bin/hdfs dfs with no arguments lists the many subcommands that can be executed.
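A few of the most commonly used ones, as a quick reference sketch (the local_input name is just an example):

bin/hdfs dfs -ls /user/root              # list a directory
bin/hdfs dfs -du -h /user/root/input     # show usage, human readable
bin/hdfs dfs -rm -r /user/root/output    # remove a directory recursively
bin/hdfs dfs -get input local_input      # copy from HDFS to the local file system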

Fully distributed HDFS

First stop the running HDFS with sbin/stop-dfs.sh.

Start three virtual machines:

Create a hadoop user with uid=1000 on each of the three virtual machines.

mv hadoop-2.7.3/ /home/hadoop/

vim /etc/profile

export JAVA_HOME=/home/hadoop/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile

Change the owner and group of all files under /home/hadoop/ to hadoop.
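A one-liner for this step (a sketch):

chown -R hadoop:hadoop /home/hadoop/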

Install nfs-utils on servers 2, 3 and 4 to implement sharing of the /home/hadoop directory.

Modify the configuration file to share /home/hadoop:

vim /etc/exports

/home/hadoop 172.25.33.0/24(rw,anonuid=1000,anongid=1000)

Add slaves to server1

[root@server1 hadoop]# pwd
/home/hadoop/hadoop/etc/hadoop

Edit the slaves file and add the IP addresses of the DataNode hosts:

172.25.33.3

172.25.33.4

Start nfs

service nfs start

Also check the status of rpcbind and rpcidmapd.

If either of them is not running, nfs may fail to start.

Bring both of them up first.

Run showmount -e on server1 to view its own shares.

Use exportfs -v to display the exported shares:

exportfs -v

/shared               172.25.33.2/24(rw,wdelay,root_squash,no_subtree_check)
/var/www/html/upload
                      172.25.33.2/24(rw,wdelay,root_squash,no_subtree_check)
/home/hadoop          172.25.33.0/24(rw,wdelay,root_squash,no_subtree_check,anonuid=1000,anongid=1000)

Mount the shared /home/hadoop directory at /home/hadoop on server2, 3 and 4.
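A sketch of the mount command on each node, assuming server1 (172.25.33.1) is the NFS server exporting the directory:

mount 172.25.33.1:/home/hadoop /home/hadoop
df -h | grep hadoop        # verify that the NFS mount is in place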

Set up passwordless login for the hadoop users on servers 2, 3 and 4.

Set a password for each hadoop user.

Switch to the hadoop user and generate the key:

ssh-keygen

And then distribute it:

ssh-copy-id 172.25.33.2        # likewise for 172.25.33.3 and 172.25.33.4

Add java environment variables for each machine

vim /etc/profile

source /etc/profile

Change the owner and group of the files in the /home/hadoop directory to hadoop.

Edit hdfs-site.xml on server1:

[root@server1 hadoop]# vim hdfs-site.xml

Change the dfs.replication value to 2.

Format the NameNode as the hadoop user and start HDFS.
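That is, the same two commands as before, now run as the hadoop user:

[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh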

Check with jps on server1:

[hadoop@server1 hadoop] $jps

17846 Jps

17737 SecondaryNameNode

17545 NameNode

There is no DataNode process locally. Checking on server3 and server4 shows that the DataNodes are running there.

[hadoop@server3 ~]$ jps

3005 DataNode

3110 Jps

Various errors can occur at this point for all kinds of reasons. If a node fails to start, look at the logs under logs/ in the corresponding hadoop/ directory to troubleshoot.

In my case the error was that the NameNode would not start. Troubleshooting showed that port 50070 was already occupied by a java process, so I ran killall -9 java to kill it, then reformatted and started again, and everything was fine.

Create a hdfs execution directory:

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/hadoop
bin/hdfs dfs -put etc/hadoop/ input
bin/hdfs dfs -cat input/*

You can also see it in the browser

Use bin/hdfs dfsadmin -report to view the cluster's capacity and usage.

Start the resource manager:

YARN

YARN is Hadoop's resource manager. Its design splits the JobTracker into two separate services: a global ResourceManager and a per-application ApplicationMaster.

The ResourceManager is responsible for resource allocation and management across the entire system, while the ApplicationMaster (AM) manages a single application.

On the whole, it is still a master/slave (M/S) structure.
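To actually bring YARN up in this setup, a minimal sketch, assuming mapred-site.xml sets mapreduce.framework.name to yarn and yarn-site.xml points at the ResourceManager host (those settings are assumptions, not shown in this article):

[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ jps        # a ResourceManager should now appear here, and NodeManagers on the slave nodes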

HDFS data migration:

Mount the shared directory on the new machine and start the DataNode on it:

[hadoop@server5 hadoop]$ sbin/hadoop-daemon.sh start datanode

server5 also needs hosts resolution and the passwordless ssh key distributed to it.

Run bin/hdfs dfsadmin -report on server1 and server5 now appears among the nodes.

On server1, add the DataNode exclusion settings:

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/hadoop/etc/hadoop/exludes</value>
    </property>
</configuration>

Add the node to be excluded:

[hadoop@server1 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/exludes

172.25.33.3        # server3's data will be migrated off this node

Refresh the resource node:

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -refreshNodes

The command reports success immediately, but the actual data migration continues in the background.

Use

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report

to watch the state of the data migration.
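The per-node section of the report includes the decommissioning state; a quick way to watch just that field (a sketch):

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report | grep "Decommission Status"
# each DataNode reports Normal, Decommission in progress, or Decommissioned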

Once the data migration has completed, run

[hadoop@server3 hadoop]$ sbin/hadoop-daemon.sh stop datanode

to stop the DataNode on that node.

Running bin/hdfs dfsadmin -report again shows that the node on server3 is out of service.

And jps on server3 no longer shows a DataNode process.

Thank you for reading! This concludes the article "Example Analysis of hadoop/hdfs". I hope the content above has been helpful and that you have learned something from it. If you found the article useful, please share it so more people can see it!
