This article walks through a sample setup and analysis of Hadoop/HDFS. It is shared here as a practical reference; follow along with the steps below.
Hadoop/hdfs
First of all, Hadoop is an open-source software framework from the Apache Foundation, implemented in Java, which performs distributed computing over massive data sets on clusters built from large numbers of machines.
Hadoop/HDFS and MFS (MooseFS) are both distributed file systems. A comparison of the two follows.
Similarities:
1. Both HDFS and MFS follow the GoogleFS design: a cluster made up of one master plus multiple chunkservers.
2. Both have a single point of failure at the master.
3. Both support online capacity expansion.
4. Both are designed for large volumes of data, but both are weak at handling large numbers of small files.
Differences:
1. Hadoop is implemented in Java, MFS in C++.
2. MFS provides a snapshot function.
Hadoop requires a working Java platform, so first download the JDK package, extract it (typically to /usr/local), and create a symbolic link.
Hadoop stand-alone mode:
[root@server1 local]# ln -s jdk1.7.0_79/ java
Edit /etc/profile to set the Java-related variables (JAVA_HOME, CLASSPATH, PATH):
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
Run source to make the configuration take effect:
[root@server1 java]# source /etc/profile
Check whether the configuration is correct
[root@server1 java]# echo $JAVA_HOME
/usr/local/java
[root@server1 java]# echo $CLASSPATH
.:/usr/local/java/lib:/usr/local/java/jre/lib
[root@server1 java]# echo $PATH
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/java/bin:/usr/local/java/bin
* if a mistake in the configuration file leaves system commands unusable, invoke them by their absolute paths until the file is fixed.
Write a small Java program to verify that the Java environment works:
[root@server1 java]# cat test.java
class test
{
    public static void main(String[] args) {
        System.out.println("hello world");
    }
}
[root@server1 java]# javac test.java
[root@server1 java]# java test
hello world
At this point, the java environment configuration is complete.
Hadoop/hdfs installation.
[root@server1 hadoop]# tar -xf hadoop-2.7.3.tar.gz -C /usr/local/
Create a symbolic link:
[root@server1 local]# ln -s hadoop-2.7.3/ hadoop
By default, Hadoop runs in non-distributed (standalone) mode, which is convenient for debugging.
Try the bundled MapReduce example on some sample data:
[root@server1 hadoop]# pwd
/usr/local/hadoop
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
[root@server1 hadoop]# cat output/*
1       dfsadmin
1       dfs.replication
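As an additional sanity check (not part of the original walkthrough), the same examples jar also contains a wordcount job; a hypothetical run in standalone mode might look like this:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input wc-output
cat wc-output/*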
HDFS pseudo-distributed mode
Configuration templates and documentation are available under share/doc; copy them to /var/www/html so they can be browsed in a web page while editing the configuration files:
[root@server1 doc]# cp -r hadoop/ /var/www/html/
[root@server1 hadoop]# pwd
/usr/local/hadoop/etc/hadoop
[root@server1 hadoop]# vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.33.1:9000</value>
    </property>
</configuration>
The fs.defaultFS value points at the NameNode; a hostname can be used instead of the IP address, but it must resolve on every node.
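If a hostname is preferred, every node needs a matching entry in /etc/hosts. A hypothetical example, assuming the NameNode host is named server1:
172.25.33.1   server1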
[root@server1 hadoop]# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
dfs.replication sets how many replicas are kept for each block; with only one node it is set to 1.
[root@server1 hadoop] # vim hadoop-env.sh
24 # The java implementation to use.
25 export JAVA_HOME=/usr/local/java
Starting HDFS uses ssh to log in to the nodes several times, so configure ssh for passwordless login first.
[root@server1 hadoop]# ssh-keygen          # press Enter at every prompt
[root@server1 hadoop]# ssh-copy-id 172.25.33.1   # distribute the key to 172.25.33.1, the IP just set in core-site.xml
If ssh 172.25.33.1 now logs in without asking for a password, the setup succeeded.
Format the NameNode:
[root@server1 hadoop]# bin/hdfs namenode -format
Start hdfs
[root@server1 hadoop] # sbin/start-dfs.sh
Use jps to view the Java processes on this node:
[root@server1 hadoop] # jps
2918 NameNode
3011 DataNode
3293 Jps
3184 SecondaryNameNode
At this point, HDFS status can be viewed through the web UI on port 50070.
* if the page cannot be reached (or reports Not Found), check whether port 50070 is listening locally, clean up the temporary files under /tmp/, clear the browser cache, and try again.
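A quick way to check, as a sketch (the /tmp/hadoop-* path assumes the default hadoop.tmp.dir under /tmp):
netstat -antlp | grep 50070      # is anything listening on the web UI port?
rm -rf /tmp/hadoop-*             # clear stale temporary data before reformatting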
Create the HDFS user directories so that MapReduce jobs have a working directory:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root
* if these directories are not created, later commands may fail with an error that the directory does not exist.
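As a quick check (not part of the original steps), list the new directories:
bin/hdfs dfs -ls /
bin/hdfs dfs -ls /user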
Put some files into the distributed filesystem:
bin/hdfs dfs -put etc/hadoop/ input
Run the MapReduce example again, this time against HDFS:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
Fetch the output and check the result:
[root@server1 hadoop]# bin/hdfs dfs -get output output
[root@server1 hadoop]# cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.replication
1       dfs.file
Or read it directly from HDFS:
[root@server1 hadoop]# bin/hdfs dfs -cat output/*
The result can also be checked in the browser.
Running bin/hdfs dfs with no arguments prints the many subcommands that can be executed.
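A few of the more commonly used subcommands, for reference (illustrative, not exhaustive):
bin/hdfs dfs -ls /user/root      # list a directory
bin/hdfs dfs -du -h /user/root   # show space used, human-readable
bin/hdfs dfs -rm -r output       # remove the output directory recursively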
Distributed hdfs
First stop the running cluster with sbin/stop-dfs.sh.
Start three virtual machines.
Create a hadoop user with uid=1000 on all three machines.
mv hadoop-2.7.3/ /home/hadoop/
vim /etc/profile
export JAVA_HOME=/home/hadoop/java
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
Change the owner and group of all files under /home/hadoop/ to hadoop.
Install nfs-utils on server2, server3, and server4 so the /home/hadoop directory can be shared.
Modify the exports configuration to share /home/hadoop:
vim /etc/exports
/home/hadoop 172.25.33.0/24(rw,anonuid=1000,anongid=1000)
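After editing /etc/exports, the export table can be applied without restarting the service (a sketch; the walkthrough below simply starts nfs):
exportfs -ra                 # re-export everything listed in /etc/exports
showmount -e localhost       # verify the share is now visible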
Add the slaves on server1:
[root@server1 hadoop]# pwd
/home/hadoop/hadoop/etc/hadoop
Edit the slaves file and add the IPs of the hosts that will run DataNodes:
172.25.33.3
172.25.33.4
Start NFS:
service nfs start
Also check the status of rpcbind and rpcidmapd; if either of them is not running, NFS may fail to start, so bring both of them up first.
Run showmount -e on server1 to view its own shares, and exportfs -v to print the exported entries:
exportfs -v
/shared          172.25.33.2/24(rw,wdelay,root_squash,no_subtree_check)
/var/www/html/upload
                 172.25.33.2/24(rw,wdelay,root_squash,no_subtree_check)
/home/hadoop     172.25.33.0/24(rw,wdelay,root_squash,no_subtree_check,anonuid=1000,anongid=1000)
Mount the shared /home/hadoop onto /home/hadoop on server2, server3, and server4.
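A hypothetical mount command for this step, assuming server1 (172.25.33.1) is the NFS server exporting /home/hadoop:
mount 172.25.33.1:/home/hadoop /home/hadoop
df -h /home/hadoop           # confirm the NFS mount is active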
Set up passwordless SSH login for the hadoop user on server2, server3, and server4.
Set a password for the hadoop user on each machine.
Switch to the hadoop user and generate a key:
ssh-keygen
Then distribute it:
ssh-copy-id 172.25.33.2 (and likewise for 172.25.33.3 and 172.25.33.4, as in the loop below)
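The distribution step can also be written as a small loop (a sketch using the IP addresses above):
for ip in 172.25.33.2 172.25.33.3 172.25.33.4; do ssh-copy-id $ip; done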
Add the Java environment variables on each machine:
vim /etc/profile
source /etc/profile
Change the owner and group of the files under the /home/hadoop directory to hadoop.
Edit hdfs-site.xml on server1:
[root@server1 hadoop]# vim hdfs-site.xml
Change the dfs.replication value to 2 (the resulting property is shown below).
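For reference, the property would look like this, in the same format as the pseudo-distributed configuration above:
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>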
Format the NameNode again and start HDFS.
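Concretely, the same commands as in the pseudo-distributed setup:
bin/hdfs namenode -format
sbin/start-dfs.sh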
Check with jps:
[hadoop@server1 hadoop] $jps
17846 Jps
17737 SecondaryNameNode
17545 NameNode
There is no DataNode running locally. Checking on server3 and server4 shows that the DataNodes are running there:
[hadoop@server3 ~] $jps
3005 DataNode
3110 Jps
Various errors can occur at this point for different reasons. If a node fails to start, check the log files under the logs/ directory of the Hadoop installation to troubleshoot.
In this case the NameNode would not start; the logs showed that port 50070 was already occupied by a java process, so the fix was to run killall -9 java, then reformat and start HDFS again.
Create the HDFS working directories:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/hadoop
bin/hdfs dfs -put etc/hadoop/ input
bin/hdfs dfs -cat input/*
The files can also be seen in the browser.
Use bin/hdfs dfsadmin -report to view the capacity and usage of each node.
Start the resource manager:
YARN
YARN is Hadoop's resource manager. Its design splits the old JobTracker into two separate services: a global ResourceManager and a per-application ApplicationMaster.
The ResourceManager handles resource allocation and management for the whole system, while the ApplicationMaster manages a single application.
Overall it is still a master/slave (M/S) structure.
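The original does not show the commands for this step; the following is a minimal sketch based on the stock Hadoop 2.7 configuration templates, not a transcript of the author's setup.
In etc/hadoop/mapred-site.xml:
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
In etc/hadoop/yarn-site.xml:
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
Then start the ResourceManager and NodeManagers and check with jps:
sbin/start-yarn.sh
jps        # should now also show ResourceManager and NodeManager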
HDFS data migration:
Mount the shared /home/hadoop directory on the new machine (server5) and start a DataNode on it:
[hadoop@server5 hadoop]$ sbin/hadoop-daemon.sh start datanode
server5 also needs hosts resolution and the passwordless SSH key.
Run bin/hdfs dfsadmin -report on server1; server5 now appears in the node list.
On server1, add the DataNode control settings:
vim hdfs-site.xml
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/hadoop/etc/hadoop/exludes</value>
    </property>
Add the node to be excluded:
[hadoop@server1 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/exludes
172.25.33.3      # server3's data will be migrated off this node
Refresh the nodes:
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -refreshNodes
The command reports success immediately, but the actual data transfer is still in progress.
Use [hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report
to watch the decommissioning progress.
Once the data has been moved off the node, run
[hadoop@server3 hadoop]$ sbin/hadoop-daemon.sh stop datanode
to stop the DataNode on that node.
bin/hdfs dfsadmin -report then shows that server3 has stopped, and jps on server3 no longer lists a DataNode process.
Thank you for reading! That concludes this sample analysis of hadoop/hdfs. I hope it has been helpful; if you found the article useful, please share it so more people can see it.