SLTechnology News&Howtos — 2025-01-21 Update
Shulou (Shulou.com) 06/02 report
In this article, the editor shares how to install and use Hadoop. I hope you will get something out of it; let's work through it together.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of distribution, making full use of the cluster's power for high-speed computation and storage.
In a nutshell, Hadoop is a software platform that makes it easier to develop and run software that handles large-scale data.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which makes it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Here are the steps for installing and using Hadoop.
1. Deploy hadoop
All of the Hadoop environment variables and configuration files mentioned earlier live on the dbrg-1 machine, so Hadoop now needs to be deployed to the other machines, keeping the directory structure consistent.
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-0.13.0 wukong2:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-0.13.0 wukong3:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-config wukong2:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-config wukong3:/home/wukong
At this point, Hadoop has been deployed to each machine.
If you want to add a new node, repeat the earlier steps 2 and 3 on it: after installing hadoop locally, copy hadoop-config over from one of the existing nodes, then update /etc/hosts and ~/.ssh/authorized_keys on all the other machines to add the new node's identity.
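The hosts-file bookkeeping for a new node can be sketched as below. This is a hypothetical illustration (the new node's IP and name are made up, and it edits a temporary working copy rather than the real /etc/hosts); the append is guarded so running it twice does not duplicate the entry.

```shell
# Work on a temporary copy of /etc/hosts (hypothetical cluster entries).
HOSTS_COPY=$(mktemp)
printf '192.168.100.2 wukong2\n192.168.100.3 wukong3\n' > "$HOSTS_COPY"

# Entry for the new node (made-up IP/hostname for illustration).
NEW_NODE='192.168.100.4 wukong4'

# Append only if the exact line is not already present (idempotent).
grep -qxF "$NEW_NODE" "$HOSTS_COPY" || printf '%s\n' "$NEW_NODE" >> "$HOSTS_COPY"

cat "$HOSTS_COPY"
```

In a real deployment the updated file would then be pushed to every machine (for example with scp, as in the commands above).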
2. Start hadoop
After the Hadoop installation is complete, you can start it. Before starting, the namenode must be formatted: enter the ~/hadoop directory and execute the following command
wukong@wukong1:~/hadoop$ bin/hadoop namenode -format
Barring surprises, it should report that the formatting succeeded. If it does not, check the log files under the hadoop/logs/ directory.
Now it's time to officially start hadoop. There are many startup scripts under bin/ that you can start according to your needs.
* start-all.sh starts all Hadoop daemons, including the namenode, datanodes, jobtracker, and tasktrackers
* stop-all.sh stops all Hadoop daemons
* start-mapred.sh starts the Map/Reduce daemons, the jobtracker and tasktrackers
* stop-mapred.sh stops the Map/Reduce daemons
* start-dfs.sh starts the Hadoop DFS daemons, the namenode and datanodes
* stop-dfs.sh stops the DFS daemons
Here, we simply start all the daemons:
wukong@wukong1:~/hadoop$ bin/start-all.sh
Similarly, if you want to stop hadoop, then
wukong@wukong1:~/hadoop$ bin/stop-all.sh
3. Hadoop file system operations
To view the current state of the file system:
wukong@wukong1:~/hadoop$ bin/hadoop dfsadmin -report
Total raw bytes: 107354136576 (99.98 GB)
Used raw bytes: 8215538156 (7.65 GB)
% used: 7.65%
Total effective bytes: 143160206 (136.52 MB)
Effective replication multiplier: 57.38702384935098
Datanodes available: 2

Name: 192.168.100.3:50010
State: In Service
Total raw bytes: 39395708928 (36.69 GB)
Used raw bytes: 3089165011 (2.87 GB)
% used: 7.84%
Last contact: Tue Jul 10 13:09:24 CST 2007

Name: 192.168.100.2:50010
State: In Service
Total raw bytes: 67958427648 (63.29 GB)
Used raw bytes: 5126373145 (4.77 GB)
% used: 7.54%
Last contact: Tue Jul 10 13:09:25 CST 2007
This report shows the total capacity and effective data size of wukong2 and wukong3, the datanode nodes.
4. Use hadoop to do calculations
Let's take a look at how to use Hadoop for computation once it is installed. Hadoop computes according to the MapReduce model. MapReduce is a simplified distributed programming model that allows programs to be automatically distributed across a large cluster of ordinary machines for concurrent execution. Just as Java programmers need not worry about memory management, MapReduce's run-time system handles the details of partitioning the input data, scheduling execution across the cluster, handling machine failures, and managing communication between machines. This model lets programmers harness the resources of a very large distributed system without any experience in concurrent or distributed programming.
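For intuition, the classic word-count example of this model can be approximated locally by an ordinary shell pipeline, where each stage mirrors a MapReduce phase. This is only an analogy, not how Hadoop itself runs:

```shell
# map:     split the input line into one word per line (tr)
# shuffle: bring identical keys together (sort)
# reduce:  count each group (uniq -c), then order by count
printf 'hadoop test hadoop\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```

The top line of the output shows "hadoop" with a count of 2; in real MapReduce the map and reduce stages run in parallel on many machines, with the shuffle moving data between them.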
Hadoop comes with some testing examples:
wukong@wukong1:~/hadoop$ jar -tf hadoop-0.13.0-examples.jar
The listing includes grep, wordcount, sort, and so on.
Let's create a new directory on the file system: grepin
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -mkdir grepin
Create a file locally, test.txt:
wukong@wukong1:~/hadoop$ cat /tmp/tmp_miao/test.txt
test
Transfer it to the file system:
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -put /tmp/tmp_miao/test.txt grepin
Then run wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -lsr grepin, and you will see:
/user/wukong/grepin/test.txt 50
which means the file has been uploaded.
Then you can run grep:
$ ./bin/hadoop jar hadoop-0.13.0-examples.jar grep grepin grepout test
The input files are read from grepin and the results are written to grepout. The grepout directory must not exist beforehand; results cannot be written into an existing directory.
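For intuition, what this grep example computes can be approximated locally with plain grep on the same one-line input (the local path here is hypothetical):

```shell
# Recreate the one-line input file locally (hypothetical path).
mkdir -p /tmp/grepin_local
printf 'test\n' > /tmp/grepin_local/test.txt

# Count occurrences of the pattern "test", as the Hadoop grep job does.
grep -o 'test' /tmp/grepin_local/test.txt | wc -l
```

The Hadoop version distributes this matching and counting over the cluster and writes per-pattern counts into the output directory.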
The files in grepout are as follows. (It appears that however many reducer tasks are specified, that many result files are generated; the number of reducer tasks is set via mapred.reduce.tasks in hadoop-site.xml.)
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -lsr grepout
/user/wukong/grepout/part-00000 8
/user/wukong/grepout/part-00001 0
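A hypothetical hadoop-site.xml fragment setting the number of reducer tasks (and thus the number of part files) might look like this; the value shown is only an example:

```xml
<!-- Illustrative only: two reduce tasks produce two part-NNNNN output files -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
```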
After reading this article, you should have a basic understanding of how to install and use Hadoop. Thank you for reading!