SLTechnology News&Howtos — 2025-01-21 Update
Shulou (Shulou.com) 06/02 report
In this article, the editor shares how to install and use Hadoop. I hope you will get something out of it; let's work through it together.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of distribution, making full use of the cluster's power for high-speed computation and storage.
In a nutshell, Hadoop is a software platform that makes it easier to develop and run software that handles large-scale data.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which makes it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Here are the steps for installing and using Hadoop.
1. Deploy hadoop
All of the Hadoop environment variables and configuration files mentioned earlier live on the dbrg-1 machine, so Hadoop now needs to be deployed to the other machines, keeping the directory structure consistent.
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-0.13.0 wukong2:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-0.13.0 wukong3:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-config wukong2:/home/wukong
wukong@wukong1:~/hadoop-config$ scp -r /home/wukong/hadoop-config wukong3:/home/wukong
At this point, Hadoop has been deployed to each machine.
If you want to add a new node, repeat the earlier steps 2 and 3 on it: after installing hadoop locally, copy hadoop-config over from one of the existing nodes, then update /etc/hosts and ~/.ssh/authorized_keys on all the other machines to add the new node's identity.
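The hosts-file bookkeeping for a new node can be sketched as below. This is a hypothetical illustration (the new node's IP and name are made up, and it edits a temporary working copy rather than the real /etc/hosts); the append is guarded so running it twice does not duplicate the entry.

```shell
# Work on a temporary copy of /etc/hosts (hypothetical cluster entries).
HOSTS_COPY=$(mktemp)
printf '192.168.100.2 wukong2\n192.168.100.3 wukong3\n' > "$HOSTS_COPY"

# Entry for the new node (made-up IP/hostname for illustration).
NEW_NODE='192.168.100.4 wukong4'

# Append only if the exact line is not already present (idempotent).
grep -qxF "$NEW_NODE" "$HOSTS_COPY" || printf '%s\n' "$NEW_NODE" >> "$HOSTS_COPY"

cat "$HOSTS_COPY"
```

In a real deployment the updated file would then be pushed to every machine (for example with scp, as in the commands above).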
2. Start hadoop
After the Hadoop installation is complete, you can start it. Before starting, the namenode must be formatted: enter the ~/hadoop directory and execute the following command
wukong@wukong1:~/hadoop$ bin/hadoop namenode -format
Barring surprises, it should report that the formatting succeeded. If it does not, check the log files under the hadoop/logs/ directory.
Now it's time to officially start hadoop. There are many startup scripts under bin/ that you can start according to your needs.
* start-all.sh starts all Hadoop daemons, including the namenode, datanodes, jobtracker, and tasktrackers
* stop-all.sh stops all Hadoop daemons
* start-mapred.sh starts the Map/Reduce daemons, the jobtracker and tasktrackers
* stop-mapred.sh stops the Map/Reduce daemons
* start-dfs.sh starts the Hadoop DFS daemons, the namenode and datanodes
* stop-dfs.sh stops the DFS daemons
Here, we simply start all the daemons:
wukong@wukong1:~/hadoop$ bin/start-all.sh
Similarly, if you want to stop hadoop, then
wukong@wukong1:~/hadoop$ bin/stop-all.sh
3. Hadoop file system operations
To view the current state of the file system:
wukong@wukong1:~/hadoop$ bin/hadoop dfsadmin -report
Total raw bytes: 107354136576 (99.98 GB)
Used raw bytes: 8215538156 (7.65 GB)
% used: 7.65%
Total effective bytes: 143160206 (136.52 MB)
Effective replication multiplier: 57.38702384935098
Datanodes available: 2

Name: 192.168.100.3:50010
State: In Service
Total raw bytes: 39395708928 (36.69 GB)
Used raw bytes: 3089165011 (2.87 GB)
% used: 7.84%
Last contact: Tue Jul 10 13:09:24 CST 2007

Name: 192.168.100.2:50010
State: In Service
Total raw bytes: 67958427648 (63.29 GB)
Used raw bytes: 5126373145 (4.77 GB)
% used: 7.54%
Last contact: Tue Jul 10 13:09:25 CST 2007
This report shows the total capacity and effective data size of wukong2 and wukong3, the datanode nodes.
4. Use hadoop to do calculations
Let's take a look at how to use Hadoop for computation once it is installed. Hadoop computes according to the MapReduce model. MapReduce is a simplified distributed programming model that allows programs to be automatically distributed across a large cluster of ordinary machines for concurrent execution. Just as Java programmers need not worry about memory management, MapReduce's run-time system handles the details of partitioning the input data, scheduling execution across the cluster, handling machine failures, and managing communication between machines. This model lets programmers harness the resources of a very large distributed system without any experience in concurrent or distributed programming.
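For intuition, the classic word-count example of this model can be approximated locally by an ordinary shell pipeline, where each stage mirrors a MapReduce phase. This is only an analogy, not how Hadoop itself runs:

```shell
# map:     split the input line into one word per line (tr)
# shuffle: bring identical keys together (sort)
# reduce:  count each group (uniq -c), then order by count
printf 'hadoop test hadoop\n' | tr ' ' '\n' | sort | uniq -c | sort -rn
```

The top line of the output shows "hadoop" with a count of 2; in real MapReduce the map and reduce stages run in parallel on many machines, with the shuffle moving data between them.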
Hadoop comes with some testing examples:
wukong@wukong1:~/hadoop$ jar -tf hadoop-0.13.0-examples.jar
The listing includes grep, wordcount, sort, and so on.
Let's create a new directory on the file system: grepin
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -mkdir grepin
Create a file locally, test.txt:
wukong@wukong1:~/hadoop$ cat /tmp/tmp_miao/test.txt
test
Transfer it to the file system:
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -put /tmp/tmp_miao/test.txt grepin
Then run wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -lsr grepin, and you will see:
/user/wukong/grepin/test.txt 50
which means the file has been uploaded.
Then you can run grep:
$ ./bin/hadoop jar hadoop-0.13.0-examples.jar grep grepin grepout test
The input files are read from grepin and the results are written to grepout. The grepout directory must not exist beforehand; results cannot be written into an existing directory.
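For intuition, what this grep example computes can be approximated locally with plain grep on the same one-line input (the local path here is hypothetical):

```shell
# Recreate the one-line input file locally (hypothetical path).
mkdir -p /tmp/grepin_local
printf 'test\n' > /tmp/grepin_local/test.txt

# Count occurrences of the pattern "test", as the Hadoop grep job does.
grep -o 'test' /tmp/grepin_local/test.txt | wc -l
```

The Hadoop version distributes this matching and counting over the cluster and writes per-pattern counts into the output directory.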
The files in grepout are as follows. (It appears that however many reducer tasks are specified, that many result files are generated; the number of reducer tasks is set via mapred.reduce.tasks in hadoop-site.xml.)
wukong@wukong1:~/hadoop$ ./bin/hadoop dfs -lsr grepout
/user/wukong/grepout/part-00000 8
/user/wukong/grepout/part-00001 0
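A hypothetical hadoop-site.xml fragment setting the number of reducer tasks (and thus the number of part files) might look like this; the value shown is only an example:

```xml
<!-- Illustrative only: two reduce tasks produce two part-NNNNN output files -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
```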
After reading this article, you should have a basic understanding of how to install and use Hadoop. Thank you for reading!