
How to install and deploy Hadoop

2025-01-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article gives a detailed walkthrough of how to install and deploy Hadoop. The editor finds it very practical and shares it here for reference; I hope you gain something from reading it.

1. Hadoop environment variables

Set the environment variables Hadoop needs in hadoop-env.sh under the /home/dbrg/HadoopInstall/hadoop-conf directory. JAVA_HOME must be set; HADOOP_HOME is optional, and if it is not set it defaults to the parent directory of the bin directory, which in this article is /home/dbrg/HadoopInstall/hadoop. Mine is set up like this:

export HADOOP_HOME=/home/dbrg/HadoopInstall/hadoop

export JAVA_HOME=/usr/java/jdk1.6.0

Here you can see the advantage of creating the hadoop link to hadoop-0.12.0: when you upgrade to a newer Hadoop version later, you do not need to touch the configuration files at all, you only need to repoint the link.
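The version-switch trick above can be sketched as follows. This is a minimal illustration under /tmp with stand-in release directories, not the actual install; the version numbers are only examples:

```shell
# Sketch of the symlink trick described above. The directories under /tmp
# are illustrative stand-ins for unpacked Hadoop release trees.
cd /tmp
rm -rf link-demo && mkdir link-demo && cd link-demo
mkdir hadoop-0.12.0 hadoop-0.13.0      # two unpacked releases
ln -s hadoop-0.12.0 hadoop             # 'hadoop' names the active version
readlink hadoop                        # prints: hadoop-0.12.0
ln -sfn hadoop-0.13.0 hadoop           # upgrade: repoint the link only
readlink hadoop                        # prints: hadoop-0.13.0
```

Any configuration that references the /home/dbrg/HadoopInstall/hadoop path keeps working unchanged across the upgrade, because only the link target moves.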

2. Hadoop configuration files

As mentioned earlier, open the slaves file in the hadoop-conf/ directory. It lists all the slave nodes, one hostname per line; in this article these are dbrg-2 and dbrg-3, so the slaves file looks like this:

dbrg-2

dbrg-3
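The start scripts consume this file one hostname per line, so its contents can be sanity-checked with an ordinary shell loop. A minimal sketch (the /tmp path is an illustrative stand-in for hadoop-conf/slaves):

```shell
# Sketch: recreate the slaves file and walk it one hostname per line,
# the same way Hadoop's start scripts read it. /tmp path is illustrative.
SLAVES=/tmp/slaves
printf 'dbrg-2\ndbrg-3\n' > "$SLAVES"
while read -r host; do
  echo "slave node: $host"
done < "$SLAVES"
# prints:
# slave node: dbrg-2
# slave node: dbrg-3
```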

All of Hadoop's configuration items live in hadoop-default.xml in the conf/ directory, but that file must not be modified directly. Instead, define the items you need in hadoop-site.xml under the hadoop-conf/ directory; its values override the defaults in hadoop-default.xml, so you can customize it to your actual needs. Here is my configuration file:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>dbrg-1:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dbrg-1:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dbrg/HadoopInstall/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/dbrg/HadoopInstall/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/dbrg/HadoopInstall/filesystem/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

3. Deploy Hadoop

All of the Hadoop environment variables and configuration files above were set up on the dbrg-1 machine, so Hadoop now has to be deployed to the other machines, keeping the directory structure identical:

[dbrg@dbrg-1:~]$ scp -r /home/dbrg/HadoopInstall dbrg-2:/home/dbrg/

[dbrg@dbrg-1:~]$ scp -r /home/dbrg/HadoopInstall dbrg-3:/home/dbrg/
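With more slaves, the per-host scp commands can be generated from the slaves file itself rather than typed out one by one. A hedged sketch, echoed as a dry run (drop the `echo` to actually copy; the /tmp path stands in for the real hadoop-conf/slaves file):

```shell
# Sketch: generate one scp command per slave listed in the slaves file.
# Echoed as a dry run so nothing is copied; drop 'echo' to copy for real.
# The /tmp path is an illustrative stand-in for hadoop-conf/slaves.
SLAVES=/tmp/slaves-deploy
printf 'dbrg-2\ndbrg-3\n' > "$SLAVES"
while read -r host; do
  echo scp -r /home/dbrg/HadoopInstall "$host:/home/dbrg/"
done < "$SLAVES"
# prints:
# scp -r /home/dbrg/HadoopInstall dbrg-2:/home/dbrg/
# scp -r /home/dbrg/HadoopInstall dbrg-3:/home/dbrg/
```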

At this point Hadoop has been deployed to all the machines, so let's start it.

4. Start Hadoop

After installation and deployment are complete, Hadoop can be started. Before starting, the namenode must be formatted: enter the ~/HadoopInstall/hadoop directory and execute the following command.

[dbrg@dbrg-1:hadoop]$ bin/hadoop namenode -format

Barring surprises, it should report that formatting succeeded. If it does not, check the log files in the hadoop/logs/ directory.

Now it is time to start hadoop properly. There are several startup scripts under bin/, which you can run according to your needs:

* start-all.sh starts all the Hadoop daemons: namenode, datanode, jobtracker, and tasktracker

* stop-all.sh stops all the Hadoop daemons

* start-mapred.sh starts the Map/Reduce daemons: jobtracker and tasktracker

* stop-mapred.sh stops the Map/Reduce daemons

* start-dfs.sh starts the Hadoop DFS daemons: namenode and datanode

* stop-dfs.sh stops the DFS daemons

Here we simply start all the daemons:

[dbrg@dbrg-1:hadoop]$ bin/start-all.sh

Similarly, if you want to stop hadoop, then

[dbrg@dbrg-1:hadoop]$ bin/stop-all.sh

HDFS operations

Run the hadoop command in the bin/ directory to see all the operations Hadoop supports and how to use them. Here are a few simple operations as examples.

Create a directory

[dbrg@dbrg-1:hadoop]$ bin/hadoop dfs -mkdir testdir

Create a directory called testdir in HDFS

Copy a file

[dbrg@dbrg-1:hadoop]$ bin/hadoop dfs -put /home/dbrg/large.zip testfile.zip

This copies the local file large.zip into the user's HDFS home directory /user/dbrg/ under the name testfile.zip.

View existing files

[dbrg@dbrg-1:hadoop]$ bin/hadoop dfs -ls .

This concludes the article on how to install and deploy Hadoop. I hope the content above was helpful and taught you something new. If you found the article good, please share it so more people can see it.
