
The fastest deployment configuration strategy for Hadoop distributed cluster

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article introduces only the simplest fully distributed deployment configuration of Apache Hadoop, without any performance optimization. In an actual production environment there are dozens of Hadoop tuning parameters.

Introduction to Hadoop

The core of the Hadoop framework consists of HDFS and MapReduce. HDFS is a distributed storage scheme for massive amounts of data, and MapReduce provides the computation over that data.

Deployment environment

CentOS 7 on three machines, or four if you need a secondarynamenode: namenode, datanode0, and datanode1. The secondarynamenode is not configured for the time being.
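The article does not show it, but with hosts like these every node must be able to resolve the others by name. A minimal sketch, assuming static /etc/hosts entries on each node; the IP addresses below are hypothetical:

```
192.168.1.10  namenode
192.168.1.11  datanode0
192.168.1.12  datanode1
```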

Hadoop 2.7.5 deployment steps

Go to the official website and download the Apache Hadoop 2.7.5 binary package, which comes in tar.gz format. Download it directly onto namenode using wget or curl.

Unpack the tar.gz package with tar xf hadoop-2.7.5.tar.gz; you will then see a hadoop directory under the current directory.

Copy the extracted directory to its destination. Some people are used to putting it under /opt, others under /usr/local; this varies from person to person. My current practice is to create a directory such as /app and copy the hadoop directory into it: mkdir /app creates the app directory, then cp -r hadoop-2.7.5 /app/ copies the hadoop-2.7.5 directory to /app/. Note the -r parameter here, which copies the directory recursively.

Configure passwordless SSH login, so that namenode can log in to the other nodes without a password.

Configure etc/hadoop/core-site.xml.
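The download, unpack, and copy steps above can be sketched as a small shell script. To keep it safe to review, this sketch only prints each command instead of running it; the archive.apache.org mirror URL is an assumption, while /app follows the text:

```shell
#!/bin/sh
# Print (rather than run) the deployment commands described above,
# so they can be reviewed before execution on the namenode.
HADOOP_VERSION=2.7.5
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
INSTALL_DIR=/app

echo "wget https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}"
echo "tar xf ${TARBALL}"
echo "mkdir -p ${INSTALL_DIR}"
echo "cp -r hadoop-${HADOOP_VERSION} ${INSTALL_DIR}/"
```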

Specify the namenode (this property goes inside the <configuration> element):

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode/</value>
</property>

Configure etc/hadoop/hdfs-site.xml

Specify the number of replicas:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

Configure etc/hadoop/yarn-site.xml

Specify the host of the resource manager, and the auxiliary shuffle service that the node manager must run for MapReduce:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Configure etc/hadoop/mapred-site.xml

Specify the mapreduce framework:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Configure the etc/hadoop/slaves file and add the datanodes; in this article they are datanode0 and datanode1.
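With the hostnames used in this article, the slaves file would simply contain one datanode hostname per line:

```
datanode0
datanode1
```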

Copy the hadoop directory from namenode to the other nodes, including all binaries, scripts, and configuration files. By default, the directory location and structure on the other nodes are the same as on namenode.
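A sketch of this copy step, assuming the hadoop directory lives under /app as above; it prints an scp command for each worker node rather than executing it:

```shell
#!/bin/sh
# Print the copy command for each worker node listed in the slaves file.
# Hostnames follow the article; the install path /app/hadoop-2.7.5 is assumed.
HADOOP_DIR=/app/hadoop-2.7.5
for node in datanode0 datanode1; do
  cmd="scp -r ${HADOOP_DIR} ${node}:/app/"
  echo "$cmd"
done
```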

Format HDFS:

hdfs namenode -format

Finally, start the cluster with sbin/start-all.sh.
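As a recap, the two final commands can be composed relative to the install directory. This sketch assumes /app/hadoop-2.7.5 as the install location and only prints the commands for review; run them on the namenode:

```shell
#!/bin/sh
# Compose and print the format and start-up commands relative to the
# assumed install directory.
HADOOP_HOME=/app/hadoop-2.7.5
FORMAT_CMD="${HADOOP_HOME}/bin/hdfs namenode -format"
START_CMD="${HADOOP_HOME}/sbin/start-all.sh"
echo "$FORMAT_CMD"
echo "$START_CMD"
```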
