How to Build a Hadoop 3.3 Cluster

This article introduces how to build a Hadoop 3.3 cluster. Many people run into this kind of problem in practice, so let the editor walk you through how to handle these situations. I hope you will read it carefully and get something out of it!

Hadoop is an open-source framework for storing massive amounts of data and running distributed analysis applications on a cluster of servers. Its core components are HDFS and MapReduce.

Concept

HDFS is a distributed file system: it separates the NameNode server, which stores the file metadata, from the DataNode servers, which store the actual data, so that data can be written and read in a distributed way.
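To make that split concrete, here is a small sketch (assuming the cluster built later in this article is already running): the client goes through the NameNode for metadata, the blocks land on DataNodes, and fsck can show exactly where.

bin/hdfs dfs -mkdir -p /demo
bin/hdfs dfs -put README.txt /demo/
bin/hdfs fsck /demo/README.txt -files -blocks -locations   # lists the DataNodes that hold each block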

MapReduce is a computing framework: its core idea is to hand computing tasks to the servers in the cluster for execution. A job is split into Map and Reduce tasks, which are distributed by the task scheduler (JobTracker).
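As an illustration of the Map/Reduce split, the example jar shipped with the hadoop-3.3.0 distribution can run a word count once the cluster below is up (the jar path and the /demo input directory from the sketch above are assumptions, not part of the original article):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount /demo /demo-out
bin/hdfs dfs -cat /demo-out/part-r-00000   # word counts written by the Reduce tasks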

Services and related terms:

fsimage: metadata image file (the directory tree of the file system).

edits: operation log of metadata (records the modifications made to the file system).

NameNode: handles read and write requests from clients; configures the replication policy; keeps the HDFS metadata, such as namespace and block information. At runtime this information is held in memory (the saved fsimage plus edits), but it can also be persisted to disk.

SecondaryNameNode: dedicated to merging the edits file into fsimage and sending the result back to the NameNode, which prevents edits from growing too large.

NodeManager: manages a single node in a YARN cluster, for example monitoring resource usage (CPU, memory, disk, network) and tracking node health.

ResourceManager: the master node of the YARN cluster, responsible for coordinating and managing the resources of the whole cluster (all NodeManagers).

DataNode: stores the data blocks sent by clients and performs the read and write operations on those blocks.

Hot backup: B is a hot backup of A; if A fails, B immediately takes over and runs the jobs in its place.

Cold backup: B is a cold backup of A; if A fails, B cannot take over immediately, but some of A's information is stored on B to reduce the loss after A fails.

Cluster

Environment:

Centos7

Jdk1.8.0_241 / hadoop-3.3

This article uses the new version 3.3 to build a cluster (one master and two slaves).

# /etc/hosts
192.168.41.128 server1
192.168.41.129 server2
192.168.41.130 server3

# disable selinux
/etc/selinux/config

# configure password-free login
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub root@server2
ssh-copy-id -i ~/.ssh/id_rsa.pub root@server3
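A minimal sketch of the SELinux step above (the exact commands are not in the original; this is one common way to do it on CentOS 7):

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # takes effect after reboot
setenforce 0                                                          # also turn it off for the current session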

Install the JDK (details omitted).
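A minimal sketch of the JDK step, assuming the Oracle JDK 8u241 RPM (the file name is hypothetical; the install path matches the JAVA_HOME used in the configuration below):

rpm -ivh jdk-8u241-linux-x64.rpm   # installs to /usr/java/jdk1.8.0_241-amd64
java -version                      # verify the installation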

Download and decompress: tar zxvf hadoop-3.3.0.tar.gz
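A sketch of the download step, assuming the Apache archive mirror and /opt as the install directory (the path matches the hadoop.tmp.dir used in the configuration below):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
tar zxvf hadoop-3.3.0.tar.gz -C /opt/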

For more information on configuration: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

# etc/hadoop/hadoop-env.sh
# The official docs say: "Administrators should use the etc/hadoop/hadoop-env.sh and optionally the
# etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh scripts to do site-specific customization of the
# Hadoop daemons' process environment." In short, specify JAVA_HOME here:
export JAVA_HOME=/usr/java/jdk1.8.0_241-amd64

# etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://server1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-3.3.0/tmp</value>
  </property>
</configuration>

# etc/hadoop/hdfs-site.xml, number of data replicas (should be less than or equal to the number of slave nodes)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>server1:50090</value>
  </property>
</configuration>

# etc/hadoop/yarn-site.xml, the YARN resource manager provides unified resource management and scheduling
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>server1</value>
  </property>
</configuration>

# etc/hadoop/mapred-site.xml, MapReduce execution engine
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
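The same configuration has to exist on every node. One way to do this (not spelled out in the original) is to copy the whole Hadoop directory to the slaves:

scp -r /opt/hadoop-3.3.0 root@server2:/opt/
scp -r /opt/hadoop-3.3.0 root@server3:/opt/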

Initialize HDFS: bin/hdfs namenode -format

Specify the users that the start/stop scripts run the daemons as:

# sbin/start-dfs.sh, sbin/stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

# sbin/start-yarn.sh, sbin/stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Configure the slave nodes in etc/hadoop/workers and fill in the corresponding hostnames (see the sketch below).
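Assuming server2 and server3 are the two slave nodes as set up above, etc/hadoop/workers would contain:

server2
server3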

Start the cluster: sbin/start-all.sh
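To verify that the daemons landed where expected (a quick check, not part of the original steps), jps on each node should show roughly the following for the configuration above:

# on server1
jps   # NameNode, SecondaryNameNode, ResourceManager
# on server2 and server3
jps   # DataNode, NodeManager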

Visit http://192.168.41.128:9870/, that is, the master host plus port 9870. If the page loads, the cluster has been built successfully.
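The cluster can also be checked from the command line; a small sketch (not part of the original steps):

bin/hdfs dfsadmin -report   # should list the two DataNodes as live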

This is the end of "How to Build a Hadoop 3.3 Cluster". Thank you for reading. If you want to learn more about the industry, you can follow the website; the editor will keep publishing high-quality, practical articles for you!
