What are the modes of Hadoop configuration?


2025-04-22 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article explains the modes in which Hadoop can be configured. The explanation is kept simple and clear so that it is easy to follow.

Installing a Hadoop cluster

1. Set up one server as the master node.

2. This node hosts the NameNode and JobTracker daemons.

3. It also acts as the base station that contacts and starts the DataNode and TaskTracker daemons on all slave nodes.

4. Therefore, the master node needs a way to access every node in the cluster remotely.

How can the master node access each node in the cluster remotely?

By using passphraseless SSH, that is, SSH keys with an empty passphrase, so that scripts can log in without a password prompt.

What is the SSH protocol?

SSH uses standard public-key cryptography to generate a pair of keys for user authentication: a public key and a private key. The public key is stored on each node in the cluster, while the private key stays on the master node. When the master node tries to log in to a remote node, it uses its private key to prove its identity, and the target machine checks that proof against the stored public key. The private key itself is never sent over the network.
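The key point of this exchange is that the master node proves it holds the private key without ever transmitting it. The Python sketch below illustrates that challenge-response idea with a textbook RSA key pair built from tiny primes. It is insecure and for illustration only; real SSH uses full-size keys and signs session data rather than a bare number, and the names here are hypothetical.

```python
import secrets

# Toy RSA key pair from tiny textbook primes: insecure, illustration only.
P, Q = 61, 53
N = P * Q                            # modulus shared by both keys (3233)
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

PUBLIC_KEY = (N, E)    # copied to every node, like an authorized_keys entry
PRIVATE_KEY = (N, D)   # kept on the master node and never transmitted


def make_challenge(public_key):
    """Remote node: pick a random number in [2, n - 1] as the challenge."""
    n, _ = public_key
    return secrets.randbelow(n - 2) + 2


def sign(challenge, private_key):
    """Master node: answer the challenge using the private key."""
    n, d = private_key
    return pow(challenge, d, n)


def verify(challenge, signature, public_key):
    """Remote node: check the answer using only the stored public key."""
    n, e = public_key
    return pow(signature, e, n) == challenge


if __name__ == "__main__":
    challenge = make_challenge(PUBLIC_KEY)
    answer = sign(challenge, PRIVATE_KEY)
    print("login accepted:", verify(challenge, answer, PUBLIC_KEY))
```

Running it prints `login accepted: True`. Only the challenge and the answer cross the network, and an answer computed without the private key fails the check.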

Define a common account

Using a common account makes it easy to log in from the user account on one node to the corresponding account on the target machine.

For Hadoop, the accounts on all nodes should have the same user name.

How do you check whether SSH is installed on a node?

which ssh

which sshd

which ssh-keygen
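The same check can be scripted, for example when verifying many nodes. This is a minimal Python sketch using `shutil.which`, which performs the same PATH lookup as `which`. One assumption to keep in mind: `sshd` often lives in `/usr/sbin`, which may not be on a regular user's PATH, so a missing `sshd` can be a false alarm.

```python
import shutil

SSH_TOOLS = ("ssh", "sshd", "ssh-keygen")


def find_tools(names=SSH_TOOLS):
    """Return {tool: full path or None}, the same lookup `which` performs on PATH."""
    return {name: shutil.which(name) for name in names}


def missing(found):
    """List the tools that were not found on PATH."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:10} {path or 'NOT FOUND'}")
    if missing(found):
        print("missing:", ", ".join(missing(found)))
```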

Hadoop configuration files

Hadoop's settings are mainly contained in XML configuration files. Before version 0.20, these were hadoop-default.xml and hadoop-site.xml. As the names imply, hadoop-default.xml contains the default settings that Hadoop uses unless they are explicitly overridden in hadoop-site.xml, so in practice you only need to deal with hadoop-site.xml. In version 0.20, this file is split into three XML files: core-site.xml, hdfs-site.xml, and mapred-site.xml. This split corresponds more closely to the Hadoop subsystems that each file controls.
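The override rule can be sketched with Python's standard XML parser. The two documents below are hypothetical excerpts, not complete files (file:/// and 3 are the usual built-in defaults for fs.default.name and dfs.replication). Every property is a `<property>` element holding a `<name>` and a `<value>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpts; real files contain many more properties.
DEFAULT_XML = """<configuration>
  <property><name>fs.default.name</name><value>file:///</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>"""


def parse_properties(xml_text):
    """Collect <property><name>/<value> pairs into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Start from the defaults; any property set in the site file overrides them."""
    merged = parse_properties(default_xml)
    merged.update(parse_properties(site_xml))
    return merged


if __name__ == "__main__":
    for name, value in sorted(effective_config(DEFAULT_XML, SITE_XML).items()):
        print(name, "=", value)
```

Here dfs.replication comes out as 1 because the site file sets it, while fs.default.name keeps its default. An empty site file leaves every default untouched.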

Three modes of Hadoop configuration

Stand-alone mode

Pseudo-distributed mode

Fully distributed mode

What does stand-alone mode mean?

Stand-alone mode is Hadoop's default mode. When the Hadoop source package is unpacked for the first time, Hadoop knows nothing about the hardware it is installed on, so it conservatively chooses a minimal configuration. In this default mode, all three XML files (or hadoop-site.xml before version 0.20) are empty.
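Because empty site files leave Hadoop on its built-in defaults, the mode can be roughly inferred from two properties. The sketch below is a hypothetical Python heuristic, not something Hadoop itself provides. It assumes the 0.20 property names fs.default.name and mapred.job.tracker, whose built-in defaults are file:/// and local.

```python
from urllib.parse import urlparse

# Built-in defaults that apply when the site files are empty (Hadoop 0.20 names).
DEFAULTS = {"fs.default.name": "file:///", "mapred.job.tracker": "local"}
LOOPBACK = {"localhost", "127.0.0.1"}


def guess_mode(site_props):
    """Hypothetical heuristic: infer the configuration mode from two properties."""
    props = {**DEFAULTS, **site_props}       # site values override the defaults
    fs_uri = props["fs.default.name"]
    tracker = props["mapred.job.tracker"]
    if urlparse(fs_uri).scheme == "file" and tracker == "local":
        return "stand-alone"                 # local file system, no JobTracker
    fs_host = urlparse(fs_uri).hostname
    tracker_host = tracker.split(":")[0]
    if fs_host in LOOPBACK and tracker_host in LOOPBACK:
        return "pseudo-distributed"          # every daemon on this machine
    return "fully distributed"


if __name__ == "__main__":
    print(guess_mode({}))
```

With empty site files, as in the default installation, the function reports stand-alone mode.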

When the configuration files are empty, Hadoop runs entirely on the local machine. Because it does not need to interact with other nodes, stand-alone mode does not use HDFS and does not start any Hadoop daemons. This mode is mainly used to develop and debug the application logic of MapReduce programs without the added complexity of interacting with daemons.
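Stand-alone mode itself runs real Java MapReduce jobs in a single process. The Python sketch below is not Hadoop code. It only imitates the map, shuffle, and reduce steps in one process to show the kind of application logic (here a hypothetical word count) that this mode lets you debug without daemons.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def run_locally(lines):
    """Run map, shuffle and reduce in one process, in key order."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, grouped[key]) for key in sorted(grouped))


if __name__ == "__main__":
    print(run_locally(["the quick fox", "The lazy dog"]))
```

This prints `{'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}`. A bug in the mapper or reducer shows up here directly, with no cluster involved.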

What does pseudo-distributed mode mean?

Pseudo-distributed mode runs Hadoop on a "single-node cluster", with all daemons running on the same machine. It complements stand-alone mode for debugging code, letting you examine memory usage, HDFS input and output, and interactions between daemons.

A simple configuration (property name = value in each file):

core-site.xml:

fs.default.name = hdfs://localhost:9000

mapred-site.xml:

mapred.job.tracker = localhost:9001

hdfs-site.xml:

dfs.replication = 1
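In the real files, each of these settings is a `<property>` element inside a `<configuration>` root. The following Python sketch generates the three files from the values listed above. The helper names and the target directory (for example a hypothetical `conf.pseudo`) are assumptions for illustration.

```python
import os
import xml.etree.ElementTree as ET

# The pseudo-distributed settings listed above, keyed by file name.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
    "hdfs-site.xml": {"dfs.replication": "1"},
}


def to_hadoop_xml(properties):
    """Render {name: value} as a <configuration> document of <property> entries."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_conf_dir(conf_dir, files):
    """Write one XML file per entry into conf_dir, creating it if needed."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, properties in files.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(to_hadoop_xml(properties))


if __name__ == "__main__":
    print(to_hadoop_xml(PSEUDO_DISTRIBUTED["core-site.xml"]))
```

Calling `write_conf_dir("conf.pseudo", PSEUDO_DISTRIBUTED)` would produce the three site files in one directory.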

The hostnames and ports of the NameNode and JobTracker are set in core-site.xml and mapred-site.xml, respectively. The file hdfs-site.xml sets the default number of HDFS replicas; because everything runs on a single node, the replication factor is 1. We also need to specify the location of the Secondary NameNode (SNN) in the masters file and the locations of the slave nodes in the slaves file:

vi masters

localhost

vi slaves

localhost

Although all daemons run on the same node, the start-up scripts still use SSH to launch them, exactly as they would across a cluster, so passphraseless SSH to localhost must work in this mode as well.

How do you start the daemons?

bin/start-all.sh

How do you check whether the daemons have started?

jps

How do you stop the Hadoop daemons?

bin/stop-all.sh
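The jps check can be automated. This Python sketch parses the `<pid> <main class>` lines that jps prints and reports which of the five pseudo-distributed daemons are missing. It assumes the JDK's `jps` is on PATH when `run_jps` is called; the helper names are hypothetical.

```python
import subprocess

# Daemons that start-all.sh launches on the single node in pseudo-distributed mode.
PSEUDO_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"}


def parse_jps(output):
    """jps prints one '<pid> <main class>' line per JVM; collect the class names."""
    running = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return running


def missing_daemons(output, expected=PSEUDO_DAEMONS):
    """Return the expected daemons that do not appear in the jps output."""
    return sorted(expected - parse_jps(output))


def run_jps():
    """Call the JDK's jps tool (it must be on PATH) and return its output."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print("missing:", missing_daemons(run_jps()) or "none")
```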

What does fully distributed mode mean?

It is the mode used by real Hadoop clusters in production environments.

Cluster node description:

master: the master node of the cluster, hosting the NameNode and JobTracker daemons

backup: the node that hosts the SNN daemon

hadoop1, hadoop2, and hadoop3: the slave nodes of the cluster, hosting the DataNode and TaskTracker daemons
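The node roles above determine what goes into the masters and slaves files. The Python sketch below derives both files from a hypothetical role map that mirrors this layout. It assumes, as in this Hadoop generation, that the masters file lists the SNN host and the slaves file lists the DataNode/TaskTracker hosts.

```python
# Hypothetical role map mirroring the node description above.
CLUSTER = {
    "master": ["NameNode", "JobTracker"],
    "backup": ["SecondaryNameNode"],
    "hadoop1": ["DataNode", "TaskTracker"],
    "hadoop2": ["DataNode", "TaskTracker"],
    "hadoop3": ["DataNode", "TaskTracker"],
}


def hosts_running(cluster, daemon):
    """Hosts that run the given daemon, in insertion order."""
    return [host for host, daemons in cluster.items() if daemon in daemons]


def masters_file(cluster):
    """Contents of the masters file: the host(s) running the SNN."""
    return "".join(host + "\n" for host in hosts_running(cluster, "SecondaryNameNode"))


def slaves_file(cluster):
    """Contents of the slaves file: the hosts running DataNode/TaskTracker."""
    return "".join(host + "\n" for host in hosts_running(cluster, "DataNode"))


if __name__ == "__main__":
    print(masters_file(CLUSTER), end="")
    print(slaves_file(CLUSTER), end="")
```

Keeping the layout in one place makes it harder for the two files to drift apart as nodes are added.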

Configuration files:

core-site.xml:

fs.default.name = hdfs://master:9000

mapred-site.xml:

mapred.job.tracker = master:9001

hdfs-site.xml:

dfs.replication = 3

Two differences from the pseudo-distributed configuration:

1. The hostname of the machine running the NameNode and JobTracker daemons is declared explicitly (master instead of localhost).

2. The HDFS replication factor is raised from 1 to 3 to take advantage of distributed storage; replicating data across HDFS improves its availability and reliability.
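Both differences can be made explicit by diffing the two sets of values. The Python sketch below does this using the property values from the pseudo-distributed and fully distributed examples in this article. The dictionaries and the helper name are illustrative.

```python
# The settings from the pseudo-distributed and fully distributed examples.
PSEUDO = {
    "fs.default.name": "hdfs://localhost:9000",
    "mapred.job.tracker": "localhost:9001",
    "dfs.replication": "1",
}
FULLY_DISTRIBUTED = {
    "fs.default.name": "hdfs://master:9000",
    "mapred.job.tracker": "master:9001",
    "dfs.replication": "3",
}


def diff_configs(old, new):
    """Map each property whose value differs to its (old, new) pair."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


if __name__ == "__main__":
    for name, (before, after) in diff_configs(PSEUDO, FULLY_DISTRIBUTED).items():
        print(f"{name}: {before} -> {after}")
```

The output lists dfs.replication (1 -> 3) and the two daemon addresses (localhost -> master), which are exactly the two differences described above.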

vi masters

backup

vi slaves

hadoop1

hadoop2

hadoop3

After copying these files to every node in the cluster, be sure to format HDFS so that it is ready to store data:

bin/hadoop namenode -format

How do you switch between modes?

A useful trick when you start working with Hadoop is to switch between modes with symbolic links instead of repeatedly editing the XML files. To do this, create a separate configuration directory for each mode and place the matching versions of the XML files in it.

You can then use the Linux ln command, for example ln -s conf.cluster conf, to switch between configurations. This technique also helps when you want to temporarily detach a node from the cluster and debug a MapReduce program in pseudo-distributed mode. However, make sure the modes use different file storage locations in HDFS, and stop all daemons before changing the configuration.
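The switch can also be scripted. The Python sketch below repoints a `conf` symlink at one of several hypothetical per-mode directories (`conf.pseudo`, `conf.cluster`). It creates the new link under a temporary name and renames it over the old one, so `conf` is never left missing. On the command line, replacing an existing link usually needs `ln -sfn conf.pseudo conf`: with plain `ln -s`, an existing `conf` link to a directory is followed, and the new link is created inside that directory instead.

```python
import os


def switch_conf(hadoop_home, mode_dir, link_name="conf"):
    """Point hadoop_home/link_name at hadoop_home/mode_dir, like `ln -sfn`."""
    if not os.path.isdir(os.path.join(hadoop_home, mode_dir)):
        raise FileNotFoundError(f"no such config directory: {mode_dir}")
    link_path = os.path.join(hadoop_home, link_name)
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        # Refuse to touch a real directory or file; move it aside first.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    tmp_link = link_path + ".switching"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(mode_dir, tmp_link)   # relative target, as in `ln -s conf.cluster conf`
    os.replace(tmp_link, link_path)  # rename over the old link in one step on POSIX
    return os.readlink(link_path)
```

For example, `switch_conf("/opt/hadoop", "conf.cluster")` (path hypothetical) returns `"conf.cluster"`. As the text notes, stop the daemons with bin/stop-all.sh before switching.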

Web-based cluster user interface

The NameNode serves a general report on port 50070 that shows the status of HDFS across the cluster.

Through this interface, you can browse the file system, check the status of each DataNode in the cluster, and inspect the Hadoop daemon logs in detail to determine whether the cluster is currently running correctly.

The JobTracker serves a similar view on port 50030, showing the run-time status of MapReduce tasks as well as detailed reports on whole jobs. These reports and logs describe which node ran which task and the time or resources each task needed to complete, along with the Hadoop configuration used by each individual job.
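A quick reachability check of both pages can be sketched in Python with the standard library. It assumes the default ports 50070 and 50030 described above (both can be changed in the configuration), and the helper names are hypothetical.

```python
from urllib.request import urlopen

# Default web UI ports for this Hadoop generation; both can be changed in the configuration.
NAMENODE_UI_PORT = 50070
JOBTRACKER_UI_PORT = 50030


def web_ui_urls(namenode_host, jobtracker_host):
    """Build the two web UI addresses."""
    return {
        "NameNode": f"http://{namenode_host}:{NAMENODE_UI_PORT}/",
        "JobTracker": f"http://{jobtracker_host}:{JOBTRACKER_UI_PORT}/",
    }


def is_reachable(url, timeout=3.0):
    """True if the URL answers with a non-error HTTP status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # covers URLError, HTTPError, refused connections and timeouts
        return False


if __name__ == "__main__":
    for daemon, url in web_ui_urls("master", "master").items():
        print(daemon, url, "up" if is_reachable(url) else "down")
```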

