How to manage Hadoop

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces how to manage a Hadoop cluster, covering the HDFS balancer and the commissioning and decommissioning of nodes.

The balancer

The balancer is a Hadoop daemon that redistributes blocks by moving them from busy datanodes to relatively idle ones.

HDFS does not automatically move blocks from old datanodes to new ones to rebalance the cluster; the user has to run the balancer manually. The balancer writes a log file to the standard log directory, recording one line per block reallocation.

1. How does the balancer work?

The balancer adheres to the block replica placement policy, keeping replicas spread across racks so that fault tolerance is preserved.

It moves blocks iteratively until the cluster reaches equilibrium.

2. What counts as balanced?

The cluster is balanced when the utilization of every datanode (the ratio of used space to capacity on that node) is close to the overall utilization of the cluster (the ratio of used space to capacity across the cluster), with the difference not exceeding a given threshold.

3. How is the balancer started?

The equalizer can be started by calling the following command:

% start-balancer.sh

The -threshold argument specifies the threshold (as a percentage) that defines what counts as balanced.

The flag is optional; if omitted, the threshold defaults to 10%.
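The balance criterion above can be sketched as a small script: the cluster counts as balanced when every datanode's utilization is within the threshold of the overall cluster utilization. The node names and space figures below are made-up illustrative values, not output from a real cluster.

```shell
# Hypothetical per-node figures: name, used (GB), capacity (GB)
cat > nodes.txt <<'EOF'
dn1 300 1000
dn2 450 1000
dn3 380 1000
EOF

threshold=10   # percent, same meaning as -threshold for start-balancer.sh

awk -v t="$threshold" '
  { used[$1] = $2; cap[$1] = $3; tot_used += $2; tot_cap += $3 }
  END {
    cluster = 100 * tot_used / tot_cap        # cluster-wide utilization (%)
    balanced = "yes"
    for (n in used) {
      node = 100 * used[n] / cap[n]           # per-node utilization (%)
      diff = node - cluster; if (diff < 0) diff = -diff
      if (diff > t) balanced = "no"           # one node too far off -> unbalanced
    }
    printf "cluster utilization: %.1f%%, balanced: %s\n", cluster, balanced
  }' nodes.txt
```

With these numbers the node utilizations (30%, 45%, 38%) all sit within 10 percentage points of the cluster figure, so the script reports the cluster as balanced.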

4. How many balancers can run in a cluster?

Only one balancer may run in the cluster at any one time.

5. How long does the balancer run?

The balancer runs until the cluster is balanced, until it can no longer move any blocks, or until it loses contact with the namenode.

6. Which property limits the bandwidth the balancer uses to copy data between nodes?

To reduce cluster load and avoid interfering with other users, the balancer is designed to run in the background, and the bandwidth it may use to copy data between nodes is limited. The default is a modest 1 MB/s; a different value can be specified, in bytes per second, via the dfs.balance.bandwidthPerSec property in hdfs-site.xml.
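As a sketch, the property element that would go inside `<configuration>` in hdfs-site.xml looks like the snippet below; the 8 MB/s figure is an arbitrary example value, and the heredoc just writes the fragment to a scratch file so it can be inspected.

```shell
# Write the <property> fragment for hdfs-site.xml to a scratch file.
# 8388608 bytes/s = 8 MB/s (example value; the default is 1 MB/s).
cat > balancer-bandwidth.xml <<'EOF'
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>8388608</value>
  <description>Max bandwidth, in bytes per second, that each datanode
  may use when the balancer moves blocks.</description>
</property>
EOF
cat balancer-bandwidth.xml
```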

Commissioning and decommissioning nodes

Commissioning new nodes

1. Steps to commission a new node

1) Configure hdfs-site.xml to point to the namenode;

2) Configure mapred-site.xml to point to the jobtracker;

3) Start the datanode and tasktracker daemons.

Note: it is best to maintain an explicit list of authorized nodes, and to add new nodes to that list.

2. dfs.hosts (all datanodes allowed to connect to the namenode)

All datanodes allowed to connect to the namenode are listed in a file whose name is given by the dfs.hosts property.

The file lives on the namenode's local filesystem, with each line holding the network address of one datanode (as reported by the datanode; the addresses can be checked on the namenode's web page).

If you need to specify more than one network address for a single datanode, you can put multiple network addresses on a single line, separated by spaces.

3. mapred.hosts (all tasktrackers allowed to connect to the jobtracker)

All tasktrackers allowed to connect to the jobtracker are likewise listed in a file, named by the mapred.hosts property.

Typically, since cluster nodes run both the datanode and tasktracker daemons, dfs.hosts and mapred.hosts point to the same file, known as the include file.
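A minimal sketch of such a shared include file follows. The hostnames and the file path are hypothetical, and the hadoop command is shown only as a comment because it needs a live cluster to run against.

```shell
# Sketch: an include file shared by dfs.hosts and mapred.hosts.
# One datanode per line; a datanode with several network addresses
# lists them on a single line, separated by spaces.
cat > /tmp/hadoop-include <<'EOF'
datanode1.example.com
datanode2.example.com
datanode3.example.com datanode3-eth1.example.com
EOF

# In hdfs-site.xml:    dfs.hosts    -> /tmp/hadoop-include
# In mapred-site.xml:  mapred.hosts -> /tmp/hadoop-include
# Then, on a live cluster, refresh the namenode's view of the file:
#   hadoop dfsadmin -refreshNodes
wc -l < /tmp/hadoop-include    # three entries
```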

4. Steps to add a new node to the cluster

1) Add the network address of the new node to the include file.

2) Run the following command to update the namenode's set of authorized datanodes:

% hadoop dfsadmin -refreshNodes

3) Update the slaves file with the new node.

This way, the Hadoop control scripts will include the new node in future operations.

4) Start the new datanode.

5) Restart the MapReduce cluster.

6) Check that the new datanode and tasktracker appear in the web interface.

Decommissioning old nodes

1. Datanode failure

HDFS tolerates datanode failures, but that does not mean datanodes can be terminated arbitrarily.

Taking the three-replica strategy as an example, if three datanodes on different racks are shut down at the same time, the probability of losing data is very high.

2. Tasktracker failure

Hadoop's tasktracker is also fault-tolerant.

If you shut down a tasktracker that is running tasks, the jobtracker notices the failure and reschedules the tasks on other tasktrackers.

3. To remove a node, the node must appear in an exclude file.

For HDFS, the file name is controlled by the dfs.hosts.exclude property; for MapReduce, it is controlled by the mapred.hosts.exclude property.

These files list several nodes that are not allowed to connect to the cluster.

Typically, these two attributes point to the same file.

4. Steps to remove nodes from cluster

1) Add the network address of the node to be removed to the exclude file. Do not update the include file.

2) Restart the MapReduce cluster to stop the tasktracker running on the node being decommissioned.

3) Execute the following command to update the namenode with the new set of authorized datanodes:

% hadoop dfsadmin -refreshNodes

4) Go to the web interface and check that the admin state of the datanodes being decommissioned has changed to "Decommission In Progress".

The namenode is now copying their blocks to other datanodes.

5) When the state of all the datanodes changes to "Decommissioned", all their blocks have been replicated; shut down the decommissioned nodes.

6) Remove these nodes from the include file and run the following command:

% hadoop dfsadmin -refreshNodes

7) Remove the nodes from the slaves file.
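The decommissioning sequence above can be sketched as a commented script. The hostname and file path are hypothetical, and every step that needs a live cluster is left as a comment rather than executed.

```shell
# Sketch of the decommission sequence for one node (hostname hypothetical).
NODE=datanode3.example.com
EXCLUDE=/tmp/hadoop-exclude

# 1) Add the node to the exclude file; do not touch the include file yet.
echo "$NODE" > "$EXCLUDE"

# 2) Restart the MapReduce cluster so the node's tasktracker stops.
# 3) Tell the namenode to re-read its host lists (needs a live cluster):
#      hadoop dfsadmin -refreshNodes
# 4) Watch the namenode web UI: the node moves to "Decommission In Progress"
#    while its blocks are copied to other datanodes.
# 5) When it shows "Decommissioned", shut the node down.
# 6) Remove it from the include file and run refreshNodes again.
# 7) Finally, remove it from the slaves file.

cat "$EXCLUDE"
```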

Notes on commissioning and decommissioning

1. Why do dfs.hosts and mapred.hosts specify a different file from the slaves file?

The dfs.hosts and mapred.hosts properties are used by the namenode and jobtracker to decide which worker nodes may connect.

The Hadoop control scripts use the slaves file to perform cluster-wide operations, such as restarting the cluster.

The Hadoop daemons themselves never use the slaves file.

2. What is the correct way to decommission a datanode?

The user tells the namenode which datanodes are to be withdrawn, so that their blocks can be copied to other datanodes before the nodes are shut down.

3. How do I know whether a tasktracker can connect to the jobtracker?

A tasktracker can connect to the jobtracker only if it appears in the include file and not in the exclude file.

Note:

If no include file is specified, or the include file is empty, all nodes are considered included.

4. What if a datanode appears in both the include and exclude files?

If a datanode appears in both the include and exclude files, it may join the cluster, but it will promptly be decommissioned.

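The include/exclude rules above amount to a small decision table, sketched here as a shell function. The node names and file names are purely illustrative, and the function only models the rules as this article states them.

```shell
#!/usr/bin/env bash
# Decision table for whether a node may connect, per the include/exclude
# rules above. An empty or missing include file means "all nodes included".
node_status() {   # usage: node_status <node> <include-file> <exclude-file>
  local node=$1 inc=$2 exc=$3 in_inc in_exc
  if [ ! -s "$inc" ]; then in_inc=yes                   # empty include: all in
  elif grep -qw "$node" "$inc"; then in_inc=yes; else in_inc=no; fi
  if [ -s "$exc" ] && grep -qw "$node" "$exc"; then in_exc=yes; else in_exc=no; fi

  if   [ "$in_inc" = no ];  then echo "cannot connect"
  elif [ "$in_exc" = yes ]; then echo "connects, then is decommissioned"
  else                           echo "connects"
  fi
}

printf 'dn1\ndn2\n' > inc.txt
printf 'dn2\n'      > exc.txt
node_status dn1 inc.txt exc.txt   # in include only      -> connects
node_status dn2 inc.txt exc.txt   # in both files        -> connects, then is decommissioned
node_status dn3 inc.txt exc.txt   # in neither file      -> cannot connect
```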

This concludes the walkthrough of how to manage Hadoop. Theory and practice work best together, so go and try it out!
