1. Cloudera Manager includes an internal rack-awareness script, but you must specify the rack where each host in the cluster is located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify a rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.
https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_specify_rack.html
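As a quick check (an addition to the steps above, not from the Cloudera documentation), you can ask the NameNode which rack each DataNode has been resolved to once the rack assignments are in place:
hdfs dfsadmin -printTopology
Each DataNode should be listed under the rack you assigned rather than under the default /default-rack.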
2. Reference links for the operating systems supported by different CDH versions:
https://blog.csdn.net/high3011/article/details/78131668
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#concept_xdm_rgj_j1b
3. Decommissioning a host decommissions and stops all roles on that host without requiring you to decommission the roles on each service individually. Decommissioning applies only to the HDFS DataNode, MapReduce TaskTracker, YARN NodeManager, and HBase RegionServer roles. Any other roles running on the host are simply stopped. After all roles on the host have been decommissioned and stopped, the host can be removed from service. You can decommission multiple hosts in parallel.
4. You cannot decommission a DataNode, or a host running a DataNode, if the number of DataNodes equals the replication factor of any file stored in HDFS (the default is 3). For example, if any file has a replication factor of 3 and you have exactly three DataNodes, you cannot decommission a DataNode or a host running a DataNode. If you attempt to do so anyway, the DataNode enters the decommissioning state, but the process never completes. You must abort the decommission and recommission the DataNode.
// In other words: if there are only three DataNodes and the replication factor is 3, you must lower the replication factor before you can decommission a DataNode (see the sketch after the note below).
Note: when a DataNode is decommissioned, its blocks are not deleted from the storage directories. You must delete the data manually.
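A minimal sketch of lowering the replication factor before such a decommission (an addition to the article; the /user/data path and the target factor of 2 are only examples):
hdfs dfs -stat %r /user/data/somefile        # show the current replication factor of one file
hdfs dfs -setrep -w 2 /user/data             # recursively set replication to 2 and wait until re-replication completes
Only after the lower factor has taken effect should you start decommissioning the DataNode.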
5. When a DataNode is decommissioned, the NameNode ensures that every block from that DataNode remains available across the cluster at the level indicated by the replication factor. This involves copying blocks off the DataNode in small batches; if the DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts that run DataNodes, you should first tune HDFS:
(1) run the following command to identify any problems in the HDFS file system:
hdfs fsck / -list-corruptfileblocks -openforwrite -files -blocks -locations 2>&1 > /tmp/hdfs-fsck.txt
(2) fix any problems reported by the fsck command. If the command output lists corrupted files, use the fsck command to move them to the lost+found directory or delete them:
hdfs fsck file_name -move or hdfs fsck file_name -delete
(3) increase the heap size of the DataNodes. DataNodes should be configured with a heap size of at least 4 GB to allow for the increase in iterations and max streams (a non-Cloudera-Manager equivalent is sketched after these steps):
Go to the HDFS service page.
Click the configuration tab.
Select Scope > DataNode.
Select Category > Resource Management.
Set the Java heap size of the DataNode in bytes as recommended (at least 4 GB).
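For reference only (an assumption about clusters not managed by Cloudera Manager; the file location and variable name vary by Hadoop version and distribution), the same heap increase is commonly made in hadoop-env.sh on each DataNode host, followed by a DataNode restart:
# hadoop-env.sh: give the DataNode JVM a 4 GB heap
export HADOOP_DATANODE_OPTS="-Xmx4g ${HADOOP_DATANODE_OPTS}"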
(4) set the DataNode balancing bandwidth:
Select Scope > DataNode.
Expand the Category > Performance category.
Configure the DataNode Balancing Bandwidth property to match the bandwidth of your disks and network. You can use a lower value to minimize the impact of decommissioning on the cluster, but the tradeoff is that decommissioning will take longer (a runtime alternative is shown after these steps).
Click Save changes to commit the changes.
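If you want to raise the limit without a configuration change and restart, HDFS can also push a new balancing bandwidth to all running DataNodes (the value below, 100 MB/s expressed in bytes per second, is only an example, and this runtime setting does not survive a DataNode restart):
hdfs dfsadmin -setBalancerBandwidth 104857600
This is the same limit the Cloudera Manager property controls, corresponding to dfs.datanode.balance.bandwidthPerSec.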
(5) increase the replication work multiplier for each iteration to a larger number (the default is 2, but it is recommended to be 10):
Select Scope > NameNode.
Expand the Category > Advanced category.
Configure the replication work multiplier per iteration property to a value such as 10.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
Click Save changes to commit the changes.
(6) increase the maximum number of replication threads and the hard limit on replication threads:
Select Scope > NameNode.
Expand the Category > Advanced category.
Configure the maximum number of replication threads on a DataNode and the hard limit on the number of replication threads on a DataNode to 50 and 100 respectively. You can lower these values (or use the defaults) to minimize the impact of decommissioning on the cluster, but the tradeoff is that decommissioning will take longer (the underlying property names are listed after these steps).
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
Click Save changes to commit the changes.
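For reference (property names as in Apache HDFS; verify them against your CDH release), the settings from steps (5) and (6) correspond to dfs.namenode.replication.work.multiplier.per.iteration, dfs.namenode.replication.max-streams, and dfs.namenode.replication.max-streams-hard-limit. After the restart in step (7) you can confirm what value a key resolves to on a host where the NameNode configuration is deployed:
hdfs getconf -confKey dfs.namenode.replication.work.multiplier.per.iteration
hdfs getconf -confKey dfs.namenode.replication.max-streams
hdfs getconf -confKey dfs.namenode.replication.max-streams-hard-limit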
(7) restart the HDFS service.
For other tuning recommendations, see performance considerations.
The reference link is: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_mc_decomm_host.html
6. Tune HBase before decommissioning DataNodes
To speed up a rolling restart of the HBase service, set the Region Mover Threads property to a higher value. This increases the number of regions that can be moved in parallel, but puts additional strain on the HMaster. In most cases, Region Mover Threads should be set to 5 or lower.
Recommissioning hosts: only hosts that were decommissioned with Cloudera Manager can be recommissioned.
7. Decommissioning DataNodes
Performance considerations
(1) decommissioning a DataNode does not happen immediately, because the process requires copying a potentially large number of blocks. Cluster performance may be affected while decommissioning is in progress.
Decommissioning is carried out in two steps:
Step 1: the DataNode's state is marked as Decommissioning and its data is copied to other available nodes. The node remains in the Decommissioning state until all of its blocks have been copied. You can view the status in the NameNode Web UI: go to the HDFS service and select Web UI > NameNode Web UI (a command-line alternative is shown below).
Step 2: when all data blocks have been copied to other nodes, the node is marked as Decommissioned.
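The command-line alternative (an addition to the article's steps; the exact report format can vary by HDFS version) is to read the decommission status from the DataNode report:
hdfs dfsadmin -report | grep -B2 "Decommission Status"
Nodes being drained show "Decommission Status : Decommission in progress" and switch to "Decommissioned" once step 2 completes.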
Decommissioning can affect performance in the following ways:
A. There must be enough disk space on the other active DataNodes for the data to be copied to. After decommissioning, the remaining active DataNodes hold more blocks, so decommissioning further DataNodes in the future may take more time.
B. Network traffic and disk I/O increase while blocks are being replicated.
C. Data balance and data locality may be affected, which can degrade the performance of running or submitted jobs.
D. Decommissioning a large number of DataNodes at the same time degrades performance.
E. If only a small number of DataNodes are decommissioned, the speed at which data can be read from those nodes limits decommissioning performance, because decommissioning saturates the network bandwidth used for reading blocks from those DataNodes and spreads the bandwidth used for replicating the blocks across the other DataNodes in the cluster. To avoid this performance impact, Cloudera recommends that you decommission only a small number of DataNodes at a time.
F. You can reduce the bandwidth available for balancing DataNodes and the number of replication threads to lessen the performance impact of replication, but this will cause decommissioning to take longer to complete. See the HDFS tuning steps before decommissioning DataNodes, above.
// Cloudera recommends that you add DataNodes and decommission DataNodes in parallel, in small groups. For example, if the replication factor is 3, add two DataNodes and decommission two DataNodes at the same time.
8. Troubleshooting decommissioning performance
When decommissioning DataNodes, the following conditions can also affect performance:
open files
blocks that cannot be relocated because there are not enough DataNodes to satisfy the block placement policy
Open files
Writes to a DataNode do not go through the NameNode. If blocks associated with an open file are located on a DataNode, they are not relocated until the file is closed. This commonly happens with:
clusters that use HBase
open Flume files
long-running tasks
To find and close open files, take the following five steps:
Step one:
Log in to the NameNode host and change to the log directory.
You can configure the location of this directory with the NameNode log directory property. By default, this directory is located at /var/log/hadoop-hdfs/.
Step 2:
Run the following command to verify that the log provides the required information:
Grep "Is current datanode" NAME | head
The sixth column of each matching log entry is the block ID; these are the blocks related to the DataNode being decommissioned. Run the following command to collect the relevant block IDs:
Grep "Is current datanode" NAME | awk'{print $6}'| sort-u > blocks.open
Step 3:
Run the following command to return a list of open files, their blocks, and the locations of those blocks:
hadoop fsck / -files -blocks -locations -openforwrite 2>&1 > openfiles.out
Step 4:
Look in the openfiles.out file created by the command for the blocks listed in blocks.open. Also verify that the DataNode IP addresses are correct.
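A quick way to perform this cross-check (an added convenience, not part of the original steps) is to use the block-ID list as a fixed-string pattern file for grep:
grep -Ff blocks.open openfiles.out
Any lines returned belong to open files that still have blocks on the DataNode being decommissioned.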
Step 5:
Using the list of open files, take the appropriate action so that the process holding each file open closes it.
For example, a major compaction closes all files in an HBase region.
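As one illustration (the table name here is hypothetical), a major compaction can be triggered from the HBase shell:
echo "major_compact 'my_table'" | hbase shell
Once the compaction finishes, the files previously held open for that table's regions are closed and their blocks can be relocated.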
Blocks cannot be relocated because there are not enough DataNodes to satisfy the block placement policy
For example, in a 10-node cluster, if mapred.submit.replication is left at its default value of 10 and you try to decommission a DataNode, it will be difficult to relocate the blocks associated with MapReduce jobs. This condition produces errors in the NameNode log similar to the following:
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: Not able to place enough replicas, still in need of 3 to reach 3
Use the following steps to find files whose replication factor is equal to or greater than the current cluster size:
Step 1:
Provide a list of open files, their blocks, and their locations by running the following command:
hadoop fsck / -files -blocks -locations -openforwrite 2>&1 > openfiles.out
Step 2:
Run the following command to return a list of files with a given replication factor:
grep repl= openfiles.out | awk '{print $NF}' | sort | uniq -c
For example, when the replication factor is 10, list the affected files:
Egrep-B4 "repl=10" openfiles.out | grep-v''| awk'/ ^\ / {print $1}'
Step 3:
Examine the paths and decide whether to reduce the replication factor of those files or remove them from the cluster.
9. Two ways to remove a host
1. Delete the host from Cloudera Manager entirely.
2. Remove the host from the cluster, but keep it available to other clusters managed by Cloudera Manager.
Both methods decommission the hosts, delete their roles, and remove the managed service software, but preserve the data directories.
10.
Maintenance mode allows you to disable alerts for hosts, services, roles, or the entire cluster. This is useful when you need to perform actions in the cluster (make configuration changes and restart various elements) and do not want to see alerts generated as a result of these actions.
Placing an entity in maintenance mode does not prevent events from being logged; it only suppresses alerts generated by those events. You can view the history of all events logged for these entities while they are in maintenance mode.
Reference link: