Notes on Hadoop Cluster Optimization in a Production Environment and Handling Missing PID Files


I. Optimization preparation

Optimization requires a comprehensive analysis of the actual situation.

1. Minimize use of the system swap partition (if not already done)

In Hadoop, with the system default settings, the swap partition is used frequently and the cluster constantly issues warnings.

The user has full control over the amount of data processed by each job and over the various buffers used in each Task.

Echo "vm.swappiness = 0" > > / etc/sysctl.conf

Note: the goal is to use the swap partition as little as possible, not to disable it; vm.swappiness = 0 minimizes swapping but does not turn swap off.
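The sysctl.conf entry takes effect on reboot; to apply it immediately as well, the value can also be set at runtime (a quick sketch, run as root):

sysctl -w vm.swappiness=0
cat /proc/sys/vm/swappiness   # should now print 0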

2. Resource and configuration information

The cluster consists of 2 NameNode servers and 5 DataNode servers; their resources and configuration are below.

The service distribution is as follows (consistent with the per-node jps output in section II):

192.168.10.101: NameNode, DFSZKFailoverController, ResourceManager, JobHistoryServer, HMaster
192.168.10.102: NameNode, DFSZKFailoverController, HMaster
192.168.10.103: DataNode, JournalNode, NodeManager, HRegionServer
192.168.10.104: DataNode, JournalNode, NodeManager, HRegionServer
192.168.10.105: DataNode, JournalNode, NodeManager, HRegionServer
192.168.10.106: DataNode, NodeManager, HRegionServer
192.168.10.107: DataNode, NodeManager, HRegionServer

Software versions:

Hadoop (HDFS + YARN) 2.7.3

HBase 1.2.4

Commands for viewing CPU information:

View the CPU model:

# cat /proc/cpuinfo | grep name | cut -d: -f2 | uniq

View the number of physical CPUs:

# cat /proc/cpuinfo | grep "physical id" | sort | uniq -c | wc -l

View the number of cores per physical CPU (i.e. the core count):

# cat /proc/cpuinfo | grep "cpu cores" | uniq

View the number of logical CPUs:

# cat /proc/cpuinfo | grep "processor" | wc -l

Total number of CPU cores = number of physical CPUs * number of cores per physical CPU

Total number of logical CPUs = number of physical CPUs * number of cores per physical CPU * number of hyperthreads per core
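These counts can also be computed in one go; a small sketch combining the same /proc/cpuinfo queries:

# sockets: number of physical CPUs
sockets=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
# cores per physical CPU
cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | awk -F: '{print $2}')
# logical CPUs (includes hyperthreads)
logical=$(grep -c "processor" /proc/cpuinfo)
echo "total cores  = $((sockets * cores))"
echo "logical CPUs = $logical"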

Resources:

Memory: 16 GB

CPU: 8 logical CPUs (8 physical CPUs, single-core, single-threaded)

3. dfs.datanode.max.xcievers (dfs.datanode.max.transfer.threads)

These are the same parameter; the first is the pre-Hadoop-1.0 name. It sets the number of threads on a DataNode responsible for file operations. If there are many files to process and this parameter is set too low, some files will not be handled and the following exception is reported:

ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(192.168.10.103:50010): DataXceiver: java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256

Every file operation in Linux is bound to a socket, which can be further thought of as a thread. This parameter specifies the number of such threads. The DataNode maintains them in a dedicated thread group, with a daemon thread monitoring the group's size; the daemon checks whether the thread count has reached the upper limit and throws an exception once it is exceeded, because too many such threads would exhaust system memory.

For dfs.datanode.max.transfer.threads, 8192 is kept, in line with the existing configuration:

<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
</property>

Default value: 4096
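Once the cluster is reconfigured, the effective value can be confirmed from the client side with hdfs getconf (assuming it reads the same configuration directory):

$ hdfs getconf -confKey dfs.datanode.max.transfer.threads
8192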

4. dfs.namenode.handler.count

The number of threads the NameNode uses to process RPC requests from the DataNodes.

The NameNode has a worker thread pool for handling remote procedure calls from clients and from cluster daemons. More handlers means a larger pool for handling concurrent heartbeats from the DataNodes and concurrent metadata operations from clients. For large clusters, or clusters with many clients, you usually need to raise dfs.namenode.handler.count above its default of 10. The general rule is to set it to 20 times the natural logarithm of the cluster size, i.e. 20 ln N, where N is the number of nodes. If the value is set too low, the obvious symptom is DataNodes timing out or being refused when connecting to the NameNode; and when the NameNode's remote procedure call queue grows very long, RPC latency increases.

The cluster size is 7, and the natural logarithm of 7 is about 2, so the value here is 20 * ln 7 ≈ 20 * 2 = 40.
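For reference, awk can check the exact figure (its log() is the natural logarithm); 20 * ln 7 is about 38.9, which the author rounds up to 40:

$ awk 'BEGIN { printf "%.1f\n", 20 * log(7) }'
38.9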

<property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
</property>

Default value: 10

5. dfs.datanode.handler.count

The number of server threads the DataNode uses to handle RPC requests.

Set to 20:

<property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
</property>

Default value: 10

II. PID-related preparation

The PID files for Hadoop and HBase are placed under /tmp by default, as can be seen from the startup and shutdown scripts, but that directory is cleaned out periodically. This creates a problem: when a service is shut down again, the loss of its PID file produces a situation where jps no longer shows the service yet its web management page is still reachable. My solution is to manually generate the missing PID files, then modify the configuration files and restart the services.
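The failure mode follows from how the daemon scripts stop a service. Roughly (an illustrative sketch, not the stock hadoop-daemon.sh):

PID_FILE=/tmp/hadoop-hduser-namenode.pid
if [ -f "$PID_FILE" ]; then
    kill "$(cat "$PID_FILE")"     # normal stop path
else
    echo "no namenode to stop"    # what you see once /tmp has been cleaned
fi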

1. PID configuration for NameNode, DataNode, and SecondaryNameNode

hadoop-env.sh:

export HADOOP_PID_DIR=/data/hadoop_data/pids
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

PID configuration for JobHistoryServer

mapred-env.sh:

export HADOOP_MAPRED_PID_DIR=/data/hadoop_data/pids

PID configuration for NodeManager and ResourceManager

yarn-env.sh:

export YARN_PID_DIR=/data/hadoop_data/pids

PID configuration for HMaster and HRegionServer

hbase-env.sh:

export HBASE_PID_DIR=/data/hadoop_data/pids

The naming rules for the PID files can be found in those scripts.

I did not configure all of the PID file paths here; in my case only the MapReduce and YARN ones needed configuring.
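Before pointing the services at /data/hadoop_data/pids, make sure the directory exists on every node and is writable by the service user (a sketch, assuming the same hduser ssh access used for the scp loops later):

$ for ip in 101 102 103 104 105 106 107; do ssh 192.168.10.$ip "mkdir -p /data/hadoop_data/pids"; done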

2. Services running on each cluster node and the PID files present

192.168.10.101:

$ jps
194921 HMaster
194352 DFSZKFailoverController
539317 JobHistoryServer
193972 NameNode

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-namenode.pid
hadoop-hduser-zkfc.pid
hbase-hduser-master.pid

192.168.10.102:

$ jps
371963 DFSZKFailoverController
371811 NameNode
372121 HMaster

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-namenode.pid
hadoop-hduser-zkfc.pid
hbase-hduser-master.pid

192.168.10.103:

$ jps
500043 JournalNode
500164 NodeManager
522618 HRegionServer
499932 DataNode

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-datanode.pid
hadoop-hduser-journalnode.pid
hbase-hduser-regionserver.pid

192.168.10.104:

$ jps
234784 NodeManager
234636 JournalNode
235070 HRegionServer
234525 DataNode

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-datanode.pid
hadoop-hduser-journalnode.pid
hbase-hduser-regionserver.pid

192.168.10.105:

$ jps
310371 HRegionServer
48404 NodeManager
48285 JournalNode
48174 DataNode

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-datanode.pid
hadoop-hduser-journalnode.pid
hbase-hduser-regionserver.pid

192.168.10.106:

$ jps
100855 HRegionServer
435319 DataNode
435456 NodeManager

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-datanode.pid
hbase-hduser-regionserver.pid

192.168.10.107:

$ jps
410010 NodeManager
484955 HRegionServer
409847 DataNode

Existing PID files in /data/hadoop_data/pids:
hadoop-hduser-datanode.pid
hbase-hduser-regionserver.pid

III. Specific operation steps

All of the following operations are performed as hduser, the Hadoop cluster's service management user.

1. Generate the PID files

Create the corresponding PID files according to the actual situation.

The jps command shows which services are running; compare that against the PID file directory /data/hadoop_data/pids. If a service is running but has no corresponding PID file, create one for it. Here the missing ones belong to the MapReduce and YARN services, whose PID files still default to /tmp:

echo PID > /tmp/mapred-hduser-historyserver.pid
echo PID > /tmp/yarn-hduser-resourcemanager.pid
echo PID > /tmp/yarn-hduser-nodemanager.pid

(Replace PID with the actual process ID shown by jps.)
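For example, a sketch that regenerates the history server's file in one step (the awk pattern is illustrative):

$ jps | awk '/JobHistoryServer/ {print $1}' > /tmp/mapred-hduser-historyserver.pid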

2. Stop the services

Worth noting here is the startup and shutdown order of the Hadoop + HBase + ZooKeeper cluster services.

Startup order: ZooKeeper -> Hadoop -> HBase

Stop order: HBase -> Hadoop -> ZooKeeper

Log in to 192.168.10.101 (the primary HBase server and primary NameNode server).

Stop the HBase cluster:

$ cd /data/hbase/bin
$ ./stop-hbase.sh

Stop the Hadoop cluster:

$ cd /data/hadoop/sbin
$ ./stop-all.sh

Stop the MapReduce history service (mr-jobhistory-daemon.sh lives in Hadoop's sbin directory):

$ cd /data/hadoop/sbin
$ ./mr-jobhistory-daemon.sh stop historyserver

3. Modify the configuration files

Back up the configuration files to be modified.
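For example, a simple timestamped copy of each file before editing (a sketch; adjust the list to the files actually touched):

$ cd /data/hadoop/etc/hadoop
$ for f in mapred-env.sh yarn-env.sh hdfs-site.xml; do cp "$f" "$f.bak.$(date +%Y%m%d)"; done
$ cd /data/hbase/conf
$ for f in hbase-env.sh hbase-site.xml; do cp "$f" "$f.bak.$(date +%Y%m%d)"; done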

Modify the configuration files.

$ vim /data/hadoop/etc/hadoop/mapred-env.sh

Add:

export HADOOP_MAPRED_PID_DIR=/data/hadoop_data/pids

$ vim /data/hadoop/etc/hadoop/yarn-env.sh

Add:

export YARN_PID_DIR=/data/hadoop_data/pids

$ vim /data/hadoop/etc/hadoop/hdfs-site.xml

Modify as follows:

<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
</property>

Add as follows:

<property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
</property>

<property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
</property>

$ vim /data/hbase/conf/hbase-env.sh

Add as follows:

export HADOOP_HOME=/data/hadoop

$ vim /data/hbase/conf/hbase-site.xml

Modify as follows:

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://masters/hbase</value>
</property>

Copy the configuration files to the matching configuration directories on the other nodes:

$ for ip in 102 103 104 105 106 107; do scp /data/hadoop/etc/hadoop/mapred-env.sh 192.168.10.$ip:/data/hadoop/etc/hadoop/; done
$ for ip in 102 103 104 105 106 107; do scp /data/hadoop/etc/hadoop/yarn-env.sh 192.168.10.$ip:/data/hadoop/etc/hadoop/; done
$ for ip in 102 103 104 105 106 107; do scp /data/hadoop/etc/hadoop/hdfs-site.xml 192.168.10.$ip:/data/hadoop/etc/hadoop/; done
$ for ip in 102 103 104 105 106 107; do scp /data/hbase/conf/hbase-env.sh 192.168.10.$ip:/data/hbase/conf/; done
$ for ip in 102 103 104 105 106 107; do scp /data/hbase/conf/hbase-site.xml 192.168.10.$ip:/data/hbase/conf/; done
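A quick way to confirm that the copies landed is to compare checksums across the nodes (sketch):

$ for ip in 102 103 104 105 106 107; do ssh 192.168.10.$ip md5sum /data/hadoop/etc/hadoop/hdfs-site.xml; done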

Remove the redundant log4j jar from HBase (this avoids a duplicate SLF4J binding once HBase shares Hadoop's classpath via HADOOP_HOME):

$ cd /data/hbase/lib
$ mv slf4j-log4j12-1.7.5.jar slf4j-log4j12-1.7.5.jar.bak

4. Start the services

Start the Hadoop cluster:

$ cd /data/hadoop/sbin
$ ./start-all.sh

Start the HBase cluster:

$ cd /data/hbase/bin
$ ./start-hbase.sh

Start the MapReduce history service:

$ cd /data/hadoop/sbin
$ ./mr-jobhistory-daemon.sh start historyserver
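After startup, a jps sweep over all nodes gives a quick health check before inspecting the PID files (sketch):

$ for ip in 101 102 103 104 105 106 107; do echo "== 192.168.10.$ip =="; ssh 192.168.10.$ip jps; done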

5. Verification

View the PID files.

192.168.10.101:

$ cd /data/hadoop_data/pids
$ ll
total 24
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-namenode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-zkfc.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 hbase-hduser-master.pid
-rw-r--r-- 1 hduser hadoop 33 Jun 6 22:49 hbase-hduser-master.znode
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 mapred-hduser-historyserver.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 yarn-hduser-resourcemanager.pid

192.168.10.102:

$ ll
total 12
-rw-r--r-- 1 hduser hadoop 7 Jun 6 22:48 hadoop-hduser-namenode.pid
-rw-r--r-- 1 hduser hadoop 7 Jun 6 22:48 hadoop-hduser-zkfc.pid
-rw-r--r-- 1 hduser hadoop 7 Jun 6 22:49 hbase-hduser-master.pid

192.168.10.103:

$ ll
total 20
-rw-r--r-- 1 hduser hadoop  6 Jun 6 22:48 hadoop-hduser-datanode.pid
-rw-r--r-- 1 hduser hadoop  6 Jun 6 22:48 hadoop-hduser-journalnode.pid
-rw-r--r-- 1 hduser hadoop  6 Jun 6 22:49 hbase-hduser-regionserver.pid
-rw-r--r-- 1 hduser hadoop 43 Jun 6 22:49 hbase-hduser-regionserver.znode
-rw-r--r-- 1 hduser hadoop  6 Jun 6 22:48 yarn-hduser-nodemanager.pid

192.168.10.104:

$ ll
total 20
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-datanode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-journalnode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 hbase-hduser-regionserver.pid
-rw-r--r-- 1 hduser hadoop 43 Jun 6 22:49 hbase-hduser-regionserver.znode
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 yarn-hduser-nodemanager.pid

192.168.10.105:

$ ll
total 20
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-datanode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-journalnode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 hbase-hduser-regionserver.pid
-rw-r--r-- 1 hduser hadoop 43 Jun 6 22:49 hbase-hduser-regionserver.znode
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 yarn-hduser-nodemanager.pid

192.168.10.106:

$ ll
total 16
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-datanode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 hbase-hduser-regionserver.pid
-rw-r--r-- 1 hduser hadoop 43 Jun 6 22:49 hbase-hduser-regionserver.znode
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 yarn-hduser-nodemanager.pid

192.168.10.107:

$ ll
total 16
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 hadoop-hduser-datanode.pid
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:49 hbase-hduser-regionserver.pid
-rw-r--r-- 1 hduser hadoop 43 Jun 6 22:49 hbase-hduser-regionserver.znode
-rw-r--r-- 1 hduser hadoop  7 Jun 6 22:48 yarn-hduser-nodemanager.pid

Check that the services respond normally.
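For example, a minimal smoke test of HDFS and HBase (a sketch; the expected counts follow from the service distribution above):

$ hdfs dfsadmin -report | grep "Live datanodes"   # expect 5 live DataNodes
$ echo "status" | hbase shell                     # expect 5 region servers, 0 dead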

Notes:

Item 1: if the production environment has monitoring scripts or services watching these daemons, stop that monitoring first, so that stopping a service does not trigger an automatic restart.

Item 2: be sure to synchronize the configuration files to every node (keep the configuration consistent across nodes).

References:

Official documentation:
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html

Official configuration reference:
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Related articles:
https://blog.csdn.net/qq_26091271/article/details/50411383
https://www.cnblogs.com/hanganglin/p/4563716.html
https://blog.csdn.net/odailidong/article/details/79656188
