A brief Analysis of HMaster start-up process

Updated 2025-07-09. Source: SLTechnology News&Howtos, Shulou (Shulou.com), reported 06/01.

Many details are too involved to cover in full in this article; they are left for follow-up analysis.

0. HMaster is first initialized through HMasterCommandLine

0.1 check whether a bind address has been configured (https://issues.apache.org/jira/browse/HBASE-8148) and obtain the address to listen on

0.2 create an RPCServer through HBaseRPC

0.2.1 first obtain the RPCEngine (WritableRPCEngine) and use it to initialize the RPCServer (Server:HBaseServer:RPCServer)

0.2.1.1 initialize the CallQueue (length set by ipc.server.max.callqueue.length, default handler count × DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER; the older ipc.server.max.queue.size is still read for backward compatibility) and the ReplicationQueue (ipc.server.max.callqueue.size, default 1024 × 1024 × 1024), as well as the SizeBasedThrottler (threshold = ipc.server.max.callqueue.size), the Listener, the Responder, and so on
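The queue sizing in step 0.2.1.1 can be sketched roughly as follows. This is a minimal illustration, not the HBaseServer source: the configuration is a plain Map, the per-handler constant is given an assumed value of 10, and the rule that the legacy key takes precedence when present is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class CallQueueConfigSketch {
    // Assumed value for illustration only; check HBaseServer for the real constant.
    static final int DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER = 10;
    // 1024 x 1024 x 1024, as stated in step 0.2.1.1.
    static final int DEFAULT_MAX_CALLQUEUE_SIZE = 1024 * 1024 * 1024;

    /** Resolves the call queue length; the legacy key wins if present (an assumption). */
    static int maxQueueLength(Map<String, String> conf, int handlerCount) {
        String legacy = conf.get("ipc.server.max.queue.size");
        if (legacy != null) {
            return Integer.parseInt(legacy);
        }
        String current = conf.get("ipc.server.max.callqueue.length");
        if (current != null) {
            return Integer.parseInt(current);
        }
        return handlerCount * DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER;
    }

    /** Resolves the byte budget shared by the ReplicationQueue and the throttler threshold. */
    static int maxQueueSize(Map<String, String> conf) {
        String value = conf.get("ipc.server.max.callqueue.size");
        return value != null ? Integer.parseInt(value) : DEFAULT_MAX_CALLQUEUE_SIZE;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("length=" + maxQueueLength(conf, 30) + " size=" + maxQueueSize(conf));
    }
}
```

With an empty configuration the length scales with the handler count, so adding handlers also deepens the queue unless one of the keys pins it.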

0.2.1.1.1 initialize the Responder and create its selector, which waits with an interval of purgeTimeout (default 2 × DEFAULT_HBASE_RPC_TIMEOUT)

0.2.1.1.2 initialize the Listener: obtain the listening address and bind the ServerSocket to it with backlog length = ipc.server.listen.queue.size, create a thread pool of size ipc.server.read.threadpool.size, initialize and start ipc.server.read.threadpool.size Readers, and finally register for connection (accept) events
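The Listener set-up in step 0.2.1.1.2 maps onto standard Java NIO calls. The sketch below is an assumption-laden illustration rather than the HBase code: the class and field names are hypothetical, the backlog and pool size are passed in directly instead of being read from ipc.server.listen.queue.size and ipc.server.read.threadpool.size, and the Reader tasks themselves are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ListenerSketch implements AutoCloseable {
    final ServerSocketChannel acceptChannel;
    final Selector selector;
    final ExecutorService readPool;
    final SelectionKey acceptKey;

    private ListenerSketch(InetSocketAddress bindAddress, int backlog, int readThreads) throws IOException {
        acceptChannel = ServerSocketChannel.open();
        acceptChannel.configureBlocking(false);
        // Bind with an explicit backlog, as the text describes.
        acceptChannel.socket().bind(bindAddress, backlog);
        selector = Selector.open();
        // Pool that would host the Reader threads; no Reader tasks are submitted in this sketch.
        readPool = Executors.newFixedThreadPool(readThreads);
        // Register for connection events last, once everything else is ready.
        acceptKey = acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
    }

    static ListenerSketch open(InetSocketAddress bindAddress, int backlog, int readThreads) {
        try {
            return new ListenerSketch(bindAddress, backlog, readThreads);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return acceptChannel.socket().getLocalPort();
    }

    @Override
    public void close() {
        readPool.shutdownNow();
        try {
            selector.close();
            acceptChannel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (ListenerSketch listener = open(new InetSocketAddress("127.0.0.1", 0), 128, 10)) {
            System.out.println("listening on port " + listener.port());
        }
    }
}
```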

0.3 start the RPCServer that has been initialized

0.3.1 start the Responder, which takes responses from the responseQueue and writes them back to the client. As in Hadoop RPC, there is an optimization: when the responseQueue holds only one response, that response is written immediately.
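The fast path in step 0.3.1 can be illustrated with a small sketch. It is a simplification under stated assumptions: responses are plain strings, the "wire" is a list standing in for the socket, and the names are hypothetical. The idea shown is only that a response which is alone in the queue is written by the caller right away, while anything queued behind an earlier response is left for the Responder thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ResponderFastPathSketch {
    /**
     * Enqueues a response and, if it is the only one pending, writes it immediately.
     * Returns true when the response was written inline by the caller.
     */
    static boolean doRespond(Deque<String> responseQueue, String response, List<String> wire) {
        synchronized (responseQueue) {
            responseQueue.addLast(response);
            if (responseQueue.size() == 1) {
                // Nothing is ahead of us, so skip the hand-off to the Responder thread.
                wire.add(responseQueue.pollFirst());
                return true;
            }
            // Earlier responses are still pending; ordering requires the Responder to drain them.
            return false;
        }
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        List<String> wire = new ArrayList<>();
        System.out.println("written inline: " + doRespond(queue, "r1", wire));
    }
}
```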

0.3.2 start the Listener: every 10 seconds it checks the ConnectionList; if the number of connections exceeds ipc.client.idlethreshold, connections that have been idle for more than 2 × ipc.client.connection.maxidletime are closed, at most ipc.client.kill.max (default 10) per pass.
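The idle-connection sweep in step 0.3.2 can be sketched as below. This is a simplified illustration: the Conn type and method names are hypothetical, the three limits are passed as arguments instead of being read from the ipc.client.* keys, and the list is scanned front to back, which may differ from how the real Listener picks its candidates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class IdleConnectionCleanerSketch {
    /** Hypothetical stand-in for a server-side connection. */
    static final class Conn {
        final String name;
        final long lastContactMillis;
        boolean closed;

        Conn(String name, long lastContactMillis) {
            this.name = name;
            this.lastContactMillis = lastContactMillis;
        }
    }

    /** Runs one sweep and returns how many connections were closed. */
    static int cleanup(List<Conn> connections, long nowMillis, int idleThreshold,
                       long configuredMaxIdleMillis, int maxToKill) {
        if (connections.size() <= idleThreshold) {
            return 0; // below the threshold nothing is reclaimed
        }
        long maxIdleMillis = 2 * configuredMaxIdleMillis; // the text doubles the configured value
        int killed = 0;
        Iterator<Conn> it = connections.iterator();
        while (it.hasNext() && killed < maxToKill) {
            Conn c = it.next();
            if (nowMillis - c.lastContactMillis > maxIdleMillis) {
                c.closed = true;
                it.remove();
                killed++;
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        List<Conn> conns = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            conns.add(new Conn("c" + i, 0L));
        }
        System.out.println("closed " + cleanup(conns, 5000L, 3, 1000L, 2) + " connections");
    }
}
```

The cap per pass keeps a single sweep short even when many connections are idle; the remainder are picked up by later sweeps.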

0.3.3 start the Handlers, which take Calls from the CallQueue, invoke them on the rpcserver, and hand the return values to the Responder for processing

0.4 initialize HMaster's ZookeeperWatcher

0.4.1 initialize ZookeeperWatcher: obtain a ZooKeeper client object through ZKUtil, with sessionTimeout = zookeeper.session.timeout (default 180s), maxretry = zookeeper.recovery.retry, and retryIntervalMillis = zookeeper.recovery.retry.intervalmill

0.4.1.1 enter the ZooKeeper connection phase (to be analyzed in a later article)

0.5 initialize the Health Check Thread; the check frequency is hbase.node.health.script.frequency (default 10 seconds)

1. HMaster begins the startup process.

1.1 call becomeActiveMaster, which blocks until this master becomes the active master

1.1.1 initialize ActiveMasterManager:ZookeeperListener

1.1.2 register ActiveMasterManager as a listener on the zookeeperWatcher

1.1.3 stallIfBackupMaster (not covered here)

1.1.4 initialize ClusterStatusTracker:ZookeeperNodeTracker, start it, and register HMaster with ClusterStatusTracker

1.1.5 blockUntilBecomingActiveMaster: Add a ZNode for ourselves in the backup master directory since we are not the active master. If we become the active master later, ActiveMasterManager will delete this node explicitly. If we crash before then, ZooKeeper will delete this node for us since it is ephemeral.
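The behaviour described in step 1.1.5 can be simulated without a ZooKeeper ensemble. The sketch below is only a model: an in-memory map stands in for ZooKeeper, the znode paths are assumed values, and crash() imitates the removal of ephemeral nodes when a session dies. It uses no ZooKeeper API and does not block or watch the way the real ActiveMasterManager does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MasterElectionSketch {
    // Paths are assumptions for illustration.
    static final String MASTER_ZNODE = "/hbase/master";
    static final String BACKUP_DIR = "/hbase/backup-masters/";

    // In-memory stand-in for ZooKeeper: path -> owning server.
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    /** Tries to take the master znode; on failure the server registers itself as a backup. */
    boolean tryBecomeActive(String serverName) {
        if (znodes.putIfAbsent(MASTER_ZNODE, serverName) == null) {
            // We are active now, so our backup node (if any) is deleted explicitly.
            znodes.remove(BACKUP_DIR + serverName);
            return true;
        }
        // Someone else is active: add a node for ourselves in the backup directory.
        znodes.put(BACKUP_DIR + serverName, serverName);
        return false;
    }

    /** Imitates session expiry: every ephemeral node owned by the server disappears. */
    void crash(String serverName) {
        znodes.remove(BACKUP_DIR + serverName);
        znodes.remove(MASTER_ZNODE, serverName);
    }

    boolean isBackup(String serverName) {
        return znodes.containsKey(BACKUP_DIR + serverName);
    }

    String activeMaster() {
        return znodes.get(MASTER_ZNODE);
    }

    public static void main(String[] args) {
        MasterElectionSketch zk = new MasterElectionSketch();
        zk.tryBecomeActive("master-a");
        zk.tryBecomeActive("master-b");
        zk.crash("master-a");
        zk.tryBecomeActive("master-b");
        System.out.println("active master: " + zk.activeMaster());
    }
}
```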

1.2 call finishInitialization to enter the final initialization phase

1.2.1 initialize filesystemManager:MasterFileSystem

1.2.1.1 initialize SplitLogManager:ZookeeperListener if hbase.master.distributed.log.splitting is enabled

1.2.1.1.1

1.2.1.2 create the initial directories: check that rootdir exists, check whether tempdir exists and clean it up, and create oldlogdir

1.2.2 initialize FSTableDescriptors -> tableDescriptor

1.2.3 initialize ExecutorService

1.2.4 initialize ServerManager: an HConnection is obtained through HConnectionManager, whose connection pool size is hbase.zookeeper.property.maxClientCnxns (default 300) + 1

1.2.5 initialize all ZK-based trackers:

1.2.5.1 initialize CatalogTracker

1.2.5.1.1 get a HConnection

1.2.5.1.2 initialize RootRegionTracker:ZookeeperNodeTracker (rootServerZnode)

1.2.5.1.3 initialize MetaRegionTracker:ZookeeperNodeTracker (assignmentnode/first_meta_region)

1.2.5.2 start CatalogTracker

1.2.5.2.1 start RootRegionTracker: begin tracking the root region

1.2.5.2.2 start MetaRegionTracker: begin tracking the meta region

1.2.5.3 obtain balancer instance through LoadBalancerFactory

1.2.5.4 initialize AssignmentManager, which manages region assignment: this includes initializing timeoutMonitor (hbase.master.assignment.timeoutmonitor.period defaults to 10s, hbase.master.assignment.timeoutmonitor.timeout defaults to 30min) and timerUpdater (hbase.master.assignment.timerupdater.period defaults to 10s)

1.2.5.5 register assignmentManager as a listener on the zookeeperWatcher, placing it first in the ListenerList

1.2.5.6 initialize RegionServerTracker...

1.2.5.7 start RegionServerTracker...

1.2.5.8 initialize DrainingServerTracker...

1.2.5.9 start DrainingServerTracker...

1.2.5.10 initialize SnapshotManager...

1.2.7 initialize MasterCoprocessorHost

1.2.8 start the service threads: the executor pools MASTER_OPEN_REGION (hbase.master.executor.openregion.threads, default 5), MASTER_CLOSE_REGION (hbase.master.executor.closeregion.threads, default 5), MASTER_SERVER_OPERATIONS (hbase.master.executor.serverops.threads, default 3), MASTER_META_SERVER_OPERATIONS (hbase.master.executor.serverops.threads, default 5) and MASTER_TABLE_OPERATIONS; then initialize and run LogCleaner and HFileCleaner; finally start HealthCheckChore, after which the RPCServer starts accepting requests
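The pool sizing in step 1.2.8 can be sketched with plain java.util.concurrent executors. This is an illustration under assumptions: the configuration is a Map, the class and helper names are hypothetical, and MASTER_TABLE_OPERATIONS is left out because the text gives no size for it. Note that, as listed above, both server-operation pools read the same key with different defaults.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MasterExecutorsSketch {
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value != null ? Integer.parseInt(value) : defaultValue;
    }

    /** Resolves the pool size for each executor type named in step 1.2.8. */
    static Map<String, Integer> poolSizes(Map<String, String> conf) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("MASTER_OPEN_REGION", getInt(conf, "hbase.master.executor.openregion.threads", 5));
        sizes.put("MASTER_CLOSE_REGION", getInt(conf, "hbase.master.executor.closeregion.threads", 5));
        // Both pools below share one key, so one override changes both.
        sizes.put("MASTER_SERVER_OPERATIONS", getInt(conf, "hbase.master.executor.serverops.threads", 3));
        sizes.put("MASTER_META_SERVER_OPERATIONS", getInt(conf, "hbase.master.executor.serverops.threads", 5));
        return sizes;
    }

    /** Starts one fixed-size pool per executor type. */
    static Map<String, ExecutorService> start(Map<String, Integer> sizes) {
        Map<String, ExecutorService> pools = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            pools.put(e.getKey(), Executors.newFixedThreadPool(e.getValue()));
        }
        return pools;
    }

    static void stop(Map<String, ExecutorService> pools) {
        for (ExecutorService pool : pools.values()) {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Map<String, ExecutorService> pools = start(poolSizes(new HashMap<>()));
        System.out.println("started pools: " + pools.keySet());
        stop(pools);
    }
}
```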

1.2.9 wait for RS status reports: wait until any one of the following conditions is met:

A. the master is stopped

B. the 'hbase.master.wait.on.regionservers.maxtostart' number of regionservers is reached

C. the 'hbase.master.wait.on.regionservers.mintostart' is reached AND

there have been no new region servers checking in for 'hbase.master.wait.on.regionservers.interval' (default 1.5s) AND

the 'hbase.master.wait.on.regionservers.timeout' (default 4.5s) is reached
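The exit conditions A, B and C above combine into a single predicate, sketched below. The method and parameter names are hypothetical, and the values are passed in directly instead of being read from the hbase.master.wait.on.regionservers.* keys; the sketch only shows how the OR of A and B with the three-way AND in C fits together.

```java
final class RegionServerWaitSketch {
    /**
     * Returns true once the master may stop waiting for region servers to report in.
     *
     * @param masterStopped          condition A
     * @param count                  region servers that have reported so far
     * @param minToStart             mintostart
     * @param maxToStart             maxtostart
     * @param sinceLastChangeMillis  time since the last new region server checked in
     * @param intervalMillis         interval (1.5s by default according to the text)
     * @param elapsedMillis          total time spent waiting
     * @param timeoutMillis          timeout (4.5s by default according to the text)
     */
    static boolean doneWaiting(boolean masterStopped, int count, int minToStart, int maxToStart,
                               long sinceLastChangeMillis, long intervalMillis,
                               long elapsedMillis, long timeoutMillis) {
        if (masterStopped) {
            return true; // A
        }
        if (count >= maxToStart) {
            return true; // B
        }
        // C: all three parts must hold at once.
        return count >= minToStart
                && sinceLastChangeMillis >= intervalMillis
                && elapsedMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        boolean done = doneWaiting(false, 3, 1, 10, 2000L, 1500L, 5000L, 4500L);
        System.out.println("done waiting: " + done);
    }
}
```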

1.2.10 check ZK for RSs that are up but have not yet reported to the master: record each such RS in serverManager

1.2.11 start AssignmentManager: start TimeoutMonitor

1.2.12 perform log splitting: MasterFileSystem scans the hlogdir and checks whether the regionserver that owns each log directory is online. If it is not, the server is added to the deadWorkers list of splitlogManager, a znode is created under the splitlog path in ZK for each of its hlogs, and the master waits for the SplitlogWorker on the other RegionServers to pick up the tasks and process them (see the upcoming RegionServer startup analysis for details). If hbase.master.distributed.log.splitting is disabled, the splitting is done by HMaster itself, which is not covered here.

1.2.13 assign the ROOT and META regions: check whether -ROOT- and .META. have been assigned; if not, they are assigned by AssignmentManager:

1.2.13.1

1.2.14 enable shutdownHandler: ServerManager goes through deadNotExpiredServers, expires each of those servers, and submits the shutdown handling to ExecutorService

1.2.15 AssignmentManager joinCluster: recover the tables left in the disabling or enabling state

1.2.16 fix up daughter regions

1.2.17 start the Balancer and have a Chore run it every 300s in a single thread; balancing is skipped while any Region is in transition or an RS is offline
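The balancer schedule in step 1.2.17 can be sketched with a single-threaded scheduled executor. This is an illustration, not the HBase Chore class: the two suppliers are hypothetical stand-ins for the "region in transition" and "RS offline" checks, and the counter simply records how many balance passes actually ran.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class BalancerChoreSketch {
    static final long PERIOD_SECONDS = 300; // period stated in step 1.2.17

    final AtomicInteger runs = new AtomicInteger();
    private final BooleanSupplier regionsInTransition;
    private final BooleanSupplier regionServerOffline;

    BalancerChoreSketch(BooleanSupplier regionsInTransition, BooleanSupplier regionServerOffline) {
        this.regionsInTransition = regionsInTransition;
        this.regionServerOffline = regionServerOffline;
    }

    /** Runs one balance pass unless the cluster is in a state where balancing is skipped. */
    boolean balanceOnce() {
        if (regionsInTransition.getAsBoolean() || regionServerOffline.getAsBoolean()) {
            return false;
        }
        runs.incrementAndGet(); // a real balancer would compute and apply a plan here
        return true;
    }

    /** Schedules the pass on one thread, so two passes never overlap. */
    ScheduledExecutorService schedule() {
        ScheduledExecutorService chore = Executors.newSingleThreadScheduledExecutor();
        chore.scheduleAtFixedRate(this::balanceOnce, PERIOD_SECONDS, PERIOD_SECONDS, TimeUnit.SECONDS);
        return chore;
    }

    public static void main(String[] args) {
        BalancerChoreSketch balancer = new BalancerChoreSketch(() -> false, () -> false);
        System.out.println("balanced: " + balancer.balanceOnce());
    }
}
```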

1.2.18 startCatalogJanitorChore: start the CatalogJanitor chore

1.2.19 run the coprocessor postStartMaster hook once Master startup has finished

1.3 start the Stop Check Thread, which checks once a second

Over.
