2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Scenarios that cause HBase to go down
HMaster
HMaster stops abnormally (by calling abort()) in the following scenarios:
1. A zk exception. This is the most common cause of a Master service stoppage. The operations involved include, but are not limited to, the following:
A) zk session timeout, configured through zookeeper.session.timeout. The default is 3 minutes. If fail.fast.expired.active.master is set to false (the default), the master does not abort immediately but tries to recover the expired zk session.
B) after a region is opened, the opened node must be deleted from zk. The master aborts if the node exists in zk but the deletion fails
C) during a region split, when the split node is deleted from zk
D) when the Master node changes
E) when creating an unassigned node in zk
F) when taking a disabled region offline, if a zk exception occurs while deleting that region from zk
G) many other cases in which an exception occurs while manipulating zk nodes.
2. In assign, if a region is set to the offlined state but its previous state was neither closed nor offlined
3. In assign, if the region information cannot be read from the .META. table
4. When a new hbase instance joins a running hbase cluster, if the /hbase/unassigned node in zk has no data
5. If an uncaught exception occurs while regions are being assigned in bulk through a thread pool
6. An exception occurs while starting the master's service threads
7. When checking the hbase log path in hdfs: when a dead server is found, its log has to be read from hdfs. If an IO exception occurs, the hdfs file system is checked, and the master aborts if the fsOk status is true but the check through the FSUtils utility class also hits an IO exception.
8. When verifying and assigning the -ROOT- region, if a zk exception or another exception occurs (other exceptions are retried 10 times), for example "-ROOT- is onlined on the dead server".
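Scenario 5 can be illustrated with a minimal sketch in plain Java. It is not the HBase implementation: AbortableMasterSketch, BulkAssignSketch and the thread name are hypothetical stand-ins, and the only point is how an uncaught exception in a pool thread can be routed to an abort() call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the master; this is not an HBase class.
class AbortableMasterSketch {
    private final AtomicReference<String> abortReason = new AtomicReference<>();
    private final CountDownLatch aborted = new CountDownLatch(1);

    // Records the first abort reason; a real server would stop serving here.
    void abort(String why, Throwable cause) {
        if (abortReason.compareAndSet(null, why + ": " + cause.getMessage())) {
            aborted.countDown();
        }
    }

    String abortReason() {
        return abortReason.get();
    }

    // Waits until abort() has been called or the timeout elapses.
    boolean awaitAbort(long timeoutMs) {
        try {
            return aborted.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class BulkAssignSketch {
    // Builds a pool whose worker threads abort the server on any uncaught exception.
    static ExecutorService newPool(AbortableMasterSketch master, int threads) {
        ThreadFactory factory = task -> {
            Thread t = new Thread(task, "bulk-assign-sketch");
            t.setDaemon(true);
            t.setUncaughtExceptionHandler(
                (thread, e) -> master.abort("Uncaught exception in " + thread.getName(), e));
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    // Runs one failing "assignment" task and returns the recorded abort reason.
    static String runDemo() {
        AbortableMasterSketch master = new AbortableMasterSketch();
        ExecutorService pool = newPool(master, 2);
        // execute(), not submit(): submit() would capture the exception in a Future
        // and the uncaught-exception handler would never see it.
        pool.execute(() -> {
            throw new IllegalStateException("simulated assignment failure");
        });
        master.awaitAbort(5000);
        pool.shutdown();
        return master.abortReason();
    }
}
```

Calling BulkAssignSketch.runDemo() returns the abort reason recorded by the handler. The choice of execute() over submit() matters for this pattern, because a task passed to submit() never reaches the uncaught-exception handler.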
HRegionServer
HRegionServer stops abnormally (by calling abort()) in the following scenarios:
1. If an IOException occurs while reading or writing hdfs, a file system check (checkFileSystem) of hdfs is initiated.
2. An uncaught exception occurs in a RegionServer service thread
3. An exception occurs while starting HRegionServer
4. An exception occurs while rolling the HLog
5. During a memstore flush, if persistence fails, the RS is restarted, and the contents of the hlog are reloaded into the memstore during the restart.
6. A zk exception occurs, including, but not limited to, the following scenarios:
A) zk session timeout, configured through zookeeper.session.timeout. The default is 3 minutes. Unlike the master, the RS does not retry the zk operation.
B) a KeeperException occurs while starting HRegionServer
C) if an exception occurs during a split, the split is rolled back. During the rollback, the splitting state of the region has to be deleted from zk. The RS aborts if a KeeperException occurs during the deletion or another exception occurs in the rollback
D) a KeeperException occurs while opening a region
E) during hbase cluster replication, many operations that interact with zk cause an abort when a KeeperException occurs.
7. When closing a region, if an exception occurs, such as a failed memstore flush
8. During a memstore flush, if the HLog finds that the region is already being flushed, the JVM is forcibly terminated with Runtime.getRuntime().halt(1). This method does not run the normal shutdown hooks, so the regions on that RS are not migrated right away. Only after the zk session times out does the master find that the RS is unavailable and do the migration work.
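Scenario 1 of this list (an IOException followed by a file system check) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not HBase code: FsCheckSketch, FsProbe and IoAction are hypothetical names, and the sketch assumes that a failed check leads to abort().

```java
import java.io.IOException;

// Hypothetical sketch of "check the file system after an IOException"; not HBase code.
class FsCheckSketch {
    // Stand-in for a real availability probe of hdfs.
    interface FsProbe {
        void checkAvailable() throws IOException;
    }

    // Stand-in for any read or write against hdfs.
    interface IoAction {
        void run() throws IOException;
    }

    private final FsProbe probe;
    private boolean fsOk = true;
    private String abortReason;

    FsCheckSketch(FsProbe probe) {
        this.probe = probe;
    }

    // Probes the file system only while it is still believed to be healthy.
    boolean checkFileSystem() {
        if (fsOk) {
            try {
                probe.checkAvailable();
            } catch (IOException e) {
                fsOk = false;
                abort("File system not available", e);
            }
        }
        return fsOk;
    }

    // Runs an I/O action; an IOException triggers the file system check.
    boolean runIo(IoAction action) {
        try {
            action.run();
            return true;
        } catch (IOException e) {
            checkFileSystem();
            return false;
        }
    }

    // Records the first abort reason; a real server would stop serving here.
    void abort(String why, Throwable cause) {
        if (abortReason == null) {
            abortReason = why + ": " + cause.getMessage();
        }
    }

    String abortReason() {
        return abortReason;
    }

    // Simulates one failed write against a healthy or unhealthy file system.
    static String demo(boolean fsHealthy) {
        FsCheckSketch server = new FsCheckSketch(() -> {
            if (!fsHealthy) {
                throw new IOException("probe failed");
            }
        });
        server.runIo(() -> {
            throw new IOException("write failed");
        });
        return server.abortReason();
    }
}
```

In this sketch a single failed write does not abort the server by itself. The abort happens only when the follow-up probe also fails, so FsCheckSketch.demo(true) leaves the abort reason unset while FsCheckSketch.demo(false) records one.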
Summary
There are many ways for HBase to go down, most of them caused by problems with zk or hdfs, so the availability of zk and hdfs is extremely important for hbase. About zk:
1. If zk stops serving, the master and rs go down in many cases, and the hbase cluster basically loses the ability to serve, so zk must be stable and reliable. If zk goes down after a client has already established a connection with an rs, the rs can still serve requests as long as none of the few operations that interact with zk, such as split, fails and triggers abort() on the rs.
2. If the rs/master has a long gc pause or the server time is changed, the zk session can time out and cause the rs/master to stop service. So far there have been two incidents in which hbase stopped service because the server time was changed.
3. Do not casually modify the hbase node data in zk by hand. The master/rs rely on zk data for many operations, and if the data does not match what they expect, the master/rs may stop service, especially the master.
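The effect of a long pause or a clock change in point 2 can be illustrated with a small sketch. It assumes, as a simplification that the text does not state, that the elapsed time since the last heartbeat is computed from wall-clock timestamps. SessionClockSketch and its methods are hypothetical names.

```java
// Hypothetical sketch; the class and method names are illustrative only.
class SessionClockSketch {
    // Expiry check that trusts wall-clock timestamps
    // (for example values taken from System.currentTimeMillis()).
    static boolean looksExpired(long lastHeartbeatMs, long nowMs, long sessionTimeoutMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    // Difference between wall-clock and monotonic elapsed time, in milliseconds.
    // A large value suggests the wall clock was adjusted during the interval.
    // The monotonic timestamps are assumed to come from System.nanoTime().
    static long clockJumpMs(long wallStartMs, long wallEndMs, long monoStartNs, long monoEndNs) {
        long wallElapsedMs = wallEndMs - wallStartMs;
        long monoElapsedMs = (monoEndNs - monoStartNs) / 1_000_000L;
        return wallElapsedMs - monoElapsedMs;
    }
}
```

With a 3 minute timeout (180000 ms), a heartbeat at wall time 1000 followed 5 seconds later by a check is fine: looksExpired(1000, 6000, 180000) is false. If the server clock is moved forward by 10 minutes in between, the same check sees looksExpired(1000, 606000, 180000) and reports true, exactly as it would after a real 200 second gc pause.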
The Master knows whether an RS is available through ZK. In general, an RS exits normally when it stops service. On a normal exit it deletes its /hbase/rs/$regionserver node from ZK, and the Master, which watches for the deletion of this node, reassigns the regions that the RS was responsible for sooner (how soon depends on how long it takes to close all the regions). If the RS is forced to exit, for example with kill -9 or in scenario 8 of the HRegionServer list above, its node in ZK is deleted only after the ZK session times out (the RS uses CreateMode.EPHEMERAL when adding the node to ZK, and nodes created in this mode are deleted automatically when the session ends). Only then does the Master reassign the regions.
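The difference between the two exit paths can be shown with a toy in-memory model of ephemeral nodes. This is not the ZooKeeper client API: EphemeralRegistrySketch and all of its methods are hypothetical, and time is passed in explicitly so that the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory toy model of ephemeral nodes; this is NOT the ZooKeeper API.
class EphemeralRegistrySketch {
    private final long sessionTimeoutMs;
    private final Map<String, Long> ownerByPath = new HashMap<>();
    private final Map<Long, Long> lastHeartbeatMs = new HashMap<>();
    private final List<String> deletedPaths = new ArrayList<>();

    EphemeralRegistrySketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Creates a node owned by the given session.
    void createEphemeral(String path, long sessionId, long nowMs) {
        ownerByPath.put(path, sessionId);
        lastHeartbeatMs.put(sessionId, nowMs);
    }

    // Keeps a live session from expiring.
    void heartbeat(long sessionId, long nowMs) {
        if (lastHeartbeatMs.containsKey(sessionId)) {
            lastHeartbeatMs.put(sessionId, nowMs);
        }
    }

    // Graceful exit: the session is closed and its nodes disappear immediately.
    void closeSession(long sessionId) {
        lastHeartbeatMs.remove(sessionId);
        deleteNodesOf(sessionId);
    }

    // Forced exit: nothing closes the session, so its nodes only disappear
    // once no heartbeat has been seen for longer than the session timeout.
    void tick(long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (Map.Entry<Long, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > sessionTimeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (Long sessionId : expired) {
            closeSession(sessionId);
        }
    }

    private void deleteNodesOf(long sessionId) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Long> e : ownerByPath.entrySet()) {
            if (e.getValue() == sessionId) {
                paths.add(e.getKey());
            }
        }
        for (String path : paths) {
            ownerByPath.remove(path);
            deletedPaths.add(path); // a watcher such as the Master would be notified here
        }
    }

    boolean exists(String path) {
        return ownerByPath.containsKey(path);
    }

    List<String> deletedPaths() {
        return deletedPaths;
    }
}
```

With a timeout of 180000 ms, a node whose session is closed explicitly is gone at once, while a node whose owner simply stops sending heartbeats is still present at tick(179999) and is removed at tick(180001). That gap is the extra delay the Master sees after a forced exit.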
Killing the RS process with kill is also a normal exit (kill -9 must not be used, because it forces the exit). The RS registers a JVM shutdown hook through the addShutdownHook method of Runtime, and the RS exit logic runs in that hook. In fact, the RS stop in hbase-daemon.sh uses kill.
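A minimal sketch of the shutdown hook pattern, in plain Java, is shown here. ShutdownHookSketch and its methods are hypothetical and stand in for the RS exit logic; only Runtime.addShutdownHook is the real JDK call the text refers to.

```java
// Hypothetical sketch; not the HRegionServer implementation.
class ShutdownHookSketch {
    private volatile boolean exitLogicRan = false;

    // Stand-in for the exit logic: closing regions, deleting the node in zk, and so on.
    void runExitLogic() {
        exitLogicRan = true;
    }

    boolean exitLogicRan() {
        return exitLogicRan;
    }

    // Registers the exit logic as a JVM shutdown hook and returns the hook thread.
    // The JVM runs registered hooks on a normal exit and on SIGTERM (plain kill).
    // It does not run them on SIGKILL (kill -9) or after Runtime.halt().
    Thread installHook() {
        Thread hook = new Thread(this::runExitLogic, "rs-shutdown-hook-sketch");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A process that calls installHook() and is then stopped with plain kill gets a chance to run runExitLogic() before the JVM exits. The same process stopped with kill -9 does not.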