This article describes how we diagnosed and fixed a recurring hbase regionserver failure.
I. Failure phenomenon
1. First, regionserver frequently reported two kinds of errors:
wal.FSHLog: Error syncing, request close of WAL
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 293 actions: NotServingRegionException: 42 times
followed by the regionserver process dying:
Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
2. Since the frequent regionserver crashes could not be explained by tuning hbase itself, we extended the analysis to the processes hbase depends on, the closest being zookeeper. We went through the zk logs in detail around each crash. For example, regionserver reported a "regionserver dead" error at 03:03:17, so we examined the zk log just before and after that time. The log showed that the session timeout between regionserver and zk was 40 seconds ("the sessions negotiated with zookeeper from dead regionserver were of 40s"). Checking the regionserver's gc pauses confirmed that they did indeed exceed 40 seconds.
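As an illustration only, here is a minimal Java sketch of how one might scan a regionserver GC log for stop-the-world pauses longer than the 40-second session timeout. The log path and the JDK 8 -XX:+PrintGCApplicationStoppedTime line format are assumptions, not taken from our cluster:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class LongGcPauseFinder {
    public static void main(String[] args) throws IOException {
        // Hypothetical path; point it at the regionserver's GC log.
        String gcLog = args.length > 0 ? args[0] : "/var/log/hbase/gc-regionserver.log";
        // Matches JDK 8 lines printed by -XX:+PrintGCApplicationStoppedTime, e.g.
        // "Total time for which application threads were stopped: 43.1234567 seconds"
        Pattern stopped = Pattern.compile("stopped: ([0-9.]+) seconds");

        try (Stream<String> lines = Files.lines(Paths.get(gcLog))) {
            lines.forEach(line -> {
                Matcher m = stopped.matcher(line);
                // Any pause longer than the 40 s negotiated session timeout is
                // enough for zookeeper to expire the regionserver's session.
                if (m.find() && Double.parseDouble(m.group(1)) > 40.0) {
                    System.out.println(line);
                }
            });
        }
    }
}
```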
3. Failure sequence:
The gc pause exceeds the 40-second maxSessionTimeout, so zk concludes that the regionserver has died.
Zk reports the dead regionserver to master, and master assigns that regionserver's regions to other regionservers.
The other regionservers replay the wal to recover those regions and delete the wal files once replay is finished.
When the dead regionserver finishes gc and resumes, its wal is gone, which produces the error shown above (wal.FSHLog: Error syncing, request close of WAL).
The dead regionserver then learns from zk that it has been declared dead and shuts itself down (Region server exiting, java.lang.RuntimeException: HRegionServer Aborted).
4. Ultimate reason: tickTime timeout
The analysis above shows that gc pauses longer than the 40-second maxSessionTimeout were killing the regionserver. This was puzzling, however, because we had set zookeeper.session.timeout to 240 seconds, far more than 40 seconds.
After asking the hbase community for help and searching for similar problems, we finally found the cause (see https://superuser.blog/hbase-dead-regionserver/ for details):
It turned out that we had never tuned tickTime. The effective session timeout between hbase and zk is not determined solely by the zookeeper.session.timeout parameter; zk clamps the requested value between its minSessionTimeout and maxSessionTimeout, which default to minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime. Our big data cluster used the default tickTime of 2 seconds (2000 ms), so the effective timeout between hbase and zk was capped at 40 seconds.
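A minimal Java sketch of the clamping rule described above (illustrative only, not the actual zookeeper source), showing why a requested 240-second timeout collapses to 40 seconds under the default tickTime:

```java
public class SessionTimeoutClamp {
    // zk clamps the client's requested session timeout into
    // [minSessionTimeout, maxSessionTimeout] = [2 * tickTime, 20 * tickTime] by default.
    static long negotiate(long requestedMs, long tickTimeMs) {
        long min = 2 * tickTimeMs;
        long max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        // Before the fix: tickTime = 2000 ms, hbase requests 240 s -> clamped to 40 s.
        System.out.println(negotiate(240_000, 2_000)); // 40000
        // After the fix: tickTime = 6000 ms, hbase requests 120 s -> 120 s is honored.
        System.out.println(negotiate(120_000, 6_000)); // 120000
    }
}
```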
After raising zk's tickTime to 6 seconds and setting zookeeper.session.timeout to 120 seconds accordingly, the frequent regionserver crashes stopped.
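For illustration, a sketch of the hbase side of the fix using the standard client API; in practice zookeeper.session.timeout lives in hbase-site.xml and tickTime is changed in the zookeeper ensemble's zoo.cfg, and the hbase-client dependency on the classpath is assumed here:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SessionTimeoutConfig {
    public static void main(String[] args) {
        // The timeout hbase requests from zookeeper (normally set in hbase-site.xml;
        // set programmatically here only to show the key and the value used after the fix).
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("zookeeper.session.timeout", 120_000); // 120 s

        // Server side: tickTime must be raised to 6000 ms in zoo.cfg so that
        // maxSessionTimeout = 20 * 6000 ms = 120 s can actually honor this request;
        // there is no client-side API for that setting.
        System.out.println(conf.get("zookeeper.session.timeout"));
    }
}
```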