Shulou (Shulou.com), 06/03 report. Updated 2025-04-01. Source: SLTechnology News&Howtos.
This article explains how a ZooKeeper bug was investigated: clients could not connect to the server after a cluster change, even though every node reported a normal status.
Cause
One day, a colleague reported that the zk client could not connect to the server. Since the business code had not changed recently, we suspected a change made by the operations team. We contacted them, and they confirmed that they had made a change recently. To limit the impact, we asked them to roll it back, and access returned to normal after the rollback.
Reproducing the problem
The operations team described the change procedure: one server in the zk cluster had a performance risk and needed to be replaced with a new instance. The operator first added the new machine to the zk cluster, then modified the configuration on the old servers and restarted them one by one. After the restarts, the new zk node held the leader role. Once the zk status looked normal, the operator considered the change complete.
The unexpected result: the mntr command showed every machine in a normal state, yet the zk client could not get through and hung as soon as it connected. The problem reproduced reliably in the test environment.
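To make the status check concrete, here is a minimal sketch of sending mntr to a server over a plain TCP socket and parsing the reply, which consists of tab-separated key/value lines such as zk_server_state. The class and method names (MntrCheck, fourLetterWord, parseMntr) are illustrative and not part of ZooKeeper. Newer ZooKeeper releases may also require the command to be enabled through the 4lw.commands.whitelist setting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, not a ZooKeeper class.
class MntrCheck {

    // Sends a four-letter command (for example "mntr") and returns the raw reply.
    // The server closes the connection after replying, so we read until EOF.
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            socket.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Parses "key<TAB>value" lines into an ordered map; other lines are ignored.
    static Map<String, String> parseMntr(String output) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (String line : output.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                metrics.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
            }
        }
        return metrics;
    }
}
```

A call such as MntrCheck.parseMntr(MntrCheck.fourLetterWord(host, port, "mntr", 3000)) would return the metrics for one node. As this incident shows, a normal mntr result only reflects the server's own view of its state; it does not prove that client requests are being processed.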
Troubleshooting
1. We guessed that the zk ports were not listening. We logged in to the server and used netstat to confirm that all three ports opened by the server were normal; a telnet test could also connect.
2. We guessed that fewer than half of the nodes were in sync, or that a follower had lost its connection to the leader and triggered a re-election. This was quickly ruled out: as noted, mntr showed a normal status for every node, and no corresponding log entries could be found.
3. We used stat to observe the server's connections while running a zk client. The server received the client's request but never replied, so it appeared that the zk server was not processing client requests.
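The port check in step 1 can also be scripted. The following is a minimal sketch of a TCP connect probe, roughly what the telnet test does; PortProbe and canConnect are hypothetical names. Which ports to probe depends on the deployment. The values 2181, 2888 and 3888 often appear in ZooKeeper examples, but they are an assumption here and are not stated in the article.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative helper, not a ZooKeeper class.
class PortProbe {

    // Returns true if a TCP connection to host:port can be established
    // within timeoutMs, false on refusal, timeout, or any other I/O error.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A successful connect only shows that something is listening on the port. As step 3 found, the server can accept connections and still never answer client requests.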
At this point we had to go into the source code. Since we were not familiar with the zk source, we consulted an expert, who suggested looking at zk's request-processing class, CommitProcessor.
That is where we found the cause. The code is as follows:
```java
@Override
public void start() {
    ...
    if (workerPool == null) {
        workerPool = new WorkerService("CommitProcWork", numWorkerThreads, true);
    }
    ...
}

public void shutdown() {
    LOG.info("Shutting down");
    halt();
    if (workerPool != null) {
        workerPool.join(workerShutdownTimeoutMS);
    }
    if (nextProcessor != null) {
        nextProcessor.shutdown();
    }
}
```
In shutdown, the call to workerPool.join effectively switches off request processing, but workerPool is never set to null. The start method creates a new WorkerService, and so resumes processing requests, only when workerPool == null. After a shutdown followed by a start, the old, stopped pool is therefore still in place and incoming requests are never processed.
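The lifecycle problem described above can be reproduced in a simplified model. The sketch below is not ZooKeeper code: ToyWorkerPool and ToyProcessor are hypothetical stand-ins that keep only the null check in start and the stopped-but-not-null pool after shutdown. The resetPoolOnShutdown flag models one possible fix implied by the analysis, clearing the field once the pool has been stopped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a worker pool; NOT ZooKeeper's WorkerService.
class ToyWorkerPool {
    private boolean stopped = false;

    // Runs the task only while the pool still accepts work.
    boolean schedule(Runnable task) {
        if (stopped) {
            return false; // request is dropped, so the caller never gets a reply
        }
        task.run();
        return true;
    }

    void stop() {
        stopped = true;
    }
}

// Hypothetical stand-in for a request processor with start/shutdown.
class ToyProcessor {
    private final boolean resetPoolOnShutdown;
    private ToyWorkerPool workerPool;
    final List<String> processed = new ArrayList<>();

    ToyProcessor(boolean resetPoolOnShutdown) {
        this.resetPoolOnShutdown = resetPoolOnShutdown;
    }

    // Mirrors the null check in start(): a pool is created only if none exists.
    void start() {
        if (workerPool == null) {
            workerPool = new ToyWorkerPool();
        }
    }

    // Stops the pool; clears the field only when the (assumed) fix is enabled.
    void shutdown() {
        if (workerPool != null) {
            workerPool.stop();
            if (resetPoolOnShutdown) {
                workerPool = null;
            }
        }
    }

    // Returns true if the request was actually processed.
    boolean submit(String request) {
        if (workerPool == null) {
            return false;
        }
        return workerPool.schedule(() -> processed.add(request));
    }
}
```

With new ToyProcessor(false), the sequence start, shutdown, start leaves the stopped pool in place, so submit returns false and nothing is processed. With new ToyProcessor(true), the second start creates a fresh pool and requests go through again.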
After modifying the code, we re-ran the reproduction to verify the fix.