
Troubleshooting a RocketMQ Process That Exits Automatically: An Example Analysis

2025-01-16 Update From: SLTechnology News&Howtos



This article walks through how a RocketMQ process that exited on its own was diagnosed. Many newcomers find this kind of problem confusing, so the troubleshooting steps are explained in detail below.

1. Background

A RocketMQ cluster consisting of four masters and four slaves suddenly had three of its servers go offline "unexpectedly" at the same time, as the monitoring showed.

Checking the monitoring graphs of the three machines in turn showed that the timestamps matched almost perfectly.

2. Fault analysis

When a problem like this occurs, do not stop to discuss it: restart each server immediately to restore the cluster as quickly as possible and reduce the impact on the business, and only then start analyzing the logs.

When a Java process exits on its own (a RocketMQ broker is itself a Java process), one of the most common causes is a memory overflow or memory leak that crashes the process. Our startup parameters did not include these two options:

-XX:+HeapDumpOnOutOfMemoryError

-XX:HeapDumpPath=/opt/jvmdump

As a result, we could not judge whether an out-of-memory error had occurred from the presence or absence of a dump file. Instead, we checked the GC log: we downloaded it and uploaded it to an online GC log analysis tool, https://gceasy.io/, which displays the log graphically.

Garbage collection turned out to be normal.
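To avoid being left without a dump file next time, it helps to confirm that a running JVM was actually started with the two options mentioned above. The following is a minimal sketch, not part of the original investigation: the function name check_heap_dump_flags is hypothetical, and it only inspects a command-line string that you pass in.

```shell
#!/bin/sh
# Report which of the two heap-dump options are missing from a JVM
# command line. The argument is the full command line as one string,
# for example the output of: ps -o args= -p "$pid"
check_heap_dump_flags() {
    cmdline=$1
    missing=""
    case " $cmdline " in
        *" -XX:+HeapDumpOnOutOfMemoryError "*) ;;
        *) missing="$missing -XX:+HeapDumpOnOutOfMemoryError" ;;
    esac
    case " $cmdline " in
        *" -XX:HeapDumpPath="*) ;;
        *) missing="$missing -XX:HeapDumpPath=<dir>" ;;
    esac
    if [ -z "$missing" ]; then
        echo "heap dump on OOM is configured"
    else
        echo "missing:$missing"
        return 1
    fi
}
```

A typical call would be check_heap_dump_flags "$(ps -o args= -p "$pid")", where pid is the broker's process ID.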

Since the Java process did not exit because of a memory problem, what was the cause? The next step was to look at the broker log around that point in time.

The broker log contained shutdownHook output, which means the exit hook registered at startup ran before the process exited. The broker therefore stopped normally, so kill -9 could not have been the cause; this pointed to a shutdown or plain kill command having been executed. We immediately checked the output of the history command, but no such command had been run at the time in question. Switching to the root user and running history again turned up no clue either.

Still, I remained convinced that a manually executed kill command had caused the process to exit. A search online showed that system command calls can be checked in the system log /var/log/messages, so I downloaded that file and searched it for the keyword kill.

The last kill command found had been run a little after 1:00 a.m. on the 25th: the RocketMQ cluster was stopped and then restarted using bin/mqbroker -c conf/broker-b.conf &.

This command has a problem: it does not use nohup, so when the session that started the process ends, the process exits too. To verify this, we checked the system log around the time the process exited.

There was indeed a log entry related to Removed at the time of the failure.
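The two keyword searches described above (for kill and for Removed) can be sketched as a small helper. This is only an illustration: the function name search_log and the local file name ./messages are assumptions, and the actual log line formats depend on the system.

```shell
#!/bin/sh
# Print the lines of a log file that contain KEYWORD as a whole word,
# each prefixed with its line number.
# Usage: search_log FILE KEYWORD
search_log() {
    grep -n -w -- "$2" "$1"
}

# Example, run against a downloaded copy of /var/log/messages:
#   search_log ./messages kill
#   search_log ./messages Removed
```

Matching whole words (-w) keeps unrelated words that merely contain "kill" out of the results, and the line numbers make it easier to compare the surrounding entries with the time of the failure.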

With that, the cause of the failure was essentially established: the operations staff had not used nohup when starting the brokers. We therefore immediately checked how the newly restarted cluster had been launched and restarted the newly started brokers, this time with nohup.
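One way to check how a running broker was launched is to see whether the process ignores SIGHUP, which is what nohup arranges before starting a command. The sketch below assumes a Linux system with /proc and assumes the JVM leaves an inherited "ignore" setting for SIGHUP in place; the function names are hypothetical.

```shell
#!/bin/sh
# Return success if a SigIgn mask (the hex string from /proc/PID/status)
# has the SIGHUP bit set. SIGHUP is signal 1, i.e. the lowest bit.
mask_ignores_sighup() {
    mask=$1
    last=${mask#"${mask%?}"}      # last hex digit of the mask
    [ $(( 0x$last & 1 )) -eq 1 ]
}

# Return success if process PID currently ignores SIGHUP (Linux only).
# Returns 2 if the status file cannot be read.
pid_ignores_sighup() {
    mask=$(awk '/^SigIgn:/ {print $2}' "/proc/$1/status") || return 2
    [ -n "$mask" ] || return 2
    mask_ignores_sighup "$mask"
}
```

For example, pid_ignores_sighup "$pid" && echo "started with nohup (or equivalent)" gives a quick indication for a broker whose process ID is in pid. A process that does not ignore SIGHUP is at risk of exiting when its login session ends.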

Tips for restarting RocketMQ gracefully:

First, turn off the write permission for broker with the following command:

bin/mqadmin updateBrokerConfig -b 192.168.x.x:10911 -n 192.168.x.x:9876 -k brokerPermission -v 4

Watch the broker's write TPS in rocketmq-console and, once it has dropped to 0, stop the RocketMQ process with kill pid. Note: after write permission is turned off, non-ordered messages are not rejected immediately; clients only stop sending to this broker once their routing information has been refreshed, so you need to wait for that to happen.

Start RocketMQ:

nohup bin/mqbroker -c conf/broker-a.conf > /dev/null 2>&1 &

Restore write permission for this node:

bin/mqadmin updateBrokerConfig -b 192.168.x.x:10911 -n 192.168.x.x:9876 -k brokerPermission -v 6
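The values 4 and 6 used for brokerPermission in the steps above are bit flags. As a small illustration, the sketch below decodes such a value, assuming the bit layout of RocketMQ's PermName constants (read = 4, write = 2); the function name describe_broker_perm is hypothetical.

```shell
#!/bin/sh
# Decode a brokerPermission value into a readable description.
# Assumption: bit values follow RocketMQ's PermName constants
# (read = 4, write = 2).
describe_broker_perm() {
    perm=$1
    readable=$(( ($perm & 4) != 0 ))
    writable=$(( ($perm & 2) != 0 ))
    case "$readable$writable" in
        11) echo "read-write" ;;
        10) echo "read-only" ;;
        01) echo "write-only" ;;
        *)  echo "no access" ;;
    esac
}
```

Under that assumption, describe_broker_perm 4 prints read-only (the state used while draining the broker) and describe_broker_perm 6 prints read-write (the state restored afterwards).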
