How to troubleshoot problems caused by incorrect migration of rocketmq 07/01 Update SLTechnology News&Howtos

Because the test-environment hardware was aging, one machine in our test RocketMQ cluster was about to be retired (it ran a namesrv, a master broker, and a slave broker). The operations team warned us that it could fail at any time and suggested migrating what it hosted. After some discussion we moved it to a new test machine. It was our first RocketMQ migration and we underestimated it, which led to the troubleshooting described here.

Cause of the problem

During the migration we copied only the RocketMQ deployment package. The broker's runtime configuration files, however, were stored outside that package, so they were left behind, and that caused the problems below.

In our setup the path for these files was specified separately in the configuration, so they did not sit alongside the deployment package. Whenever you migrate a RocketMQ machine, remember to copy these configuration files over as well; the broker reads them automatically at startup. The conclusion looks obvious now, but when all you can see is odd topic behavior, it takes a while to trace it back to this. Let's go through it step by step.
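The original configuration snippet is not reproduced here. The following is a minimal Python sketch that assumes the separately specified path is the broker's storePathRootDir property and that your version follows the RocketMQ 4.x layout. In that layout, topics.json and related files live under <storePathRootDir>/config/, and the root defaults to ${user.home}/store when it is not set. The sketch lists the files you would need to copy in addition to the deployment package. The function names are hypothetical helpers, not part of RocketMQ.

```python
import os
from pathlib import Path

# Files the broker persists under <storePathRootDir>/config in the 4.x layout
# (an assumption: verify against the RocketMQ version you actually run).
CONFIG_FILES = ["topics.json", "subscriptionGroup.json",
                "consumerOffset.json", "delayOffset.json"]


def parse_properties(text):
    """Parse simple key=value lines (a minimal subset of Java .properties)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def config_files_to_copy(broker_properties_text, home=None):
    """Return the paths of broker config files that must be migrated too."""
    props = parse_properties(broker_properties_text)
    # Assumed default when storePathRootDir is unset: ${user.home}/store.
    home = home if home is not None else str(Path.home())
    root = props.get("storePathRootDir", os.path.join(home, "store"))
    return [os.path.join(root, "config", name) for name in CONFIG_FILES]
```

You would pass it the text of your broker properties file and copy every returned path to the same location on the new machine. The parser handles only plain key=value lines. It does not handle the full Java .properties syntax, such as ':' separators, line continuations, or escapes.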

Migration operation

My approach was this: copy the RocketMQ installation package from the failing machine to the new machine, edit the new machine's hosts file, start the services from the command line, and confirm with a command that the cluster had formed. Then I edited the hosts file on the other machine in the cluster, repointing the entry for the failing machine to the new machine's address, and restarted it some time later.

Suppose the original hosts entries were:

They were changed to:

Note: the advantage of going through the hosts file is that the next time the machine changes, you only need to edit hosts. The RocketMQ configuration files stay unchanged.
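A minimal Python sketch of that hosts edit is below. It assumes namesrvAddr and similar settings refer to hostnames rather than IPs. The hostnames and addresses used with it are hypothetical. It only rewrites text, because writing /etc/hosts itself needs root. If a line maps several aliases, all of them are repointed together.

```python
def repoint_host(hosts_text, hostname, new_ip):
    """Return hosts_text with every active entry for `hostname` pointed at `new_ip`."""
    out = []
    for line in hosts_text.splitlines():
        # Split off a trailing comment so it can be preserved.
        body, hash_, comment = line.partition("#")
        fields = body.split()
        # Rewrite only active lines whose alias list contains the hostname.
        if len(fields) >= 2 and hostname in fields[1:]:
            rebuilt = " ".join([new_ip] + fields[1:])
            if hash_:
                rebuilt += " #" + comment
            out.append(rebuilt)
        else:
            out.append(line)
    # Keep the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if hosts_text.endswith("\n") else "")
```

For example, repointing a hypothetical rocketmq-nameserver1 from the failing machine's IP to the new machine's IP leaves every other line, including comment lines, untouched.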

Symptoms

Symptom 1:

It should have been:

What we actually saw was:

Symptom 2:

A topic's queues should have been:

In fact, some topics had become:

And some topics were left with only:

When I first saw this, it was baffling. Why would it happen?

Analysis

Let's retrace what adding the new machine actually did. The newly migrated machine had none of the original configuration (topics.json, subscriptionGroup.json, and so on), and the two machines had been restarted one after the other.

The new machine started without any topic information. The client (the business service running the producer) was also restarted and connected to the new machine's namesrv. When the producer then needed to send a message, its getTopicRouteInfo request happened to go to the new machine's namesrv. That namesrv had no route for the topic, so the producer fell back to the default topic TBW102. The default read/write queue count there is 4, while the other machine still had the real topic configuration with 8 queues. That explains one of the symptoms.
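To make the fallback concrete, here is a simplified Python model of the producer-side lookup. It assumes autoCreateTopicEnable is on, which is what makes TBW102 available; with it on, the broker also auto-creates the topic on first send. It mirrors the 4.x client behavior of capping each broker's TBW102 queue count at the producer's defaultTopicQueueNums, which is 4 by default. The dict-based "namesrv view", the topic name orders, and the broker names are hypothetical. This sketches the logic; it is not the client code.

```python
AUTO_CREATE_TOPIC = "TBW102"  # default topic used when a route is missing
DEFAULT_TOPIC_QUEUE_NUMS = 4  # producer-side defaultTopicQueueNums default


def resolve_write_queues(routes, topic, default_queue_nums=DEFAULT_TOPIC_QUEUE_NUMS):
    """Return {broker_name: queue_count} a producer would use for `topic`.

    `routes` models one namesrv's view: {topic: {broker_name: write_queue_nums}}.
    """
    if topic in routes:
        # The real route exists: use its queue counts as-is.
        return dict(routes[topic])
    # No route: fall back to TBW102, capping each broker's queue count.
    fallback = routes.get(AUTO_CREATE_TOPIC, {})
    return {broker: min(default_queue_nums, n) for broker, n in fallback.items()}
```

Take an old view of {"orders": {"broker-a": 8}} and a new view of {"TBW102": {"broker-b": 8}}. resolve_write_queues on the new view gives {"broker-b": 4}, while the old view gives {"broker-a": 8}. That is the 4-versus-8 mismatch described above.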

So here's what happened:

The other symptom arose when the client sent nothing during this window and the other machine was restarted. Brokers periodically register their topic information with namesrv. Only the original broker reported this topic, because the new broker had no record of it. As a result, namesrv had only one of the two brokers registered for the topic. When a message was later sent and getTopicRouteInfo was called, only that one broker came back. The other broker had no information for the topic at all.
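The registration side can be modeled the same way. This Python sketch assumes namesrv builds each topic's route purely from the topic configs that brokers report when they register, which simplifies its real route table. The broker and topic names are hypothetical. A broker that started with an empty topics.json contributes nothing for the topic, so the route lists only the original broker.

```python
from collections import defaultdict


def build_route_table(registrations):
    """Merge broker registrations into {topic: {broker_name: queue_nums}}.

    `registrations` maps broker_name -> {topic: queue_nums}, i.e. the topic
    config each broker reports when it registers with namesrv.
    """
    table = defaultdict(dict)
    for broker, topics in registrations.items():
        for topic, queues in topics.items():
            table[topic][broker] = queues
    return dict(table)
```

With {"broker-a": {"orders": 4}, "broker-b": {}}, the route for orders contains only broker-a, so a producer sees half the queues it expects.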

That's what happened.

That explains this symptom too.

Note: many people use our test environments and I cannot know who did what or when. This is therefore only my own hypothesis, although it does account for the symptoms above. If you have doubts or spot a mistake, discussion is welcome.

Resolution

I copied the corresponding configuration from the failing machine to the new machine and restarted, and everything behaved normally. Strictly speaking, the data should have been copied too. But the new machine had already received new data, and copying the old data over it would have made a mess.
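To confirm the copy actually took before restarting, you could compare the old and new brokers' topics.json files, roughly as in this Python sketch. It assumes the 4.x file layout: a top-level topicConfigTable object keyed by topic name, with readQueueNums and writeQueueNums fields. Check that against your own files. diff_topic_configs is a hypothetical helper.

```python
import json


def diff_topic_configs(old_json, new_json):
    """Compare two topics.json documents.

    Returns (missing, mismatched): topics absent on the new broker, and
    topics whose read/write queue counts differ between the two files.
    """
    old = json.loads(old_json).get("topicConfigTable", {})
    new = json.loads(new_json).get("topicConfigTable", {})
    missing = sorted(set(old) - set(new))
    mismatched = sorted(
        name for name in set(old) & set(new)
        if (old[name].get("readQueueNums"), old[name].get("writeQueueNums"))
        != (new[name].get("readQueueNums"), new[name].get("writeQueueNums"))
    )
    return missing, mismatched
```

An empty result for both lists means the new broker will come up with the same topics and queue counts as the old one.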

Note: when migrating, don't just start the service. First copy over the runtime configuration data and similar files.
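One way to enforce this is a pre-start check in whatever script launches the broker on a migrated machine. This minimal Python sketch assumes the 4.x layout, where these files sit under <storePathRootDir>/config. The required-file list and function names are hypothetical choices, not something RocketMQ provides. A genuinely fresh broker may legitimately lack these files, so the check only makes sense for migrations.

```python
import os

# Hypothetical minimum set of files a migrated broker should already have.
REQUIRED = ("topics.json", "subscriptionGroup.json")


def missing_config(store_root, required=REQUIRED):
    """List required config files absent from <store_root>/config."""
    config_dir = os.path.join(store_root, "config")
    return [name for name in required
            if not os.path.isfile(os.path.join(config_dir, name))]


def assert_ready_to_start(store_root):
    """Raise instead of letting a migrated broker boot with an empty topic table."""
    missing = missing_config(store_root)
    if missing:
        raise RuntimeError(
            f"refusing to start: missing {', '.join(missing)} under {store_root}/config")
```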
