
How to solve the pitfalls encountered after expanding and re-sharding a RocketMQ topic in a production environment

2025-04-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article shares how to solve the problems encountered after expanding a topic in a RocketMQ production environment. The editor finds it very practical, so it is shared here as a reference. Follow along and have a look.

1. Review of the case

1.1 Status of the cluster

The cluster information is as follows:

For example, the routing information of the business topic topic_dw_test_by_order_01 is shown in the figure:

Current consumer information:

The broker configuration is as follows:

brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1=192.168.0.220
brokerIP2=192.168.0.220
namesrvAddr=192.168.0.221:9876;192.168.0.220:9876
storePathRootDir=/opt/application/rocketmq-all-4.5.2-bin-release/store
storePathCommitLog=/opt/application/rocketmq-all-4.5.2-bin-release/store/commitlog
autoCreateTopicEnable=false
autoCreateSubscriptionGroup=false

Note: the company strictly controls topics and consumer groups. Project teams must apply to the operations staff whenever they need one, so the broker cluster does not allow topics or consumer groups to be created automatically.
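To make the effect of the last two settings concrete, here is a minimal Python sketch that reads a broker properties file and reports whether the broker will auto-create topics and subscription groups. The function names are hypothetical. The parser deliberately accepts only `key=value` lines, which is stricter than Java's Properties format. The fallback defaults (both true) are those of RocketMQ 4.x BrokerConfig. The broker also binds keys by their exact, case-sensitive names (`brokerName`, not `BrokerName`).

```python
from typing import Dict, List, Tuple

# Defaults of these two switches in RocketMQ 4.x BrokerConfig (both true).
AUTO_CREATE_DEFAULTS = {
    "autoCreateTopicEnable": True,
    "autoCreateSubscriptionGroup": True,
}


def parse_broker_properties(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Parse strict key=value lines; return (settings, malformed_lines)."""
    settings: Dict[str, str] = {}
    malformed: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank lines and comments carry no settings
        if "=" not in line:
            # e.g. "brokerIP2-192.168.0.220": Java's Properties loader would
            # read this as a key with an empty value, so the IP is never applied.
            malformed.append(line)
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings, malformed


def auto_create_flags(settings: Dict[str, str]) -> Dict[str, bool]:
    """Effective auto-create switches, falling back to the defaults."""
    flags: Dict[str, bool] = {}
    for key, default in AUTO_CREATE_DEFAULTS.items():
        value = settings.get(key)
        flags[key] = default if value is None else value.strip().lower() == "true"
    return flags


if __name__ == "__main__":
    conf = """
    brokerName = broker-a
    brokerIP2-192.168.0.220
    autoCreateTopicEnable=false
    autoCreateSubscriptionGroup=false
    """
    settings, malformed = parse_broker_properties(conf)
    print(malformed)  # ['brokerIP2-192.168.0.220']
    print(auto_create_flags(settings))
```

When both flags come back false, as in this cluster, every topic and consumer group must exist on each broker before clients can use it there.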

As business volume grew steadily, the project team felt the topic had too few queues. Within a consumer group, each queue is consumed by at most one consumer, so adding consumers beyond the queue count would not increase consumption capacity. The team therefore asked the operations staff to increase the number of queues.

1.2 Online queue expansion in RocketMQ

Using the company's in-house messaging operations platform, the OPS team expanded the topic directly by specifying the cluster. Under the hood, the OPS platform uses the updateTopic command provided by RocketMQ, which is described as follows:

The figure is taken from the book "RocketMQ Technology Insider".

As the figure above shows, the -c option creates the queues on every broker in the specified cluster. In this example, the number of queues is raised from 4 to 8 with the following command:

sh ./mqadmin updateTopic -n 192.168.0.220:9876 -c DefaultCluster -t topic_dw_test_by_order_01 -r 8 -w 8
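As a sketch of what the OPS platform has to produce, the Python helpers below (hypothetical names) build the updateTopic argument vector and list the queues the route should expose afterwards. They assume that `-c` fans out to every master broker of the cluster, as described above. With two masters and `-w 8` that gives 16 queues, and with the four masters of the production environment it would give 32.

```python
from typing import Dict, List, Tuple

MessageQueue = Tuple[str, int]  # (brokerName, queueId)


def build_update_topic_args(namesrv: str, cluster: str, topic: str,
                            read_nums: int, write_nums: int) -> List[str]:
    """Argument vector for `mqadmin updateTopic` aimed at a whole cluster (-c)."""
    return ["sh", "./mqadmin", "updateTopic",
            "-n", namesrv,          # NameServer address(es)
            "-c", cluster,          # apply to every master broker in the cluster
            "-t", topic,
            "-r", str(read_nums),   # readQueueNums per broker
            "-w", str(write_nums)]  # writeQueueNums per broker


def expected_queues(masters_by_cluster: Dict[str, List[str]], cluster: str,
                    write_nums: int) -> List[MessageQueue]:
    """Queues the route should expose once each master holds write_nums queues."""
    return [(broker, queue_id)
            for broker in sorted(masters_by_cluster[cluster])
            for queue_id in range(write_nums)]


if __name__ == "__main__":
    args = build_update_topic_args("192.168.0.220:9876", "DefaultCluster",
                                   "topic_dw_test_by_order_01", 8, 8)
    print(" ".join(args))
    test_env = {"DefaultCluster": ["broker-a", "broker-b"]}
    print(len(expected_queues(test_env, "DefaultCluster", 8)))  # 16
```

Passing the arguments as a list, for example to `subprocess.run(args, cwd=bin_dir)` where `bin_dir` is your RocketMQ `bin` directory, avoids shell-quoting problems when `-n` carries a semicolon-separated NameServer list like the `namesrvAddr` value in the broker configuration.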

The command output, shown in the figure, indicates that the update succeeded.

Let's check the result of the command in rocketmq-console:

As the figure above shows, the topic's queue count has been expanded to 8, and the queues have been created on both brokers of the cluster.

1.3 Message delivery

As covered earlier in this RocketMQ series, RocketMQ supports expanding a topic online, so neither the producers nor the consumers need to be restarted. After a while, all queues of the topic take part in message load balancing, as shown in the figure:

All 16 queues (8 on each broker) are clearly taking part in message sending, and the operations engineer happily considered the topic expansion complete.

2. The problem surfaces

The topic was subscribed to by five consumer groups. Suddenly we were notified that two of those groups were reporting that some messages in the queues were not being consumed. As a result, the downstream systems did not process them in time, and users noticed.

3. Problem analysis

When the project team escalated the issue to the messaging group, my first reaction was to check the consumer queues and open the topic's consumption status page, as shown in the figure:

The page showed no backlog on any queue. (Note: production runs 4 masters and 4 slaves with 8 queues on each master, for 32 queues in total.) In the rush, we did not notice on this page that only one consumer was listed. Because we saw no message backlog, and because the other consumer groups on the same cluster were fine while only two groups had problems, we suspected the application itself. So we restarted it, printed thread stacks, and tried other methods.

In hindsight, this conclusion was wrong. Why? The project team (the business side) had already reported that part of the business had not been processed, which means messages must have been backlogged in the queues. When a judgment based on your own knowledge and the monitoring page conflicts with the business side's feedback, something must be wrong with your own judgment.

While we were busy suspecting the application, another team member offered a new perspective. From the business side's feedback, he learned that the same topic was subscribed to by five consumer groups and only two of them had problems. He therefore used rocketmq-console to compare the two kinds of groups: find the differences, find the pattern, and solve the problem.

The comparison showed that only two clients were consuming in each faulty consumer group (production normally runs four consuming nodes), while four processes were handling messages in each healthy group. That exposed the symptom: in the faulty groups, one of the consumers was not taking part in consumption. As the figure above shows, only one process is handling eight queues, and the other eight queues are not being consumed.

So the question became: why does the topic have 16 queues when only one consumer is consuming and the other is doing nothing?

Under RocketMQ's queue load-balancing mechanism, the two consumers should each be assigned part of the queues, yet only one of them is consuming. One feature stands out: only the queues on broker-a are being consumed, and none of the queues on broker-b are. Is there a pattern here?
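The broker-by-broker split follows from how queues are allocated. The Python sketch below is a simplified port of the idea behind AllocateMessageQueueAveragely, the default allocation strategy of DefaultMQPushConsumer. Every client sorts all queues (by broker name, then queue ID) and all consumer IDs in the same way, then takes one contiguous, roughly equal slice. This is an illustration, not the RocketMQ source. The queue tuples and consumer IDs are hypothetical.

```python
from typing import List, Sequence, Tuple

MessageQueue = Tuple[str, int]  # (brokerName, queueId), sorted like MessageQueue within one topic


def allocate_averagely(mq_all: Sequence[MessageQueue], cid_all: Sequence[str],
                       current_cid: str) -> List[MessageQueue]:
    """Contiguous, averaged split of sorted queues over sorted consumer IDs."""
    mqs = sorted(mq_all)    # by brokerName, then queueId
    cids = sorted(cid_all)  # every client sorts the same way, so the splits agree
    if current_cid not in cids or not mqs:
        return []
    index = cids.index(current_cid)
    n, c = len(mqs), len(cids)
    mod = n % c
    if n <= c:
        average = 1
    elif mod > 0 and index < mod:
        average = n // c + 1
    else:
        average = n // c
    start = index * average if (mod > 0 and index < mod) else index * average + mod
    count = min(average, n - start)  # zero or negative when this consumer gets nothing
    return [mqs[(start + i) % n] for i in range(count)]


if __name__ == "__main__":
    queues = [(broker, q) for broker in ("broker-a", "broker-b") for q in range(8)]
    consumers = ["consumer-1", "consumer-2"]
    for cid in consumers:
        print(cid, sorted({b for b, _ in allocate_averagely(queues, consumers, cid)}))
    # consumer-1 ['broker-a']
    # consumer-2 ['broker-b']
```

With two consumers and 16 sorted queues, the first consumer receives exactly broker-a's eight queues and the second receives exactly broker-b's. If anything blocks pulls from broker-b, one whole consumer therefore appears idle. With four masters in production, the second consumer would likewise own all of broker-c and broker-d. The same arithmetic explains the original motivation for the expansion: with only 4 queues, a fifth or sixth consumer in the group receives nothing.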

While we puzzled over this, another colleague raised a question. The unconsumed queues are on broker-b (broker-c and broker-d in our production environment). Are these queues on the newly added machines? Could it be that the subscription relationship was never created on the new brokers during that expansion?

We then set out to verify the conjecture. By checking in our system when broker-c and broker-d (in our production environment) were created, namely July 2018, we essentially concluded that the subscription information had not been created on the new brokers during that expansion, so the messages on them could not be consumed.

The OPS team then immediately created the subscription group, as shown in the figure:

After the consumer group was created, the topic's consumption page shows the other consumer starting to process messages, as shown in the following figure:

4. Problem review

Underlying cause: the DefaultCluster cluster had previously been expanded by adding a new broker server (broker-b) alongside the original one (broker-a). However, the existing topics and consumer groups on broker-a were never replicated to broker-b.

Trigger: after receiving the project team's expansion request, OPS raised the topic's queue count from 4 to 8 across the cluster, so the topic now had 8 queues on both broker-a and broker-b. But the brokers do not allow consumer groups (subscription relationships) to be created automatically. Consumers could therefore not pull messages from the queues on broker-b, and messages piled up there unconsumed.

Solution: the operations staff created the corresponding subscription group on broker-b with a command, which solved the problem.

Lesson: when you expand a cluster, synchronize the topic configuration (topics.json) and subscriptionGroup.json from the existing brokers to the new ones.
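A check like the following could catch the gap early. This Python sketch, with hypothetical function names, compares the config files copied from each master broker and lists the consumer groups (or topics) that some brokers have and others lack. It then builds `mqadmin updateSubGroup` argument vectors to create the missing groups. It assumes the RocketMQ 4.x JSON layout, where subscriptionGroup.json keeps groups under `subscriptionGroupTable` and the topic file (topics.json in a default store layout) keeps topics under `topicConfigTable`. The broker addresses are an input you supply.

```python
import json
from typing import Dict, List, Set


def table_keys(store_json: str, table: str) -> Set[str]:
    """Entry names in one broker's config file, e.g. its subscriptionGroupTable."""
    return set(json.loads(store_json).get(table, {}))


def missing_entries(per_broker: Dict[str, str],
                    table: str = "subscriptionGroupTable") -> Dict[str, List[str]]:
    """For each broker, entries present on some other broker but absent locally."""
    names = {broker: table_keys(text, table) for broker, text in per_broker.items()}
    union: Set[str] = set().union(*names.values())
    gaps = {broker: sorted(union - present) for broker, present in names.items()}
    return {broker: missing for broker, missing in sorted(gaps.items()) if missing}


def repair_sub_group_args(missing: Dict[str, List[str]], broker_addrs: Dict[str, str],
                          namesrv: str) -> List[List[str]]:
    """One `mqadmin updateSubGroup` argument vector per missing consumer group."""
    return [["sh", "./mqadmin", "updateSubGroup", "-n", namesrv,
             "-b", broker_addrs[broker], "-g", group]
            for broker, groups in sorted(missing.items())
            for group in groups]
```

Running the comparison for both tables after every cluster expansion, and before changing queue counts, would have shown that broker-b lacked the consumer groups in this incident.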

According to RocketMQ's design, a consumer sends a pull request to a broker. If the broker has no subscription group configuration for that consumer group and automatic creation is disabled (autoCreateSubscriptionGroup=false; the default is true), the broker returns no messages to the client. The code is as follows:
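The check described above can be modeled with a toy Python broker. This is a hypothetical simplification, not RocketMQ's PullMessageProcessor. Only the response name mirrors RocketMQ's `SUBSCRIPTION_GROUP_NOT_EXIST` response code, and the group name `cg_report` is made up. The model shows why broker-b returned nothing until the group was created there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

SUCCESS = "SUCCESS"
SUBSCRIPTION_GROUP_NOT_EXIST = "SUBSCRIPTION_GROUP_NOT_EXIST"


@dataclass
class SimulatedBroker:
    """Toy model of the broker-side group check on a pull request (hypothetical)."""
    name: str
    auto_create_subscription_group: bool = True  # mirrors the broker default
    groups: Dict[str, dict] = field(default_factory=dict)
    queues: Dict[int, List[str]] = field(default_factory=dict)

    def find_group(self, group: str) -> Optional[dict]:
        config = self.groups.get(group)
        if config is None and self.auto_create_subscription_group:
            config = {"groupName": group}  # created on first use
            self.groups[group] = config
        return config

    def pull(self, group: str, queue_id: int, offset: int,
             max_nums: int = 32) -> Tuple[str, List[str]]:
        if self.find_group(group) is None:
            # Unknown group and auto-creation disabled: no messages are returned.
            return SUBSCRIPTION_GROUP_NOT_EXIST, []
        messages = self.queues.get(queue_id, [])
        return SUCCESS, messages[offset:offset + max_nums]


if __name__ == "__main__":
    broker_b = SimulatedBroker("broker-b", auto_create_subscription_group=False,
                               queues={0: ["m1", "m2"]})
    print(broker_b.pull("cg_report", 0, 0))  # ('SUBSCRIPTION_GROUP_NOT_EXIST', [])
    broker_b.groups["cg_report"] = {"groupName": "cg_report"}  # what OPS did
    print(broker_b.pull("cg_report", 0, 0))  # ('SUCCESS', ['m1', 'm2'])
```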

After the problem was solved, the team member also shared his troubleshooting approach: find the pattern, form a hypothesis, and then verify it. The pattern can come from the problem itself or from comparing the faulty case with a normal one.

Thank you for reading! This concludes the article on how to solve the pitfalls encountered after expanding and re-sharding a RocketMQ topic in a production environment. I hope it helps you learn something new. If you found it useful, feel free to share it with others.

© 2024 shulou.com SLNews company. All rights reserved.