2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Codis slot migration hits an oversized value and blocks the redis process
Background:
On the afternoon of 2016-11-08, the consulting codis cluster (product techzixun) began migrating slots. During the migration, the DBA found that the redis in group1 could not be reached, and the proxy process exited abnormally and could not be restarted.
When redis is reachable, the dashboard looks like this:
When the problem occurred, redis could not be reached, and the Keys column on the dashboard showed only NaN instead of a key count.
When the proxy process exits abnormally, an alert like this arrives over WeChat:
[('NoneType' object has no attribute '__getitem__') 10.20.4.1 KDDI 19153 (KDDI)] (codis_proxy) [r_connection], threshold: 1 currentVOO LastRod 1 minutes.,info:failed [2016-11-08 15:32:35]
Resolution:
1. Add the standby proxy server to the domain name, and once that takes effect remove the failed proxy from the domain name.
2. Check the dashboard log. It shows the following:
The log shows that slot_359 finished migrating and that an error was reported while slot_360 was migrating: the read from redis timed out, and after some time the migration task was marked as failed ([info] migration start: {SlotId:360 NewGroupId:4 Delay:1 CreateAt:1478588644 Percent:0 Status:error Id:0000003317})
3. Manually cancel the migration task
Log in to zookeeper: /usr/local/xywy/zookeeper-3.4.6/bin/zkCli.sh -server 10.20.4.1:2181
rmr /zk/codis/db_techzixun/migrate_tasks/0000003317
4. Restart the proxy process
If the proxy still cannot be restarted, canceling the migration task may have left a slot in an inconsistent state. Check the proxy log; the error looks like this:
The log shows that the slot status is pre_migrate. Change it to online.
Adjustment method:
After logging in to zookeeper:
Check the value of the affected slot: get /zk/codis/db_techzixun/slots/slot_392
Set its status to online: set /zk/codis/db_techzixun/slots/slot_392 {"product_name":"techzixun","id":392,"group_id":1,"state":{"status":"online","migrate_status":{"from":-1,"to":-1},"last_op_ts":"0"}}
Restart the proxy only once every slot is in the online state.
5. Restart the proxy and add it back to the domain name. At this point the fault is resolved.
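The slot repair in step 4 can be sketched in Python as a helper that takes the JSON returned by `get` and produces the value to pass to `set`. This is a minimal sketch, not a codis tool: the function name `mark_slot_online` and the stuck sample value are hypothetical, and resetting `migrate_status` to -1/-1 is an assumption taken from the `set` command shown above. The output is compact JSON without spaces so it can be pasted as a single zkCli argument.

```python
import json


def mark_slot_online(raw_slot_json: str) -> str:
    """Return the slot record with its status forced to 'online'.

    Assumption: the record has the same layout as the value shown in the
    article (state.status and state.migrate_status.from/to).
    """
    slot = json.loads(raw_slot_json)
    slot["state"]["status"] = "online"
    # Assumption: clear the migration markers, as in the article's set command.
    slot["state"]["migrate_status"] = {"from": -1, "to": -1}
    # Compact separators: no spaces inside the JSON payload.
    return json.dumps(slot, separators=(",", ":"))


# Hypothetical value, as `get` might return it for a slot stuck in pre_migrate.
stuck = (
    '{"product_name":"techzixun","id":392,"group_id":1,'
    '"state":{"status":"pre_migrate","migrate_status":{"from":-1,"to":-1},'
    '"last_op_ts":"0"}}'
)

fixed = mark_slot_online(stuck)
print("set /zk/codis/db_techzixun/slots/slot_392 " + fixed)
```

Generating the payload from the current value, rather than typing it by hand, keeps fields such as group_id unchanged and avoids JSON typos in the zookeeper node.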
Follow-up troubleshooting:
The steps above only recover from the failure. The reason the migration went wrong still needs to be investigated.
Check the log of the failing redis in group1. It contains the following:
[113248] 08 Nov 15:16:20.764 # slotsmgrt: writing to target 10.20.2.4:6916, error 'accept: Resource temporarily unavailable', nkeys = 1, onekey = 'jbzt_nk_20219_arclist', cmd.len = 1418791367, pos = 65536, towrite = 65536
Use redis-cli to scan for big keys:
redis-cli -p xxxx -h xxxx --bigkeys
The scan shows that the list 'jbzt_nk_20219_arclist' is about 3G in size.
So the cause of the problem is the migration of a key (here a list) whose value is far too large.
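To pull the offending key and command size out of redis logs automatically, a small parser like the following could be used. It is a sketch under assumptions: the regular expression is written against the cleaned-up form of the log line quoted in this article (target as host:port, quoted key name), and the names `SLOTSMGRT_RE` and `parse_slotsmgrt_error` are hypothetical, not part of codis.

```python
import re

# Pattern for the "slotsmgrt: writing to target ..." error line.
# Assumption: the line format matches the log entry quoted in the article.
SLOTSMGRT_RE = re.compile(
    r"slotsmgrt: writing to target (?P<host>[\d.]+):(?P<port>\d+), "
    r"error '(?P<error>[^']*)', nkeys = (?P<nkeys>\d+), "
    r"onekey = '(?P<key>[^']*)', cmd\.len = (?P<cmd_len>\d+)"
)


def parse_slotsmgrt_error(line):
    """Return target, key and command length from a slotsmgrt error line, or None."""
    match = SLOTSMGRT_RE.search(line)
    if match is None:
        return None
    return {
        "target": match["host"] + ":" + match["port"],
        "error": match["error"],
        "key": match["key"],
        "cmd_len": int(match["cmd_len"]),
    }


sample = (
    "slotsmgrt: writing to target 10.20.2.4:6916, "
    "error 'accept: Resource temporarily unavailable', nkeys = 1, "
    "onekey = 'jbzt_nk_20219_arclist', cmd.len = 1418791367, "
    "pos = 65536, towrite = 65536"
)

info = parse_slotsmgrt_error(sample)
print(info["key"], "%.2f GiB" % (info["cmd_len"] / 1024 ** 3))
```

Here cmd.len is the size in bytes of the command being written to the target, so a value above one gigabyte for a single key already points at an oversized value before --bigkeys is run.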
Solution:
This is a limitation of codis itself and cannot be worked around within codis. The only options are to shrink the keys or lists whose values are too large, or to store that data some other way.
Remaining issue:
Problem description:
When codis hits an error while migrating a slot, the slot stays in the migrating state, and the proxy cannot come online while a slot is in that state, so the slot status has to be changed to online by hand. At that point, however, the keys that were not migrated are still stored in the pre-migration group, while the group information of the slot has already been updated to the target group. As a result, requests for the keys that were not migrated fail.
Solution:
Migrate the slot containing the unreachable keys back to the original group. To find the original group, check proxy.log to see which ip:port the key is routed to. To find the slot a key belongs to, compute binascii.crc32('key_name') % 1024 in python.
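The slot calculation can be written as a small Python 3 function. This is a sketch assuming the usual codis layout of 1024 slots; the name `slot_for_key` and the constant `CODIS_SLOT_COUNT` are illustrative. It hashes the whole key and does not handle hash tags, so it only applies to plain key names such as the one in this incident.

```python
import binascii

CODIS_SLOT_COUNT = 1024  # assumption: the cluster uses 1024 slots


def slot_for_key(key) -> int:
    """Return the slot index for a key as crc32(key) % 1024."""
    if isinstance(key, str):
        # binascii.crc32 needs bytes in Python 3.
        key = key.encode("utf-8")
    # Mask to an unsigned 32-bit value so the result matches on any platform.
    return (binascii.crc32(key) & 0xFFFFFFFF) % CODIS_SLOT_COUNT


print(slot_for_key("jbzt_nk_20219_arclist"))
```

The resulting number identifies the slot_N node under /zk/codis/db_techzixun/slots/ that has to be migrated back.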
An example of the routing problem:
There are 2 groups in codis, group1 and group2. Slot 100 in group1 holds 6 keys: 1, 2, 3, 4, 5 and 6. group2 holds no keys. Slot 100 is now migrated from group1 to group2. Keys 1, 2 and 3 are migrated successfully, and then the migration fails because of an oversized value. At this point slot 100 already belongs to group2 and its status is still migrating. After the migration task is deleted manually and the status of slot 100 is set to online, lookups for keys 4, 5 and 6 through the proxy fail. The reason is that keys 4, 5 and 6 are still stored under slot 100 in group1, but slot 100 is no longer assigned to group1. Migrating slot 100 back to group1 solves the problem.
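The example can be replayed with a toy model in Python. This is only an illustration of the routing mismatch, not codis code: the dictionaries `storage` and `slot_owner` and the helper names are hypothetical, and the model ignores everything except which group owns the slot and which group actually stores each key.

```python
def reachable(key, slot_of, slot_owner, storage):
    """A lookup succeeds only if the group that owns the slot stores the key."""
    owner = slot_owner[slot_of[key]]
    return key in storage[owner]


def unreachable_keys(slot_of, slot_owner, storage):
    return [k for k in sorted(slot_of) if not reachable(k, slot_of, slot_owner, storage)]


# Initial state: six keys, all in slot 100, all stored in group1.
slot_of = {k: 100 for k in range(1, 7)}
storage = {"group1": {1, 2, 3, 4, 5, 6}, "group2": set()}
slot_owner = {100: "group1"}

# Migration to group2 starts: the slot is reassigned, keys 1-3 are moved,
# then the migration fails on the oversized value and is forced to online.
slot_owner[100] = "group2"
for k in (1, 2, 3):
    storage["group1"].remove(k)
    storage["group2"].add(k)
broken = unreachable_keys(slot_of, slot_owner, storage)
print(broken)  # [4, 5, 6]

# Fix: migrate slot 100 back to group1, moving keys 1-3 with it.
for k in (1, 2, 3):
    storage["group2"].remove(k)
    storage["group1"].add(k)
slot_owner[100] = "group1"
repaired = unreachable_keys(slot_of, slot_owner, storage)
print(repaired)  # []
```

Migrating back works because only the small keys 1, 2 and 3 have to move, so the oversized value never needs to be transferred.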