

Redis Practice Series: Principle and Optimization of Codis Data Migration

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03

Codis introduction

Codis is a Redis cluster implementation, similar to the Redis Cluster provided by the Redis community. It builds a larger cluster of Redis nodes on top of a slot-based sharding mechanism. To a Redis client connected to Codis, apart from a few unsupported commands, there is no obvious difference from an open-source Redis server, so client code basically does not need to be modified. Codis-proxy calculates the slot from the key being accessed and forwards the request to the corresponding Redis-server; the intermediate codis-proxy is invisible to the client. Depending on business needs, Codis can therefore be used to build a large-scale Redis service, or simply to spread requests across multiple Redis-servers to improve system throughput.

Compared with the well-known Twemproxy, Codis supports not only Redis request forwarding but also online data migration without downtime, so users can easily add or remove nodes when capacity or throughput requirements change. This article analyzes how Codis migration works and proposes a practical optimization.

This article is based on Codis 3.0.


Implementation principle of Codis Migration

When Codis-dashboard starts, it runs four background goroutines: Redis state synchronization, proxy state synchronization, slot event handling, and sync event handling. It also exposes slot-related RESTful APIs for defining which Redis-group owns each slot and for defining and triggering migrations.

The following structure defines the ownership and migration relationship between a slot and a Redis-group. GroupId is the Redis-group that owns the slot with index Id, while Action describes a migration: Action.TargetId is the target Redis-group the slot is being migrated to, and Action.State is the status of the migration, which mainly takes the values Pending, Preparing, Prepared, Migrating and Finished.

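```go
// SlotMapping records which Redis-group owns a slot and any migration in progress.
type SlotMapping struct {
	Id      int `json:"id"`       // slot index
	GroupId int `json:"group_id"` // Redis-group that currently owns the slot

	Action struct {
		Index     int    `json:"index,omitempty"`
		State     string `json:"state,omitempty"`     // Pending, Preparing, Prepared, Migrating, Finished
		TargetId  int    `json:"target_id,omitempty"` // Redis-group the slot is moving to
		UpdatedAt int64  `json:"updated_at,omitempty"`
	} `json:"action"`
}
```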

A manual migration can be triggered with the following command:

codis-admin --dashboard=ADDR --slot-action --create --sid=ID --gid=ID. For example, to migrate slot 10 to group 5, run "codis-admin --dashboard=ADDR --slot-action --create --sid=10 --gid=5".

To migrate several slots to the same group, you can define multiple migrations at once with codis-admin --dashboard=ADDR --slot-action --create-range --beg=ID --end=ID --gid=ID. For example, to migrate slots 10 through 15 to group 5, run "codis-admin --dashboard=ADDR --slot-action --create-range --beg=10 --end=15 --gid=5".

During a migration, the slot's Action.State changes as follows:
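Assuming the states listed earlier are visited strictly in that order (an assumption; the real state machine may include extra transitions or retries), the progression can be sketched as a tiny Go state machine. The lowercase string values and the `nextState` helper are hypothetical.

```go
package main

import "fmt"

// Action states in the order the article lists them.
// The exact string values are an assumption.
const (
	Pending   = "pending"
	Preparing = "preparing"
	Prepared  = "prepared"
	Migrating = "migrating"
	Finished  = "finished"
)

// nextState returns the state that follows s, or false if s is
// Finished (terminal) or not a known state.
func nextState(s string) (string, bool) {
	order := []string{Pending, Preparing, Prepared, Migrating, Finished}
	for i := 0; i < len(order)-1; i++ {
		if order[i] == s {
			return order[i+1], true
		}
	}
	return "", false
}

func main() {
	s := Pending
	fmt.Print(s)
	for {
		next, ok := nextState(s)
		if !ok {
			break
		}
		fmt.Print(" -> ", next)
		s = next
	}
	fmt.Println() // pending -> preparing -> prepared -> migrating -> finished
}
```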

You can also have Codis rebalance automatically with codis-admin --dashboard=ADDR --rebalance --confirm. Codis then migrates slots to the newly added nodes so that each node is responsible for a balanced number of slots.

Testing of Codis Migration

In our test, a 64G cluster (8 nodes of 8G each) was filled with data using redis-benchmark, with a 32-byte value per key, for a total of 341446298 (about 340 million) keys. The cluster was then expanded to 128G, which required migrating 512 slots.
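The figure of 512 slots follows from simple arithmetic, assuming the new nodes are also 8G (so 128G means 16 nodes) and slots are spread evenly before and after: 1024 / 8 = 128 slots per node before, 1024 / 16 = 64 after, so each of the 8 original nodes gives up 64 slots. The sketch below also estimates the average number of keys per slot; the helper names are hypothetical.

```go
package main

import "fmt"

const numSlots = 1024 // fixed slot count in Codis

// slotsMoved returns how many slots must move when a cluster grows from
// oldNodes to newNodes equally sized nodes, assuming an even spread before
// and after (numSlots divisible by both node counts).
func slotsMoved(oldNodes, newNodes int) int {
	before := numSlots / oldNodes // slots per node before expansion
	after := numSlots / newNodes  // slots per node after expansion
	return (before - after) * oldNodes
}

// keysPerSlot returns the average number of keys stored in one slot.
func keysPerSlot(totalKeys int) float64 {
	return float64(totalKeys) / numSlots
}

func main() {
	perSlot := keysPerSlot(341446298)
	fmt.Println("slots moved:", slotsMoved(8, 16))  // 512
	fmt.Printf("keys per slot: %.0f\n", perSlot)    // 333444
	// If one slot takes about an hour, the effective rate is roughly:
	fmt.Printf("keys per second: %.0f\n", perSlot/3600) // 93
}
```

With roughly a third of a million keys per slot, an hour per slot corresponds to only about a hundred keys per second, which is why the analysis below focuses on how many keys are moved per call and how many slots move at once.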

The test results are as follows:

The results show that migration is very slow: each slot takes about 1 hour to migrate. When using Codis, you therefore need to monitor data volume and expand capacity in time when space starts running low; otherwise, if space does run out, fault handling and recovery time may affect online business.

Codis Migration Code Analysis and Bottleneck Analysis

As the test shows, migration is indeed very slow and in extreme cases may affect online business, so the migration process needs to be analyzed and optimized. The following sections walk through the key functions handleSlotRebalance, StartDaemonRoutines and ProcessSlotAction, and then analyze where they can be optimized.

01

Analysis of handleSlotRebalance implementation

The main logic of this function is divided into three parts:

1) find the slots to be migrated

2) assign slots to each new node

3) generate the migration actions

The logic of the code is as follows:

1) Based on the number of nodes and the number of slots (fixed at 1024), calculate how many slots each node should be responsible for; this value is called bound.

2) For each redis-group, find the slots that need to be moved out and record them in pending.
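The two steps above can be sketched as follows. This is a simplified model, not the Codis source: it assumes bound is rounded up (so bound × groups ≥ 1024) and that an over-loaded group gives up the slots beyond its first bound slots. `rebalanceBound` and `collectPending` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

const numSlots = 1024

// rebalanceBound returns how many slots each group should own at most.
// Assumption: the result is rounded up so every slot has a home.
func rebalanceBound(groups int) int {
	return (numSlots + groups - 1) / groups
}

// collectPending returns the slots that over-loaded groups must give up.
// owned maps group id -> slot ids owned by that group.
func collectPending(owned map[int][]int, bound int) []int {
	gids := make([]int, 0, len(owned))
	for gid := range owned {
		gids = append(gids, gid)
	}
	sort.Ints(gids) // deterministic order
	var pending []int
	for _, gid := range gids {
		if slots := owned[gid]; len(slots) > bound {
			pending = append(pending, slots[bound:]...)
		}
	}
	return pending
}

func main() {
	// Hypothetical layout: groups 1 and 2 own 512 slots each; 3 and 4 are new.
	owned := map[int][]int{3: nil, 4: nil}
	for sid := 0; sid < numSlots; sid++ {
		gid := 1
		if sid >= numSlots/2 {
			gid = 2
		}
		owned[gid] = append(owned[gid], sid)
	}
	bound := rebalanceBound(len(owned))
	pending := collectPending(owned, bound)
	fmt.Println("bound:", bound, "pending:", len(pending)) // bound: 256 pending: 512
}
```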

Generate a migration plan:

1) Iterate over all redis-groups; a group that currently holds fewer slots than bound needs to receive some slots.

2) For all redis-groups, decide the list of slots to migrate, recorded in plans.

After iterating over the migration plan, create actionRange is used to generate a series of slot actions, which are saved to etcd. The background goroutine then fetches these slot actions from etcd and processes them one by one.
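A simplified sketch of plan generation, assuming pending slots are handed out in order to under-loaded groups (sorted by group id) until each reaches bound. The resulting `plans` map (slot id to target group) is a hypothetical representation; each entry corresponds to one slot action that would then be saved.

```go
package main

import (
	"fmt"
	"sort"
)

// buildPlans assigns pending slots to groups that own fewer than bound
// slots and returns a map from slot id to target group id. counts maps
// group id to the number of slots it owns; it is copied, not modified.
func buildPlans(counts map[int]int, pending []int, bound int) map[int]int {
	local := make(map[int]int, len(counts))
	gids := make([]int, 0, len(counts))
	for gid, n := range counts {
		local[gid] = n
		gids = append(gids, gid)
	}
	sort.Ints(gids) // deterministic order

	plans := make(map[int]int)
	for _, gid := range gids {
		for local[gid] < bound && len(pending) > 0 {
			plans[pending[0]] = gid
			pending = pending[1:]
			local[gid]++
		}
	}
	return plans
}

func main() {
	counts := map[int]int{1: 512, 2: 512, 3: 0, 4: 0}
	var pending []int
	for sid := 256; sid < 512; sid++ {
		pending = append(pending, sid)
	}
	for sid := 768; sid < 1024; sid++ {
		pending = append(pending, sid)
	}
	plans := buildPlans(counts, pending, 256)
	// Each entry would become one slot action (slot -> target group).
	fmt.Println(len(plans), plans[256], plans[768]) // 512 3 4
}
```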

02

StartDaemonRoutines

This background task starts when the dashboard starts. It triggers slot action processing every 5 seconds, and only one slot action task runs at a time.

03

Analysis of ProcessSlotAction implementation

There are two steps: Topom.SlotActionPrepare and Topom.processSlotAction.

As can be seen from the above code:

The following is an analysis of the implementation of processSlotAction:

As can be seen:
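A minimal sketch of the two steps, under heavy simplifying assumptions: the prepare step picks the Pending action with the smallest index (the ordering rule is an assumption) and skips the Preparing/Prepared states, and the process step drains the slot by calling a migrate operation that moves one key per call, which is the per-key behaviour the multikey optimization below changes. `action`, `slotStore`, `prepareNext`, `migrateOneKey` and `processSlot` are all hypothetical names in a toy in-memory model.

```go
package main

import "fmt"

// action is a simplified, hypothetical view of a slot's migration action.
type action struct {
	sid, index int
	state      string
}

// prepareNext picks the Pending action with the smallest index and marks it
// as migrating. The Preparing/Prepared steps are omitted in this sketch.
func prepareNext(actions []*action) *action {
	var next *action
	for _, a := range actions {
		if a.state == "pending" && (next == nil || a.index < next.index) {
			next = a
		}
	}
	if next != nil {
		next.state = "migrating"
	}
	return next
}

// slotStore is a toy stand-in for a redis-server: slot id -> keys.
type slotStore map[int][]string

// migrateOneKey moves a single key out of the slot and returns how many
// keys remain, mimicking a one-key-per-call migrate command.
func (s slotStore) migrateOneKey(sid int) int {
	if keys := s[sid]; len(keys) > 0 {
		s[sid] = keys[1:]
	}
	return len(s[sid])
}

// processSlot drains the slot one key per call and returns the call count.
func processSlot(s slotStore, a *action) int {
	calls := 0
	for remaining := len(s[a.sid]); remaining > 0; calls++ {
		remaining = s.migrateOneKey(a.sid)
	}
	a.state = "finished"
	return calls
}

func main() {
	actions := []*action{{sid: 10, index: 2, state: "pending"}, {sid: 11, index: 1, state: "pending"}}
	store := slotStore{11: {"a", "b", "c"}}
	a := prepareNext(actions)
	calls := processSlot(store, a)
	fmt.Println("slot", a.sid, "calls", calls, "state", a.state) // slot 11 calls 3 state finished
}
```

In this model the number of round trips equals the number of keys in the slot, which is consistent with the hour-per-slot measurement reported earlier.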

04

Bottleneck analysis

From the above analysis, we can draw the following conclusion:

The advantage of this design is that migration has little impact on the customer's business, but it also has some obvious disadvantages:

Since capacity expansion is usually planned with some lead time and carried out during off-peak business hours, the migration scheme can be optimized to improve migration efficiency without much impact on business access.

Codis code optimization

Based on the above analysis of the migration implementation, the optimization ideas are as follows:

1. Parallelization of Slot migration

Based on the code analysis, there are two options:

In the end, option 2 was chosen to keep the code simple, taking into account the following points:

The optimized code starts up to 10 goroutines to handle slot events.

At the same time, SlotActionPrepare is modified to pick slots whose state is Pending and that do not belong to the same redis-server, so that concurrent migrations do not pile onto one server.

2. Multikey migration

The redis-server migration command is modified to migrate multiple keys per call. For flexibility, the number of keys per call is passed in from outside. The change is straightforward, as shown below:
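The effect of a per-call key count on the number of round trips can be shown with a toy model. `migrateBatch` is a hypothetical stand-in for a migrate command that accepts a key count; it is not the real redis-server command, whose name and arguments are not given here.

```go
package main

import "fmt"

// migrateBatch is a hypothetical stand-in for a migrate command that takes
// a key count: it moves up to n keys out of keys and returns the rest.
func migrateBatch(keys []string, n int) []string {
	if n > len(keys) {
		n = len(keys)
	}
	return keys[n:]
}

// drainSlot calls migrateBatch until the slot is empty and returns the
// number of round trips it took.
func drainSlot(keys []string, batch int) int {
	if batch < 1 {
		batch = 1 // guard against an infinite loop
	}
	trips := 0
	for len(keys) > 0 {
		keys = migrateBatch(keys, batch)
		trips++
	}
	return trips
}

func main() {
	keys := make([]string, 1000)
	for i := range keys {
		keys[i] = fmt.Sprintf("key:%d", i)
	}
	for _, b := range []int{1, 10, 100} {
		fmt.Printf("batch=%d -> %d round trips\n", b, drainSlot(keys, b))
	}
}
```

Round trips drop roughly in proportion to the batch size, at the cost of each call holding the server for longer; that trade-off is why the count is exposed as a parameter rather than fixed.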

Codis Migration Optimization Test Results

We repeated the same test: a 64G cluster was filled with data using redis-benchmark, with a 32-byte value per key, for a total of 341446298 (about 340 million) keys, and was then expanded to 128G, migrating 512 slots. The final test results are as follows:

Migration performance is therefore greatly improved after optimization. The current configuration still tries to avoid affecting customers' business access as much as possible, so the amount of data moved per migration call is not maximized. Where appropriate, the configuration can be changed to migrate more keys per call so that migration completes even faster.
