How to design and implement storage QoS

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

Many newcomers are unclear about how to design and implement storage QoS. To help, this article walks through the problem and its solutions in detail; readers who need this are welcome to follow along and will hopefully take something away.

1. Resource preemption problem

As storage architectures evolve, many application services end up running in the same resource pool, which provides unified storage capacity. Inside the pool there are several kinds of traffic: IO from upper-layer business, plus internal traffic such as data migration, repair, and compaction. These flows compete for the order in which IO is sent to the hardware, so the IO service quality of any particular flow cannot be guaranteed. For example, internal data-migration traffic may take so much bandwidth that business reads and writes slow down, degrading the quality of service the storage provides. Because the outcome of this resource competition is unpredictable, the storage cannot offer a stable cluster environment.

As shown in the traffic diagram below, vehicles drive the wrong way, congestion breaks out at random, and pedestrians cross wherever they like while chatting away; the result is traffic jams and even accidents.

2. How to solve the problem of resource preemption

Given the traffic diagram above, everyone probably has their own idea of how to avoid this chaos. First, let us introduce two terms.

QoS, quality of service, means providing end-to-end service quality according to the different needs of different types of service.

Storage QoS: on the premise of guaranteeing service bandwidth and IOPS, it allocates storage resources reasonably and effectively mitigates or controls how application services preempt resources, achieving traffic monitoring, rational resource allocation, guaranteed service quality for important traffic, and back-off of internal traffic. It is an essential key technology in the storage field.

So what exactly should QoS do? The following introduces it through the traffic analogy.

2.1 Traffic classification

From the picture we can see that every vehicle, whatever its type, behaves as if it owns the road, free from any constraint. The first improvement is to classify the roads: divide them into bus lanes, car lanes, truck lanes, non-motorized lanes, crosswalks, and so on. Normally only buses may use the bus lane, and motor vehicles may not enter the non-motorized lanes. This ensures that traffic in one lane does not interfere with another.

Similarly, storage carries many kinds of traffic, and we can assign different "lanes" to different traffic types. For example, business traffic can get a wide lane, while internal compaction traffic gets a relatively narrow one. This introduces an important concept in QoS: traffic classification, which allows more accurate and tailored rate-limit control based on the classification result.

2.2 Traffic priority

Classification alone is not enough, because there are always special cases, such as an ambulance on a rescue or a police car in pursuit. We cannot insist that a lane is only for ordinary private cars; special vehicles (ambulances, fire engines, police cars, and so on) must have priority.

For storage, business traffic is our special vehicle: its stability must be guaranteed. For example, business bandwidth and IOPS are left unlimited, while internal traffic such as migration and repair has its bandwidth or IOPS capped and is assigned a fixed "lane". When resources are sufficient, internal traffic can drive quietly in its own lane; when resources are tight, say a sudden surge in business traffic or a sustained high watermark, the internal traffic's lane must be narrowed to a reasonable width, and in extreme cases internal traffic can be paused altogether. Of course, if even pausing internal traffic cannot satisfy normal business reads and writes, it is time to consider expanding capacity.

This is another important QoS concept, priority division: apply the pre-allocated limits while resources are sufficient, and dynamically shrink, back off, or pause low-priority traffic when resources are tight, which compensates for the limitations of a purely static allocation scheme.

2.3 Traffic Monitoring

As mentioned earlier, when resources are insufficient, we can dynamically adjust the thresholds of other traffic, so how do we know that resources are insufficient? At this time, we need a component for traffic monitoring.

When we travel, we often use a map app to reach the destination as quickly as possible by picking a good route. Routes are usually color-coded by congestion, for example red for traffic jams and green for clear roads.

There are two ways for storage to know the current traffic of a machine or disk:

Measure machine load. For example, we often log in to the machine and check each disk's IO with the iostat command; this is decoupled from the applications on the machine and looks only at the machine itself.

Count the read and write traffic sent by each application, for example, if a storage node application is deployed on a certain machine, then we can count the read and write bandwidth and IOPS sent by this application.

The second method allows finer traffic classification inside the application than the first. For example, the storage application node mentioned earlier carries several kinds of traffic, which machine-granularity statistics cannot limit individually.

3. Common QoS rate-limiting algorithms

3.1 Fixed window algorithm

Time is divided into several rate-limiting windows; for example, each window is 1 second long.

Each window keeps a counter that is incremented by every request.

When the counter exceeds the limit (say only 100 requests may pass per second), further requests in that window are discarded or queued, and are processed only after the counter is cleared at the next window.

The ideal flow control effect of the fixed window algorithm is shown in the figure on the left. Assuming that the maximum number of requests allowed within 1 second is 100, then the maximum number of requests within 1 second will not exceed 100.

But in most cases we get the graph on the right, where traffic can double. For example, no requests arrive during T1~T2; then 100 requests arrive during T2~T3 and all pass. The window ends at T3 and the counter is cleared, so another 100 requests arriving in T3~T4 are also all processed, and any request in T4~T5 is rejected because that window's threshold is already used up. As a result, 200 requests were processed within T2~T4, a span no longer than one window, so the actual traffic is double the configured limit.

Summary

The algorithm is simple to understand and easy to implement.

The flow control is coarse, and traffic can double at window boundaries.

Suitable for scenarios where traffic is flat and an occasional doubling is acceptable.

3.2 Sliding window algorithm

As mentioned earlier, the fixed window algorithm is prone to uncontrollable traffic (traffic doubling). Sliding window can be regarded as an upgraded version of fixed window, which can avoid the problem of traffic doubling caused by fixed window.

The time window is subdivided into several cells. For example, where before one window covered a second (allowing at most 60 requests), a second is now split into three cells, each allowing at most 20 requests.

Each cell has an independent counter; a cell can be thought of as one small rate-limiting window from the fixed window algorithm.

When a cell's time elapses, the sliding window moves forward by one cell: the oldest cell (T1~T2) is discarded and a new cell (T4~T5) joins the window, as shown in the figure.

Summary

The flow control is more accurate, and the problem of doubling traffic caused by fixed window algorithm is solved.

The cell granularity is hard to choose: too fine and the bookkeeping costs more compute; too coarse and the overall traffic curve is not smooth enough, so the system load swings up and down.

Suitable for stable traffic without sudden bursts.

3.3 Funnel algorithm

All water drops (requests) first enter the funnel (queue up).

When the funnel is full, the excess water is discarded or placed in a waiting queue.

The other end of the funnel releases drops at a fixed rate.

The funnel does not know when drops (requests) will flow in, but it guarantees that the outflow rate never exceeds the configured threshold, so requests are always processed at a fairly smooth rate. As the figure shows, after the system is limited by the funnel algorithm, traffic stays below a constant threshold.

Summary

Stable processing speed can achieve the effect of rectification and mainly protect the downstream system.

It cannot absorb traffic bursts: every request is smoothed out by the funnel, so it is not suited to rate-limiting scenarios with bursty traffic.

It is suitable for models that do not have a sudden increase in traffic or want to achieve traffic integration at a fixed rate.

3.4 Token bucket algorithm

The token bucket algorithm is an improvement of the funnel algorithm, which mainly solves the situation where the funnel algorithm can not cope with traffic bursts.

Generate tokens at a fixed rate and put them into buckets, such as N tokens per second

If the number of tokens in the token bucket is greater than the token bucket size M, the extra tokens will be discarded

When a request arrives, it first tries to take a token from the bucket. If it gets one, the request executes; if not, the request is discarded or queued to try for a token next time.

As shown in the figure, suppose tokens are generated at 100 per second and the bucket holds at most 200. When the request rate exceeds the token generation rate, throughput is limited to roughly 100 requests per second. If no requests arrive for a while, the tokens in the bucket slowly accumulate to 200, after which a burst of up to 200 requests can be executed at once; the algorithm thus allows bursts of traffic within the set threshold.
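A token bucket in Go might look like the following sketch (tokenBucket, Allow are assumed names; starting with a full bucket is our choice so an initial burst is permitted):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket refills `rate` tokens per second up to `size`; each
// request spends one token, so bursts of up to `size` are allowed.
type tokenBucket struct {
	mu       sync.Mutex
	rate     float64
	size     float64
	tokens   float64
	lastFill time.Time
}

func newTokenBucket(rate, size float64) *tokenBucket {
	// Start with a full bucket so an initial burst is permitted.
	return &tokenBucket{rate: rate, size: size, tokens: size, lastFill: time.Now()}
}

func (t *tokenBucket) Allow() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	t.tokens += now.Sub(t.lastFill).Seconds() * t.rate
	if t.tokens > t.size {
		t.tokens = t.size // extra tokens overflow and are discarded
	}
	t.lastFill = now
	if t.tokens < 1 {
		return false // no token: drop or queue and retry later
	}
	t.tokens--
	return true
}

func main() {
	tb := newTokenBucket(1, 20) // 1 token/s, bucket of 20
	allowed := 0
	for i := 0; i < 30; i++ {
		if tb.Allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // the saved-up burst of ~20 passes, then requests are limited
}
```

Production Go code would typically reach for golang.org/x/time/rate, which implements exactly this scheme.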

Summary

Traffic smoothing

Allow concurrency of traffic within a specific threshold

Suitable when traffic shaping is wanted but a certain degree of burst must be allowed.

As for choosing among them, no algorithm is best or worst in general; pick the most appropriate one based on the actual traffic characteristics, system requirements, and other factors.

4. Storage QoS design and implementation

4.1 Requirements

Generally, a machine deploys at least one storage node that serves read and write requests for multiple disks, and storage requests come in many types: normal business reads and writes, repair traffic after disk damage, space-compaction traffic after deletions leave holes in the data, EC migration traffic that reduces the cost of multi-replica storage, and so on. These different flows on the same storage node compete for system resources. To better guarantee quality of service, the bandwidth and IOPS of each flow must be limited and controlled; in particular, the following conditions must be met:

Bandwidth and IOPS can be limited at the same time. Limiting only one of them leaves the other uncontrolled and threatens system stability. For example, if only bandwidth is controlled and IOPS is not, a workload of many small IOs will drive the machine's IO utilization too high.

Rate limiting at disk granularity, to avoid overloading a single disk under a machine-granularity limit. For example, as shown in the figure, EC traffic on a node is capped at 10Mbps and the expectation is 2Mbps per disk; with a machine-level limit alone, the whole 10Mbps could well land on the first disk.

Traffic-classified control, with different limit parameters per traffic type. For example, business traffic must be protected and cannot be limited, while EC, compaction, and other internal traffic can each be given an appropriate limit threshold based on their characteristics.

Dynamic adaptation of limit thresholds. Business traffic is not under our control; to the system it is like a "runaway horse" that may surge, drop, or hold at peak. Under a surge or a sustained peak the system must free up as many resources as possible, which means dynamically suppressing the limit thresholds of internal traffic, backing it off or even pausing it.

4.2 Algorithm selection

Of the QoS algorithms introduced earlier, we choose the sliding window algorithm, based on the actual requirements:

What the system needs to control is internal traffic, and internal traffic is relatively stable and smooth.

It avoids internal-traffic bursts that would affect business traffic.

Besides the sliding window, the QoS component also needs a cache queue: a request that is limited cannot simply be dropped; it is added to the queue and executed in a later window, as shown in the figure below.

4.3 Bandwidth and IOPS restrictions at the same time

To control bandwidth and IOPS at the same time, the QoS component consists of two parts: an IOPS control component that governs read/write IOPS, and a bandwidth control component that governs read/write bandwidth. Bandwidth control works much like IOPS control. For example, a bandwidth threshold of 1Mbps means at most 1048576 bytes may be read or written per second; an IOPS limit of 20iops means at most 20 read/write requests may be issued per second, regardless of each request's size.

The two components are independent of each other, yet their effects combine: when IOPS is throttled very low, the achievable bandwidth usually drops too, and when bandwidth is throttled very low, the achievable IOPS drops accordingly.

Take the repair traffic as an example, and test it in three groups.

Group 1: 20iops-1Mbps

Group 2: 40iops-2Mbps

Group 3: 80iops-4Mbps

The test results are shown in the figure above, from which we can see that the qos module can control the bandwidth and iops of the traffic within the set threshold range.

4.4 Traffic classification restrictions

In order to distinguish different traffic, we mark and classify the traffic, and initialize a QoS component for different traffic on different disks. QoS components are independent and do not affect each other, and finally achieve disk granularity bandwidth and IOPS control.

4.5 Dynamic threshold adjustment

Although the QoS scheme above controls internal traffic bandwidth and IOPS within the thresholds, it still has the following shortcomings:

It does not perceive the state of business traffic. When business traffic surges or holds at peak, internal and business traffic still contend, so the intended back-off or pause effect is not achieved.

The limits of different flows on a disk are independent of each other. When the disk's overall bandwidth or IOPS is overloaded, internal-traffic thresholds cannot be lowered dynamically, which also hurts the quality of service of business traffic.

Therefore, the QoS component is improved with a traffic-monitoring component that tracks the bandwidth and IOPS of each traffic type. The dynamic QoS scheme then supports the following features:

The monitoring component reports the traffic growth rate; on a sudden surge, the sliding window threshold of internal traffic is lowered dynamically to reduce it, and once traffic flattens out again, the initial threshold is restored so system resources are fully used.

The overall disk traffic is obtained by the monitoring component, and when the overall traffic exceeds the set threshold, the sliding window size is dynamically reduced; when the overall traffic is lower than the set threshold, the sliding window is restored to the initial threshold.
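The two monitoring rules above can be condensed into a small adjustment function. The names and the halving policy below are illustrative assumptions, not the article's exact rule:

```go
package main

import "fmt"

// adjustInternalLimit sketches the dynamic-threshold rule: when total
// disk traffic exceeds diskThreshold, the internal-traffic limit is
// halved (down to a floor); once total traffic drops back below the
// threshold, the initial limit is restored.
func adjustInternalLimit(totalBps, diskThreshold, initialLimit, minLimit int) int {
	if totalBps > diskThreshold {
		limit := initialLimit / 2
		if limit < minLimit {
			limit = minLimit // floor: effectively a pause for internal traffic
		}
		return limit
	}
	return initialLimit // recovered: restore the initial threshold
}

func main() {
	const (
		diskThreshold = 2 << 20 // overall disk threshold
		initial       = 1 << 20 // initial internal-traffic limit
		floor         = 64 << 10
	)
	fmt.Println(adjustInternalLimit(3<<20, diskThreshold, initial, floor)) // overloaded: lowered limit
	fmt.Println(adjustInternalLimit(1<<20, diskThreshold, initial, floor)) // calm: initial limit restored
}
```

A real implementation would call something like this periodically with the monitoring component's readings and feed the result into each flow's sliding window.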

In the following test, the overall disk traffic threshold is set to 2Mbps-40iops, and the initial threshold of the EC traffic is 10Mbps-600iops.

When overall disk traffic reaches the disk threshold, the thresholds of the internal flows are adjusted dynamically. The test results show some fluctuation in the EC traffic under dynamic adjustment; after overall disk traffic falls back, the EC threshold returns to its initial value (10Mbps-600iops). Note, however, that overall disk traffic is not held strictly below 2Mbps-40iops but fluctuates around that range. Therefore, at initialization, the internal traffic thresholds should be set below the overall disk threshold to achieve a more stable control effect.

4.6 Pseudocode implementation

The storage QoS above mainly limits read/write bandwidth and IOPS. How is it implemented? IO reads and writes mainly involve the following interfaces:

```go
Read(p []byte) (n int, err error)
ReadAt(p []byte, off int64) (n int, err error)
Write(p []byte) (written int, err error)
WriteAt(p []byte, off int64) (written int, err error)
```

These interfaces are therefore re-wrapped, mainly by adding the rate-limiting component.

Implementation of bandwidth control component

Read implementation

```go
// suppose self.c is the rate-limiting component
func (self *bpsReader) Read(p []byte) (n int, err error) {
	size := len(p)
	size = self.c.assign(size)              // ask the limiter for a quota of up to len(p) bytes
	n, err = self.underlying.Read(p[:size]) // read at most the granted size
	self.c.fill(size - n)                   // return any unused quota to the current window
	return
}
```

After rate limiting, a Read call ends in one of two cases:

The bytes actually read, n, equal the granted size, and nothing more needs to be done.

n is smaller than the granted size, and the unused quota (size - n) is returned to the current window by fill.
