The following introduces the load balancing algorithm of the lightweight open-source web server Tengine and should help you apply it in practice. Load balancing touches many topics, and the theory is well covered elsewhere, so this article draws on accumulated production experience, taking Alibaba as the example.
In Alibaba's layer-7 traffic ingress (application gateway) scenario, Nginx's official Smooth Weighted Round-Robin (SWRR) load balancing algorithm no longer performs well. By implementing a new algorithm, Virtual Node Smooth Weighted Round-Robin (VNSWRR), Tengine not only elegantly resolves the shortcomings of SWRR but also improves QPS processing capacity by about 60% compared with Nginx's official SWRR.
Problem
The access-layer Tengine implements dynamic service discovery through its self-developed dynamic upstream module: at runtime it dynamically picks up backend changes such as scale-out, weight adjustments, and health-check status. This capability enables a lot of operational work; for example, by adjusting the weight of a single backend machine, a user can drain live traffic onto it for an online stress test. Under Nginx's native SWRR algorithm, however, such operations can have painful consequences.
In the access-layer (application gateway) scenario, Nginx's SWRR load balancing algorithm causes the QPS of a machine whose weight is adjusted to spike instantly, as shown in the figure above: when the weight is raised to 2, traffic is concentrated on that machine for a period of time.
The time complexity of Nginx's SWRR selection is O(N), so in large-scale backend scenarios Nginx's processing capacity degrades linearly with the number of backends.
In short, the load balancing forwarding strategy needed to be revised and the performance of the access-layer Tengine optimized.
Analysis of Native SWRR algorithm
Before getting into the case, let's briefly review the forwarding strategy and characteristics of Nginx's SWRR load balancing algorithm.
SWRR stands for Smooth Weighted Round-Robin. As the name implies, the algorithm adds a smoothing property on top of ordinary weighted round-robin (WRR).
Let's describe the algorithm with a simple example.
Suppose there are three machines with weights 5, 1 and 1, where the array s is the machine list, n is the number of machines, each machine's cw (current weight) is initialized to 0, its ew (effective weight) is initialized to the configured weight, tw is the sum of the ew of all machines considered in this round, and best is the selected machine. In short, each selection picks the machine with the highest cw in the list and then subtracts tw from the selected machine's cw, reducing its chance of being picked next time. Simple pseudocode:
best = NULL;
tw = 0;

/* one full pass over all n machines, starting from a random index */
for (i = random() % n, count = 0; count < n; i = (i + 1) % n, count++) {
    s[i].cw += s[i].ew;                      /* raise current weight by effective weight */
    tw += s[i].ew;                           /* accumulate total effective weight */
    if (best == NULL || s[i].cw > best->cw) {
        best = &s[i];                        /* remember the machine with the highest cw */
    }
}

best->cw -= tw;                              /* penalize the winner by the total weight */
return best;
Request #   Weights before selection   Selected server   Weights after selection
0           {5, 1, 1}                  A                 {-2, 1, 1}
1           {3, 2, 2}                  A                 {-4, 2, 2}
2           {1, 3, 3}                  B                 {1, -4, 3}
3           {6, -3, 4}                 A                 {-1, -3, 4}
4           {4, -2, 5}                 C                 {4, -2, -2}
5           {9, -1, -1}                A                 {2, -1, -1}
6           {7, 0, 0}                  A                 {0, 0, 0}
The selection order produced by SWRR is: {A, A, B, A, C, A, A}.
An ordinary WRR algorithm might instead produce, for example: {C, B, A, A, A, A, A}.
Compared with ordinary WRR, SWRR is both smooth and well dispersed.
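To make the behavior concrete, here is a minimal, self-contained C sketch (illustrative only, not the Nginx/Tengine source; names such as swrr_pick are invented) that implements the selection step above and reproduces the table: with weights {5, 1, 1} it prints the order A A B A C A A.

/* swrr_demo.c - minimal SWRR sketch reproducing the table above */
#include <stdio.h>

typedef struct {
    const char *name;
    int ew;   /* effective (configured) weight */
    int cw;   /* current weight, starts at 0 */
} peer;

/* pick the peer with the highest current weight, then lower it by the
 * total weight so it is less likely to win again immediately;
 * iterates from index 0 (no random start) so the output matches the table */
static peer *swrr_pick(peer *s, int n)
{
    int i, tw = 0;
    peer *best = NULL;

    for (i = 0; i < n; i++) {
        s[i].cw += s[i].ew;
        tw += s[i].ew;
        if (best == NULL || s[i].cw > best->cw)
            best = &s[i];
    }

    best->cw -= tw;
    return best;
}

int main(void)
{
    peer s[] = { {"A", 5, 0}, {"B", 1, 0}, {"C", 1, 0} };
    int i;

    /* prints: A A B A C A A, matching the table above */
    for (i = 0; i < 7; i++)
        printf("%s ", swrr_pick(s, 3)->name);
    printf("\n");
    return 0;
}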
The trouble caused by raising a weight
From the description above, the SWRR algorithm seems perfect, but it still has defects in some scenarios. Let's look at a real case to see what they are.
One morning, the traffic dispatcher hurried over to my desk. He looked particularly nervous, so clearly something was wrong. Sure enough: "Why is it that when I adjusted the weight of a machine in the central data center from 1 to 2, the access-layer Tengine did not forward traffic according to that weight ratio?" The QPS of that machine changed as shown in the following figure:
Note: the dark blue curve is the QPS of the machine whose weight was increased; the light green curve is the average per-machine QPS of the cluster.
Looking at the traffic trend chart I was at a loss at first, but with data in the chart we could start by analyzing a few of its characteristic numbers. Since some of the data is sensitive, no detailed analysis is given here; the phenomenon and its cause are described directly.
At the time, the weighted-up machine was receiving roughly 1/2 of the application's total traffic in that data center, and it took a while for it to fall back to the expected weight ratio. The reason is that the access-layer Tengine dynamically picks up changes to backend machine information, and Nginx's official SWRR algorithm selects the machine with the largest weight in the current machine list for the first request. As a result, every access-layer Tengine instance that had just picked up the weight change forwarded its first request to the machine whose weight had been increased.
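The first-pick behavior can be illustrated with a small hypothetical sketch (not Tengine's dynamic upstream module): after a weight change, each worker rebuilds its peer list with every cw reset to 0, so the very first SWRR selection in every worker is the machine with the largest weight.

/* first_pick_demo.c - illustrative sketch: every freshly reset worker
 * sends its first request to the machine whose weight was just raised. */
#include <stdio.h>

typedef struct { const char *name; int ew; int cw; } peer;

static const char *first_pick(peer *s, int n)
{
    int i, tw = 0, best = -1;
    for (i = 0; i < n; i++) {
        s[i].cw += s[i].ew;
        tw += s[i].ew;
        if (best < 0 || s[i].cw > s[best].cw)
            best = i;
    }
    s[best].cw -= tw;
    return s[best].name;
}

int main(void)
{
    int w;
    /* simulate 4 independent workers, each with a freshly reset peer list:
     * machine "hot" was just raised from weight 1 to 2, the rest stay at 1 */
    for (w = 0; w < 4; w++) {
        peer s[] = { {"m1", 1, 0}, {"hot", 2, 0}, {"m2", 1, 0}, {"m3", 1, 0} };
        printf("worker %d first pick: %s\n", w, first_pick(s, 4));
    }
    return 0;   /* every worker picks "hot" first */
}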
Performance degradation on a large scale
The figure below is a function hotspot profile of Nginx in a reverse-proxy scenario with 2,000 servers configured in the upstream: the ngx_http_upstream_get_peer function accounts for 39% of CPU time. This is because the SWRR algorithm's machine selection has time complexity O(N) (where N is the number of backend machines), which means every request must run a loop of nearly 2,000 iterations to find the backend for that forwarding.
Pressure testing environment
CPU model: Intel (R) Xeon (R) CPU E5-2682 v4 @ 2.50GHz
Pressure testing tool: ./wrk -t25 -d5m -c500 http://ip/t2000
Tengine core configuration: 2 worker processes; the load generator uses persistent connections to Tengine/Nginx, and Tengine/Nginx uses short connections to the backend.
Let's run an experiment: with the number of server entries configured in the upstream as the control variable, observe how Nginx's QPS processing capacity and response time (RT) change. The figure shows that for every additional 500 servers in the backend upstream, Nginx's QPS drops by about 10% and RT rises by about 1 ms.
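As a rough way to see the linear cost without the full wrk setup, the following standalone C sketch times a bare SWRR pick loop for growing backend counts; the weights, iteration counts and timings are arbitrary and purely illustrative, not the benchmark described above.

/* swrr_scaling.c - sketch: per-pick cost of SWRR grows roughly linearly with N */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct { int ew; int cw; } peer;

static int swrr_pick(peer *s, int n)
{
    int i, tw = 0, best = -1;
    for (i = 0; i < n; i++) {
        s[i].cw += s[i].ew;
        tw += s[i].ew;
        if (best < 0 || s[i].cw > s[best].cw) best = i;
    }
    s[best].cw -= tw;
    return best;
}

int main(void)
{
    int sizes[] = { 500, 1000, 1500, 2000 };
    int k, i, picks = 200000;

    for (k = 0; k < 4; k++) {
        int n = sizes[k];
        peer *s = calloc(n, sizeof(peer));
        for (i = 0; i < n; i++) s[i].ew = 1 + i % 5;   /* arbitrary weights */

        clock_t t0 = clock();
        for (i = 0; i < picks; i++) swrr_pick(s, n);
        double us = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e6 / picks;

        printf("n=%4d  ~%.2f us per pick\n", n, us);   /* grows roughly with n */
        free(s);
    }
    return 0;
}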
The analysis above confirms the two defects of the SWRR algorithm, so let's set about solving the problem.
Improved VNSWRR algorithm
Although a classical WRR algorithm (for example, one implemented with random numbers) can achieve O(1) time complexity and also avoids the first-pick defect SWRR shows when a weight is increased, it can produce uneven traffic across the backends in some scenarios (such as low traffic), and it introduces too much uncertainty when traffic surges. The question, then, was whether there is an algorithm that keeps the smoothness and dispersion of SWRR while running in O(1) time. The answer is the Virtual Node Smooth Weighted Round-Robin (VNSWRR) algorithm.
Here is an example to illustrate the algorithm: three machines A, B and C have weights 1, 2 and 3 respectively; N denotes the number of backend machines and TW denotes the sum of their weights.
Key points of algorithm
O Virtual nodes are initialized strictly in the order produced by the SWRR algorithm, which guarantees that the machines at the front of the initialized list are already well dispersed.
O Virtual nodes are initialized in batches at runtime to avoid a burst of intensive computation. After one batch of virtual nodes has been consumed, the next batch is initialized; only min(N, max) virtual nodes are initialized at a time.
Algorithm description
O When the Tengine program starts, or when it detects a change in backend machine information, it builds TW virtual nodes but initializes only the first N of them (note: TW is the sum of the backend machine weights, N is the number of backend machines).
O Each worker process picks a random starting point from which to walk the virtual node list, as in the list for Step 1 in the figure above; a simplified sketch of the whole scheme follows below.
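Putting the key points together, here is a deliberately simplified C sketch of the VNSWRR idea under the assumptions stated above (TW virtual nodes laid out in SWRR order, initialized lazily in batches of N, a per-process random starting point, O(1) steps at pick time). It is not the actual Tengine module, and names such as vnswrr_pick are invented for illustration.

/* vnswrr_sketch.c - simplified VNSWRR sketch (illustrative only) */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct { const char *name; int ew; int cw; } peer;

typedef struct {
    peer *peers;
    int   n;        /* number of real machines */
    int   tw;       /* sum of weights = number of virtual nodes */
    int  *vnode;    /* virtual node list: indexes into peers[], in SWRR order */
    int   inited;   /* how many virtual nodes are initialized so far */
    int   idx;      /* current position, advanced on every pick */
} vnswrr;

static int swrr_next(peer *s, int n)        /* classic O(n) SWRR step */
{
    int i, tw = 0, best = -1;
    for (i = 0; i < n; i++) {
        s[i].cw += s[i].ew;
        tw += s[i].ew;
        if (best < 0 || s[i].cw > s[best].cw) best = i;
    }
    s[best].cw -= tw;
    return best;
}

static void vnswrr_init(vnswrr *v, peer *peers, int n)
{
    int i;
    v->peers = peers; v->n = n; v->tw = 0;
    for (i = 0; i < n; i++) { peers[i].cw = 0; v->tw += peers[i].ew; }
    v->vnode = malloc(sizeof(int) * v->tw);
    v->inited = 0;
    /* build only the first batch (N virtual nodes) up front */
    while (v->inited < n) v->vnode[v->inited++] = swrr_next(peers, n);
    v->idx = rand() % v->inited;            /* per-process random start point */
}

static peer *vnswrr_pick(vnswrr *v)
{
    /* lazily extend the virtual node list, one batch of N nodes at a time */
    if (v->idx == v->inited && v->inited < v->tw) {
        int batch = v->n < v->tw - v->inited ? v->n : v->tw - v->inited;
        while (batch-- > 0) v->vnode[v->inited++] = swrr_next(v->peers, v->n);
    }
    peer *p = &v->peers[v->vnode[v->idx]];
    v->idx = (v->idx + 1) % v->tw;          /* O(1) step to the next virtual node */
    return p;
}

int main(void)
{
    peer s[] = { {"A", 1, 0}, {"B", 2, 0}, {"C", 3, 0} };
    vnswrr v;
    int i;

    srand((unsigned) time(NULL));
    vnswrr_init(&v, s, 3);
    for (i = 0; i < 12; i++) printf("%s ", vnswrr_pick(&v)->name);
    printf("\n");        /* any 6 consecutive picks contain A once, B twice, C three times */
    free(v.vnode);
    return 0;
}

In this sketch a pick costs one array step plus an occasional batch refill, which is how the algorithm trades a small amount of memory (TW slots) for O(1) selection while preserving the smooth SWRR ordering inside the virtual node list.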