
How to use LVS+DR to build a load-balancing cluster


The following explains how to use LVS+DR to build a load-balancing cluster. I hope it helps in practical use. Load balancing involves many practical details and not much theory, and there is plenty of reference material online; here we draw on accumulated industry experience to walk through it.

Overview and working principle of DR Mode

Overview of DR Mode Services:

Direct Routing: all nodes are in the same location, on the same network segment.

The Director distributes requests to different real servers. Each real server responds to the user directly after processing its request, so the Director (load balancer) handles only the client-to-server half of each connection. This avoids the load balancer becoming a new performance bottleneck and improves the scalability of the system. Direct Routing works at the data link layer (by rewriting the MAC address), so all servers must be on the same network segment.

How LVS DR mode works: rewriting the MAC address

In DR mode the scheduler routes the request directly to the target real server. The scheduler dynamically selects a server according to each real server's load, number of connections, and so on. It does not modify the destination IP address or destination port, and it does not encapsulate the IP packet; instead it rewrites the destination MAC address of the request's data frame to the MAC address of the chosen real server, and then sends the modified frame on the LAN of the server group. Because the frame's destination MAC is the real server's MAC and both are on the same LAN, the real server is guaranteed to receive the packet sent by the LB. When the real server receives the request, it strips the frame, inspects the IP header and finds that the destination IP is the VIP. Only an IP configured locally will match the destination IP, so the VIP must be configured on the local loopback interface. In addition, because network interfaces answer ARP requests sent by broadcast and every machine in the cluster has the VIP on its lo interface, the ARP replies would conflict; therefore the ARP response for the VIP must be turned off on each real server's lo interface. The real server then handles the request and sends the reply directly back to the client according to its own routing table, with the VIP still as the source IP.
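As a quick sanity check (not part of the original article; the interface name and the host it is run on are assumptions), the MAC rewriting can be observed on a real server with tcpdump, where -e prints the link-layer headers:

[root@xuegod62 ~]# tcpdump -e -n -i eth0 host 192.168.1.63 and tcp port 80

# The captured request frames should show the real server's own MAC as the destination MAC,
# while the destination IP is still the VIP (192.168.1.63) and the source IP is still the client's IP.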

The actual DR-mode topology and the LVS+DR workflow (the MAC rewriting process):

Equipment list for the example scenario: Director (dispatcher), DIP: 192.168.1.70

VIP: 192.168.1.63

① The client (IP 192.168.1.101) sends a request to the target VIP, and the Director receives it. The IP header and data-frame header are as follows:

② According to the load-balancing algorithm, the Director selects an active real server (assume 192.168.1.62), rewrites the destination MAC address to the MAC of the NIC holding that RIP, and sends the frame onto the LAN. The IP header and data-frame header are as follows:

③ The real server (192.168.1.62) receives the frame on the LAN, strips the frame header, finds that the destination IP (the VIP) is configured locally, and processes the request. It then re-encapsulates the reply and sends it onto the LAN. The IP header and data-frame header of the reply are as follows:

④ If the client and the LVS are on the same network segment, the client (192.168.1.101) receives the reply directly. If they are on different segments, the reply goes back to the user through the gateway/router and the Internet.

Summary of DR mode:

1. Forwarding is done by rewriting the destination MAC address of the packet on the scheduler (LB). Note that the source IP is still the CIP and the destination IP is still the VIP.

2. Request packets pass through the scheduler, but the reply packets from the RS do not pass back through the scheduler (LB), so DR mode is very efficient under heavy concurrent traffic (compared with NAT mode).

3. Because DR mode forwards by rewriting MAC addresses, all RS nodes and the scheduler (LB) must be on the same LAN.

4. Each RS host must bind the VIP address on its lo interface, and ARP suppression must be configured.

5. The default gateway of each RS node does not need to be the LB; it should point directly to the upstream router's gateway so the RS can reach the outside network directly.

6. Because the scheduler in DR mode only rewrites the MAC address, it cannot rewrite the destination port, so the RS must serve on the same port as the VIP.

One: experimental objectives

1: correctly understand the working principle of DR

2: use LVS+DR to build a load-balancing cluster

3: understand several scheduling modes and scheduling parameters of LVS

4: understand ipvsadm command parameters

5: stress-test the website with the ab command

6: hands-on: process 1000 requests in total, issuing 1000 concurrent requests at a time.

Two: experimental topology

Three: experimental environment

1: prepare 3 machines

Dispatcher: xuegod63 VIP: eth0:1 192.168.1.63

DIP:eth0:192.168.1.70

Real server xuegod62: RIP:eth0: 192.168.1.62

VIP:lo:1 192.168.1.63

Real server xuegod64: RIP:eth0: 192.168.1.64

VIP: lo:1 192.168.1.63

2: iptables -F to clear the firewall rules (see the command sketch after this list)

3: SELinux disabled

4: Red Hat Enterprise Linux 6.5, 64-bit operating system
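A minimal command sketch for items 2 and 3 above (the exact commands are assumptions; the article only states the requirements):

iptables -F # flush the firewall rules on all three machines

setenforce 0 # turn SELinux off for the current boot

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config # keep SELinux disabled after reboot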

Four: experimental code

Dispatcher: xuegod63

1: configure DIP and RIP addresses

DIP: [root@xuegod63 ~]# ifconfig eth0 192.168.1.70

VIP: [root@xuegod63 ~]# ifconfig eth0:1 192.168.1.63

[root@xuegod63 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0 # configure the following information

IPADDR=192.168.1.70

NETMASK=255.255.255.0

GATEWAY=192.168.1.1

DNS1=202.106.46.151

2: generate eth0:1 configuration file

[root@xuegod63 network-scripts]# cp ifcfg-eth0 ifcfg-eth0:1

[root@xuegod63 network-scripts]# vim ifcfg-eth0:1 # write the following

DEVICE=eth0:1

NM_CONTROLLED=yes

IPADDR=192.168.1.63

NETMASK=255.255.255.0

ONBOOT=yes

TYPE=Ethernet

PREFIX=24

DEFROUTE=yes

IPV4_FAILURE_FATAL=yes

NAME="eth0:1"

HWADDR=00:0C:29:12:EC:1E # the MAC address must be the same as eth0's, otherwise eth0:1 will not come up as a network device.

[root@xuegod63 network-scripts]# service network restart

[root@xuegod63 network-scripts]# ifconfig # check that eth0 and eth0:1 are both up

inet addr:192.168.1.70  Bcast:192.168.1.255  Mask:255.255.255.0   (eth0)

inet addr:192.168.1.63  Bcast:192.168.1.255  Mask:255.255.255.0   (eth0:1)
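As an aside (an assumption, not one of the article's steps): the VIP can also be attached with the iproute2 tools instead of an eth0:1 alias file, which avoids the HWADDR requirement; the address is lost on reboot unless it is re-added:

[root@xuegod63 ~]# ip addr add 192.168.1.63/32 dev eth0 # temporary VIP on eth0

[root@xuegod63 ~]# ip addr show eth0 # verify the VIP is attached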

3: configure LVS-DR rules:

[root@xuegod63 network-scripts]# rpm -ivh /mnt/Packages/ipvsadm-1.25-9.el6.x86_64.rpm

[root@xuegod63 ~]# ipvsadm -A -t 192.168.1.63:80 -s rr

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.62 -g

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.64 -g

Note: -g means DR mode, -m means NAT mode, and -i means IP TUN (tunnel) mode.

[root@xuegod63 ~]# ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags

-> RemoteAddress:Port Forward Weight ActiveConn InActConn

TCP 192.168.1.63:80 rr

-> 192.168.1.62:80    Route   1   0   0

-> 192.168.1.64:80    Route   1   0   0

Note: of the three LVS modes, only NAT mode requires IP route forwarding to be enabled; DR and TUN modes do not.

4: the LVS rule file: /etc/sysconfig/ipvsadm

How to find the rule file: /etc/init.d/ipvsadm save writes the rules out, so the save path can be found by reading /etc/init.d/ipvsadm.

[root@xuegod63 ~]# vim /etc/init.d/ipvsadm

[root@xuegod63 ~]# /etc/init.d/ipvsadm save

[root@xuegod63 ~]# cat /etc/sysconfig/ipvsadm

-A -t 192.168.1.63:80 -s wrr

-a -t 192.168.1.63:80 -r 192.168.1.62:80 -g

-a -t 192.168.1.63:80 -r 192.168.1.64:80 -g
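A hedged sketch of loading the saved rules back after a restart (assuming the file saved above; the RHEL 6 init script reads /etc/sysconfig/ipvsadm):

[root@xuegod63 ~]# /etc/init.d/ipvsadm start # or restore by hand: ipvsadm -R < /etc/sysconfig/ipvsadm

[root@xuegod63 ~]# ipvsadm -L -n # confirm the rules are back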

RealServer:xuegod62:

1. Configure the RIP on eth0 (bridged mode)

[root@xuegod62 ~]# ifconfig eth0 192.168.1.62/24

2. Configure the VIP on the loopback interface

[root@xuegod62 ~]# ifconfig lo:1 192.168.1.63 netmask 255.255.255.255

[root@xuegod62 network-scripts]# cp ifcfg-lo ifcfg-lo:1

[root@xuegod62 network-scripts]# cat ifcfg-lo:1

DEVICE=lo:1

IPADDR=192.168.1.63

NETMASK=255.255.255.255

ONBOOT=yes

NAME=loopback

[root@xuegod62 ~]# service network restart

3. Suppress the ARP responses for the VIP.

[root@xuegod62 ~]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore

[root@xuegod62 ~]# echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce

[root@xuegod62 ~]# vim /etc/sysctl.conf # append at the end

net.ipv4.conf.eth0.arp_ignore = 1

net.ipv4.conf.eth0.arp_announce = 2

[root@xuegod62 ~]# sysctl -p # make the settings permanent (note: confirm that the real server's actual uplink NIC is eth0)

Parameter description:

arp_announce = 2

Use the most appropriate local address for the query target. For example, when an ARP request for the VIP arrives on the eth0 interface, the kernel checks whether the VIP is the same as the IP configured on eth0. If it is, it replies to the request; if it is not, it discards the request and does not respond.

arp_ignore = 1

Answer only those ARP queries whose target IP address is configured on the interface that received the request (eth0).

My own understanding:

With arp_ignore set to 1, when an ARP request arrives, a NIC responds only if the requested IP is actually configured on that NIC. The default is 0: as long as any NIC on the machine has that IP, the machine replies to the ARP request with its MAC address.
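A commonly used variant (an assumption, not what this article configures) is to apply the same suppression to lo and to the "all" pseudo-interface, which covers whichever NIC actually receives the ARP request:

[root@xuegod62 ~]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore

[root@xuegod62 ~]# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce

[root@xuegod62 ~]# echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore

[root@xuegod62 ~]# echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce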

4. The gateway points to the public network egress router IP:

[root@xuegod62 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0

GATEWAY=192.168.1.1

5. Start the web service on port 80

[root@xuegod62 ~]# echo 192.168.1.62 > /var/www/html/index.html

[root@xuegod62 ~]# service httpd restart

RealServer:xuegod64:

1. Configure the RIP on eth0 (bridged mode)

[root@xuegod64 ~]# ifconfig eth0 192.168.1.64/24

2. Configure the VIP on the loopback interface

[root@xuegod64 ~]# ifconfig lo:1 192.168.1.63 netmask 255.255.255.255

[root@xuegod64 network-scripts]# cp ifcfg-lo ifcfg-lo:1

[root@xuegod64 network-scripts]# cat ifcfg-lo:1

DEVICE=lo:1

IPADDR=192.168.1.63

NETMASK=255.255.255.255

ONBOOT=yes

NAME=loopback

3. Suppress the ARP responses for the VIP.

[root@xuegod64 ~]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore

[root@xuegod64 ~]# echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce

To make the settings permanent (note: confirm that the real server's actual uplink NIC is eth0):

[root@xuegod64 ~]# vim /etc/sysctl.conf # append at the end

net.ipv4.conf.eth0.arp_ignore = 1

net.ipv4.conf.eth0.arp_announce = 2

[root@xuegod64 ~]# sysctl -p

4. The gateway points to the public network egress router IP:

[root@xuegod64 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0

GATEWAY=192.168.1.1

5. Start the web service on port 80

[root@xuegod64 ~]# echo 192.168.1.64 > /var/www/html/index.html

[root@xuegod64 ~]# service httpd restart

Test

Physical machine testing:

http://192.168.1.63/

Note: do not run the test from the dispatcher itself; that kind of test will not work.
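A hedged way to verify the ARP suppression from the client side (arping and the client host name are assumptions, not commands from the original):

[root@client ~]# arping -I eth0 -c 3 192.168.1.63 # only the Director's MAC should answer; if a real server's MAC appears, its arp_ignore/arp_announce settings are wrong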


Several scheduling modes of LVS:

[root@xuegod63 ~]# ipvsadm -h

Parameter / scheduling algorithm:

-s rr    Round robin
-s wrr   Weighted round robin
-s lc    Least connections
-s wlc   Weighted least connections
-s lblc  Locality-based least connections
-s dh    Destination hashing
-s sh    Source hashing
-s sed   Shortest expected delay
-s nq    Never queue

Detailed description of each scheduling algorithm


1:

Round robin scheduling (Round Robin) (rr for short)

With the round-robin scheduling algorithm, the scheduler distributes external requests to the real servers in the cluster in turn. It treats every server equally, regardless of the actual number of connections or the system load on each server.

2:

Weighted rotation (Weighted Round Robin) (wrr for short)

With the weighted round-robin scheduling algorithm, the scheduler dispatches requests according to the different processing capacities of the real servers, so more powerful servers handle more traffic. The scheduler can also query the load of each real server and adjust its weight dynamically.

3:

Least connections (Least Connections) (lc for short)

With the least-connections scheduling algorithm, the scheduler dynamically directs new requests to the server with the fewest established connections. If the real servers in the cluster have similar performance, this algorithm balances the load well.

4:

Weighted least connections (Weighted Least Connections) (wlc for short)

When the performance of the servers in the cluster varies greatly, the weighted least-connections algorithm optimizes load balancing: servers with higher weights carry a larger share of the active connections. The scheduler can query each real server's load and adjust its weight dynamically.

5:

Locality-based least connections (Locality-Based Least Connections) (lblc for short)

The locality-based least-connections algorithm balances load by destination IP address and is currently used mainly in cache cluster systems. It finds the server most recently used for the requested destination IP and sends the request there if that server is available and not overloaded; if the server does not exist, or it is overloaded while another server is at half its workload, a server is chosen by the least-connections principle and the request is sent to it.

6:

Locality-based least connections with replication (Locality-Based Least Connections with Replication) (lblcr for short)

The locality-based least-connections-with-replication algorithm also balances load by destination IP address and is currently used mainly in cache cluster systems. It differs from lblc in that it maintains a mapping from a destination IP address to a set of servers, while lblc maintains a mapping from a destination IP address to a single server. The algorithm finds the server group for the requested destination IP and selects a server from that group by the least-connections principle. If that server is not overloaded, the request is sent to it; if it is overloaded, a server is selected from the whole cluster by least connections, added to the group, and the request is sent to it. When the group has not been modified for some time, the busiest server is removed from the group to reduce the degree of replication.

7:

Destination address hashing (Destination Hashing) (dh for short)

The destination-address-hashing algorithm uses the request's destination IP address as a hash key to look up the corresponding server in a statically assigned hash table; if that server is available and not overloaded, the request is sent to it, otherwise nothing is returned.

8:

Source address hashing (Source Hashing) (sh for short)

The source-address-hashing algorithm uses the request's source IP address as a hash key to look up the corresponding server in a statically assigned hash table; if that server is available and not overloaded, the request is sent to it, otherwise nothing is returned.

9:

Shortest expected delay (Shortest Expected Delay Scheduling, sed for short)

Based on the wlc algorithm; an example makes it clear.

Suppose machines A, B, and C have weights 1, 2, and 3 and currently hold 1, 2, and 3 connections respectively. Under wlc a new request could be assigned to any of A, B, or C. Under sed the following computation is performed, using (active connections + 1) / weight:

A: (1+1)/1 = 2

B: (2+1)/2 = 1.5

C: (3+1)/3 ≈ 1.33

Based on the result, the new connection is given to C.

10:

Never queue (Never Queue Scheduling, nq for short)

No queuing is required: if a real server has zero connections, the request is assigned to it directly, without performing the sed computation.

Note: a scheduling algorithm takes effect immediately after it is configured, just like an iptables rule.

Test different scheduling algorithms

Example 1: test LVS-DR with wrr (weighted round robin)

[root@xuegod63 ~]# ipvsadm -C # clear the previous scheduling rules first

[root@xuegod63 ~]# ipvsadm -A -t 192.168.1.63:80 -s wrr

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.62 -g -w 10

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.64 -g -w 20

Test: on the physical machine, refresh this link 9 times: http://192.168.1.63/

# Out of the 9 connections in total, xuegod62 and xuegod64 serve them in a 1:2 ratio: the greater the weight, the more connections a server receives. (A hedged way to count this is sketched below.)
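A hedged way to count that distribution from a Linux client instead of refreshing a browser (curl and the client host name are assumptions):

[root@client ~]# for i in $(seq 1 9); do curl -s http://192.168.1.63/; done | sort | uniq -c

# Expected roughly: 3 responses of 192.168.1.62 and 6 responses of 192.168.1.64, matching the 10:20 weights.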

Example 2: if a real server's weight is 0, it is no longer assigned any client requests

[root@xuegod63 ~]# ipvsadm -C

[root@xuegod63 ~]# ipvsadm -A -t 192.168.1.63:80 -s wrr

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.62 -g -w 0

[root@xuegod63 ~]# ipvsadm -a -t 192.168.1.63:80 -r 192.168.1.64 -g -w 20

On the physical machine, refresh the link nine times: http://192.168.1.63/

View:

# You will see incoming packets but zero outgoing packets. Outgoing is 0 because the reply packets are sent directly by the real server to the client and never pass back through the Director.
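A hedged way to see those counters on the Director (the --stats column layout may vary by ipvsadm version):

[root@xuegod63 ~]# ipvsadm -L -n --stats

# Conns/InPkts grow only for 192.168.1.64 (weight 20), and OutPkts stays 0 for both real servers, because in DR mode the replies bypass the Director.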

More ipvsadm parameters

Parameter and description:

-A --add-service        Add a new virtual server record to the kernel's virtual server table, i.e. add a new virtual server.
-E --edit-service       Edit a virtual server record in the kernel virtual server table.
-D --delete-service     Delete a virtual server record from the kernel virtual server table.
-C --clear              Clear all records in the kernel virtual server table.
-R --restore            Restore virtual server rules.
-S --save               Save the virtual server rules, in a format that can be read back with the -R option.
-a --add-server         Add a new real server record to a record in the kernel virtual server table, i.e. add a new real server to a virtual server.
-e --edit-server        Edit a real server record within a virtual server record.
-d --delete-server      Delete a real server record from a virtual server record.
-L | -l --list          Display the kernel virtual server table.
-Z --zero               Zero the virtual service table counters (clear the current connection counts, etc.).
--set tcp tcpfin udp    Set the connection timeout values.
--start-daemon          Start the synchronization daemon. It can be followed by master or backup to indicate whether this LVS router is the master or the backup. The VRRP feature of keepalived can also be used for this purpose.
--stop-daemon           Stop the synchronization daemon.
-h --help               Display help information.
-t --tcp-service service-address    The virtual server provides a TCP service.
-u --udp-service service-address    The virtual server provides a UDP service.
-f --fwmark-service fwmark          The service is identified by an iptables firewall mark (fwmark).
-s --scheduler scheduler            The scheduling algorithm to use: rr | wrr | lc | wlc | lblc | lblcr | dh | sh | sed | nq. The default scheduling algorithm is wlc.
-p --persistent [timeout]           Persistent service: multiple requests from the same client are handled by the same real server. The default timeout is 300 seconds.
-r --real-server server-address     The real server [Real-Server:port].
-g --gatewaying         Set the LVS forwarding mode to direct routing (DR), which is also the default mode.
-i --ipip               Set the LVS forwarding mode to tunnel (TUN) mode.
-m --masquerading       Set the LVS forwarding mode to NAT mode.
-w --weight weight      The weight of the real server.
--mcast-interface interface         Specify the multicast interface for synchronization.
-c --connection         Display LVS's current connections, e.g. ipvsadm -L -c
--timeout               Display the tcp, tcpfin, and udp timeout values, e.g. ipvsadm -L --timeout
--daemon                Display the synchronization daemon status.
--stats                 Display statistics.
--rate                  Display rate information.
-n --numeric            Output IP addresses and ports in numeric form.
--sort                  Sort the output of virtual servers and real servers.
-L -n                   View the rules (display the kernel virtual server table).
-L -n -c                View how clients connect to the dispatcher and the real servers.

Example 1: view how clients connect to the dispatcher and the real servers

Example 2: view rates

Example 3: zero the virtual service table counters (clear the current connection counts, etc.)

Example 4: delete a record
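Hedged command forms for the four examples above (the addresses reuse this article's VIP and RIP; the flags follow the parameter table):

Example 1: [root@xuegod63 ~]# ipvsadm -L -n -c

Example 2: [root@xuegod63 ~]# ipvsadm -L -n --rate

Example 3: [root@xuegod63 ~]# ipvsadm -Z

Example 4: [root@xuegod63 ~]# ipvsadm -d -t 192.168.1.63:80 -r 192.168.1.62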


Hands-on: process 1000 requests in total, issuing 1000 concurrent requests at a time

Detailed explanation of the parameters of the ab stress-test command under Linux

Format

ab [options] [http://]hostname[:port]/path

Parameter and effect:

-n    The total number of requests to perform in the test session. By default, only one request is executed.
-c    The number of concurrent requests to issue at a time. The default is one at a time.
-t    The maximum number of seconds to spend on the test.
-p    A file containing the data to POST.
-T    The Content-type header to use for the POST data.
-v    Set the verbosity level of the output.
-V    Display the version number and exit.
-w    Output the results in the format of an HTML table. By default, it is a two-column-wide table on a white background.
-i    Perform HEAD requests instead of GET.
-C    Add a cookie line to the request.
-P    Provide BASIC authentication credentials to a transit proxy. The user name and password are separated by a ':' and sent base64-encoded. The string is sent regardless of whether the server needs it (that is, whether a 401 authentication-required code is returned).

Syntax: ab -n <number> -c <number> http://<link>

-n requests    Number of requests to perform # the total number of requests executed in the test session; by default only one request is executed

-c concurrency    Number of multiple requests to make # the number of requests issued at a time; the default is one at a time

Hands-on: process 1000 requests in total, issuing 1000 concurrent requests at a time.

On xuegod64, test the VIP:

[root@xuegod64 ~]# ab -n 1000 -c 1000 http://192.168.1.63/index.html

The ab stress-test command under Linux:

[root@xuegod63 ~]# ab -n 1000 -c 1000 http://192.168.1.63/index.html

You can then compare the load on the two real servers.
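A hedged way to watch the distribution on the Director while ab is running (watch and the 1-second interval are assumptions):

[root@xuegod63 ~]# watch -n 1 'ipvsadm -L -n --stats'

[root@xuegod63 ~]# ipvsadm -L -n -c | head # or inspect the live connection table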

View status:

[root@xuegod63 ~]# ab -n 1000 -c 1000 http://192.168.1.63/index.html

This is ApacheBench, Version 2.3

Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/

Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.1.63 (be patient)

Completed 100 requests

Completed 200 requests

Completed 300 requests

Completed 400 requests

Completed 500 requests

Completed 600 requests

Completed 700 requests

Completed 800 requests

Completed 900 requests

Completed 1000 requests

Finished 1000 requests # all 1000 requests completed

Server Software: Apache/2.2.15 # the version of the httpd server being tested (Apache 2.2.15)

Server Hostname: 192.168.1.63 # server hostname

Server Port: 80 # server port

Document Path: / index.html # tested page documentation

Document Length: 13 bytes # document size

[Note: check the size of index.html on xuegod62; it really is 13 bytes.

[root@xuegod62 html]# ll -h

-rw-r--r-- 1 root root 13 May 5 17:57 index.html]

Concurrency Level: 1000 # concurrency level

Time taken for tests: 2.166 seconds # time spent on the entire test

Complete requests: 1000 # number of completed requests

Failed requests: 0 # number of failed requests

Write errors: 0

Total transferred: 281120 bytes # Total number of bytes transferred throughout the test

HTML transferred: 13052 bytes # the amount of HTML content transferred throughout the test

Requests per second: 461.77 [#/sec] (mean) # requests processed per second // one of the most important indicators, equivalent to the server's transactions per second; the 'mean' in parentheses indicates it is an average.

Time per request: 2165.597 [ms] (mean) # the second most important indicator: the average response time per request; the 'mean' in parentheses indicates it is an average.

Time per request: 2.166 [ms] (mean, across all concurrent requests) # the average time actually spent per request across all concurrent requests. Since the CPU does not process concurrent requests truly simultaneously but serves them in turn by time slice, the first Time per request is approximately equal to this second Time per request multiplied by the concurrency level.

Transfer rate: 126.77 [Kbytes/sec] received # transfer rate // the average traffic per second on the network; helps rule out longer response times being caused by excessive network traffic.

Percentage of the requests served within a certain time (ms) # percentage of requests that provide services within a certain period of time (milliseconds)

50% 581

66% 1053

75% 1075

80% 1089

90% 1393

95% 1793

98% 1800

99% 1804

100% 1807 (longest request)

// the response-time distribution of all requests in the test: each request has a response time; for example, 50% of the requests were served within 581 ms, 90% within 1393 ms, and the longest request took 1807 ms.

That covers how to use LVS+DR to build a load-balancing cluster. If there is anything else you need to know, you can look it up in related industry material or consult a professional engineer.
