What are the implementation methods of load balancing technology

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Today, the editor shares the main implementation methods of load balancing technology. The content is detailed and the logic is clear; I hope you get something out of it after reading.

The implementations of load balancing technology fall mainly into the following categories:

HTTP redirect load balancing

DNS resolution load balancing

Reverse proxy load balancing

IP load balancing (NAT, LVS-NAT)

Direct routing (LVS-DR)

IP tunneling (LVS-TUN)

Strictly speaking, load balancing should not be understood as assigning the same amount of work to every actual server: the servers' capacities differ, whether because of differences in hardware configuration and network bandwidth, or because one server performs multiple roles. "Balancing" means that no server is overloaded and each can work to full effect.

I. HTTP redirection

When an HTTP agent such as a browser requests a URL from a web server, the server can return a new URL in the Location header of the HTTP response.

The HTTP agent then requests the new URL, completing the automatic redirect.
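As an illustration of the mechanism, here is a minimal Python sketch of what the primary site's handler does: it performs no real work, only picking the next backend by round-robin and answering with a 302 and a Location header. The backend hostnames are placeholders, not from the source.

```python
from itertools import cycle

# Hypothetical pool of mirror servers; the names are placeholders.
BACKENDS = cycle(["http://download1.example.com",
                  "http://download2.example.com",
                  "http://download3.example.com"])

def redirect_response(path: str) -> dict:
    """Build the HTTP redirect the primary site would send back.

    The primary server only picks the next backend and tells the
    HTTP agent to repeat the request there.
    """
    target = next(BACKENDS) + path
    return {"status": 302, "headers": {"Location": target}}

resp = redirect_response("/big-file.iso")
print(resp["headers"]["Location"])  # http://download1.example.com/big-file.iso
```

Every request still touches the primary server once, which is exactly the throughput limit discussed next.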

Performance defects:

1. Throughput limit

The throughput of the primary site server is divided among the sub-servers it transfers requests to.

Suppose the RR (round-robin) scheduling strategy is used and each sub-server can handle 1000 reqs/s. The primary server must then sustain 3000 reqs/s to keep three sub-servers fully utilized; with 100 sub-servers, imagine how high the primary server's throughput would have to be.

Conversely, if the maximum throughput of the primary server is 6000 reqs/s, the average throughput allocated to each of three sub-servers is 2000 reqs/s; since a sub-server tops out at 1000 reqs/s, the number of sub-servers has to be increased to 6.
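The bookkeeping above can be checked directly; a small sketch using the numbers from the text:

```python
# Throughput arithmetic for HTTP-redirect load balancing.
# With RR scheduling every request hits the primary server once
# before being redirected, so the primary must absorb the sum of
# what the sub-servers can serve.

SUB_SERVER_THROUGHPUT = 1000  # reqs/s per sub-server, as in the text

def primary_needed(n_sub_servers: int) -> int:
    """Primary throughput required to keep n sub-servers saturated."""
    return n_sub_servers * SUB_SERVER_THROUGHPUT

def sub_servers_needed(primary_max: int) -> int:
    """Sub-servers required so none receives more than it can handle."""
    return primary_max // SUB_SERVER_THROUGHPUT

print(primary_needed(3))         # 3000 reqs/s for three sub-servers
print(sub_servers_needed(6000))  # 6 sub-servers for a 6000 reqs/s primary
```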

2. Uneven redirect access depth

Some requests are redirected to a static page, others to complex dynamic pages, so the actual load on each server is unpredictable while the primary server knows nothing about it. Redirection is therefore a poor choice for site-wide load balancing.

We need to weigh the cost of transferring the request against the cost of processing it: the smaller the former relative to the latter, the more meaningful redirection is, as with downloads.

Try several mirror download sites and you will find that most downloads are redirected using Location.

II. DNS load balancing

DNS is responsible for providing domain name resolution service. When visiting a site, you need to obtain the IP address that the domain name points to through the DNS server of the site domain name. In this process, the DNS server completes the mapping of the domain name to the IP address.

This mapping can also be one-to-many; the DNS server then acts as a load-balancing scheduler, distributing users' requests across multiple servers much as HTTP redirection does, but with a completely different mechanism.

Use the dig command to look at the DNS records for baidu.com: dig baidu.com

It can be seen that baidu has three A records
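DNS round-robin over multiple A records can be mimicked in a few lines; the addresses below are placeholders standing in for the three records dig shows, not baidu's actual ones:

```python
from collections import deque

# Placeholder A records; many authoritative DNS servers rotate the
# record order between answers so clients spread across servers.
a_records = deque(["220.181.38.148", "220.181.38.150", "39.156.69.79"])

def resolve() -> list:
    """Return all A records, then rotate so the next query sees a
    different first record (most clients use the first one)."""
    answer = list(a_records)
    a_records.rotate(-1)
    return answer

first = resolve()
second = resolve()
print(first[0], second[0])  # a different first record on each query
```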

Compared with HTTP redirection, DNS-based load balancing dispenses with the so-called primary site entirely; or rather, the DNS server takes over the primary site's role.

Unlike that scheduler, however, the performance of the DNS server itself is rarely a concern.

DNS records are cached by the user's browser and by the ISP's recursive DNS servers; only when the cache expires is the domain's DNS server asked to resolve the name again.

In other words, DNS has no HTTP-style throughput bottleneck, so the number of actual servers can in theory be scaled up without limit.

Properties:

1. Intelligent resolution based on the user's IP: the DNS server can pick, among all available A records, the one recording the server closest to the user.

2. Dynamic DNS: the DNS record is updated whenever an IP address changes. Because of caching, some delay is of course inevitable.

Shortcomings:

1. Users cannot see which actual server DNS resolved them to, which makes debugging inconvenient for the operations staff.

2. Limited scheduling policy. For example, the context of the HTTP request cannot be brought into the scheduling decision. In the HTTP-redirect-based system introduced earlier, the scheduler works at the HTTP level: it can fully understand the HTTP request and design scheduling policies around the site's application logic, such as filtering and forwarding based on the request URL.

3. Adjusting the policy to the real-time load of the actual servers would require the DNS server to check each server's health on every resolution. For DNS servers, this kind of custom development has a high threshold, not to mention that most sites simply use third-party DNS services.

4. DNS record caching: the caches kept by different software at every level of DNS node will make your head spin.

5. For the above reasons, a DNS server cannot balance the workload very well. Whether to choose DNS-based load balancing depends on your needs.

III. Reverse proxy load balancing

Everyone has surely encountered this, since almost every mainstream web server eagerly supports reverse-proxy-based load balancing. Its core job is forwarding HTTP requests.

Compared with HTTP redirection and DNS resolution, the reverse-proxy scheduler plays the role of a middleman between the user and the actual servers:

1. Every HTTP request to an actual server must pass through the scheduler.

2. The scheduler must wait for the actual server's HTTP response and relay it back to the user (in the previous two methods the actual server responds to the user directly, with nothing fed back through the scheduler).

Properties:

1. Rich scheduling strategies. For example, different weights can be assigned to different actual servers, so that the more capable servers do more of the work.

2. High demands are placed on the reverse proxy server's concurrent processing ability, because it works at the HTTP level.

3. Forwarding costs the reverse proxy server some overhead: creating threads, establishing TCP connections to the back-end server, receiving the results the back end returns, parsing HTTP headers, frequent switching between user space and kernel space, and so on.

Although each of these is brief, the forwarding overhead becomes prominent when the back-end server handles requests very quickly. For requests for static files, for example, the DNS-based load balancing described earlier is better suited.

4. The reverse proxy server can monitor the back-end servers, tracking system load, response time, availability, number of TCP connections, traffic, and so on, and adjust the load balancing policy according to these data.

5. The reverse proxy server can forward all requests within one session to a specific back-end server (sticky sessions). This keeps session data local and avoids wasting the back-end servers' dynamic memory caches.
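The "different weights" idea in point 1 is commonly implemented as smooth weighted round-robin (the algorithm nginx uses for its upstream weights). A sketch with hypothetical backend names:

```python
# Smooth weighted round-robin: on each pick, every server's current
# score grows by its weight; the highest scorer is picked and its
# score is reduced by the total weight.  Heavier servers are picked
# more often, but the picks are interleaved rather than bursty.

weights = {"a": 5, "b": 1, "c": 1}   # hypothetical backends
current = {name: 0 for name in weights}
total = sum(weights.values())

def pick() -> str:
    for name, w in weights.items():
        current[name] += w
    chosen = max(current, key=current.get)  # ties go to the first name
    current[chosen] -= total
    return chosen

sequence = [pick() for _ in range(7)]
print(sequence)  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Note that 'a' gets 5 of every 7 picks, matching its weight, yet the picks are spread out instead of 5 in a row.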

IV. IP load balancing (LVS-NAT)

Because the reverse proxy works at the HTTP level, its own overhead seriously limits its scalability and therefore its performance ceiling. Can load balancing be done below the HTTP level?

NAT server: it works at the transport layer. It modifies incoming IP packets, rewriting each packet's destination address to an actual server's address.

Starting with the Linux 2.4 kernel, the built-in Netfilter module maintains packet-filtering tables in the kernel, containing rules that control filtering.

Linux provides iptables to insert, modify, and delete entries in those tables. Even better, the Linux 2.6.x kernel has a built-in IPVS module, which works much like Netfilter but is focused on IP load balancing.

To check whether your server's kernel has the IPVS module, run:

lsmod | grep ip_vs

If there is output, IPVS is installed. The management tool for IPVS is ipvsadm, which provides a command-line configuration interface through which a load balancing system can be set up quickly.

This is the famous LVS (Linux Virtual Server,Linux virtual server).

1. Open the packet forwarding option of the scheduler

echo 1 > /proc/sys/net/ipv4/ip_forward

2. Check that each actual server uses the NAT server as its default gateway; if not, add one:

route add default gw xx.xx.xx.xx

3. Use ipvsadm to configure

ipvsadm -A -t 111.11.11.11:80 -s rr

This adds a virtual server. -t is followed by the server's public IP and port; -s rr selects the simple round-robin (RR) scheduling strategy. (RR is a static strategy; LVS also provides a series of dynamic strategies, such as least connections (LC), weighted least connections (WLC), and shortest expected delay (SED).)

ipvsadm -a -t 111.11.11.11:80 -r 10.10.120.210:8000 -m

ipvsadm -a -t 111.11.11.11:80 -r 10.10.120.211:8000 -m

This adds two actual servers (which need no public IP). -r is followed by the actual server's private IP and port; -m means packets are forwarded using NAT.

Run ipvsadm -L -n to view the status of the actual servers. With that, the setup is done.
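What the NAT scheduler does to each packet can be sketched as plain dictionary rewriting, using the addresses from the example above. This is a conceptual model of the rewriting, not real packet handling:

```python
from itertools import cycle

VIP = ("111.11.11.11", 80)              # the virtual server above
REAL = cycle([("10.10.120.210", 8000),  # the two real servers
              ("10.10.120.211", 8000)])

def nat_in(packet: dict) -> dict:
    """Inbound: rewrite the destination to a real server, as -m does."""
    ip, port = next(REAL)
    return {**packet, "dst_ip": ip, "dst_port": port}

def nat_out(packet: dict) -> dict:
    """Outbound: responses pass back through the NAT box, which
    restores the VIP as the source, so the client sees one address."""
    return {**packet, "src_ip": VIP[0], "src_port": VIP[1]}

req = {"src_ip": "1.2.3.4", "src_port": 5555,
       "dst_ip": VIP[0], "dst_port": VIP[1]}
fwd = nat_in(req)
print(fwd["dst_ip"], fwd["dst_port"])  # 10.10.120.210 8000
```

The round trip through the NAT box on both directions is exactly why its bandwidth becomes the bottleneck discussed below.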

Experiments show that with a NAT-based system, the NAT server as scheduler can raise throughput to a new level, almost double that of a reverse proxy server, mostly thanks to the lower overhead of forwarding requests in the kernel.

However, once the requested content is large, overall throughput differs little whether the system is based on a reverse proxy or on NAT, which means that for high-overhead content it is worth considering a simple reverse proxy to build the load balancing system.

Even such a powerful system has a bottleneck: the network bandwidth of the NAT server, on both the internal and external networks.

Of course, if money is no object you can buy gigabit or 10-gigabit switches, or even hardware load balancers; but what if you are on a budget?

A simple and effective approach is to combine NAT-based clusters with the DNS method described earlier: for example, five clusters each with 100 Mbps of egress bandwidth, with DNS distributing user requests evenly across them; intelligent DNS resolution can also direct each user to the nearest region.

Such a configuration is sufficient for most businesses, but for large-scale sites serving downloads or video, NAT servers are still not good enough.

V. Direct routing (LVS-DR)

NAT works at the transport layer (layer 4) of the network model, while direct routing works at the data link layer (layer 2), which looks even more powerful.

It forwards a packet to the actual server by modifying the packet's destination MAC address (the destination IP is unchanged). Distinctively, the actual server's response packet is sent directly to the client without passing back through the scheduler.

1. Network settings

Suppose one load-balancing scheduler and two actual servers. Purchase three public IPs, one per machine, give all three machines the same default gateway, and configure the same IP alias on all three, assumed here to be 10.10.120.193.

The scheduler is then accessed through the IP alias 10.10.120.193, and the site's domain name is pointed at this alias.

2. Add the ip alias to the loopback interface lo

This keeps the actual servers from using this IP alias to look up other servers. Run on each actual server, for example:

ifconfig lo:0 10.10.120.193 broadcast 10.10.120.193 netmask 255.255.255.255 up

Also prevent the actual servers from answering ARP broadcasts for the IP alias, by running:

echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore

echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce

echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore

echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce

Once configured, use ipvsadm to configure the LVS-DR cluster:

ipvsadm -A -t 10.10.120.193:80 -s rr

ipvsadm -a -t 10.10.120.193:80 -r 10.10.120.210:80 -g

ipvsadm -a -t 10.10.120.193:80 -r 10.10.120.211:80 -g

-g means packets are forwarded by direct routing. Note that in DR mode the real servers must serve on the same port as the virtual service, since only the MAC address is rewritten, never the IP header.
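The contrast with NAT can be modeled the same way: in direct routing only the destination MAC changes while the IP header is untouched. The MAC addresses below are made up for illustration:

```python
from itertools import cycle

# Hypothetical MAC addresses of the two real servers.
REAL_MACS = cycle(["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"])

def dr_forward(frame: dict) -> dict:
    """LVS-DR forwarding: rewrite only the layer-2 destination.

    The destination IP stays the VIP, which is why every real server
    must hold the VIP on its lo interface and stay silent about it
    in ARP (arp_ignore/arp_announce above)."""
    return {**frame, "dst_mac": next(REAL_MACS)}

frame = {"dst_mac": "aa:bb:cc:ff:ff:ff",   # scheduler's own MAC
         "dst_ip": "10.10.120.193", "dst_port": 80}
out = dr_forward(frame)
print(out["dst_ip"], out["dst_mac"])  # IP unchanged, MAC rewritten
```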

The advantage of LVS-DR over LVS-NAT is that LVS-DR is not limited by the scheduler's bandwidth. For example, suppose each of the three servers is limited to 10 Mbps at the WAN switch's egress, while the LAN switch connecting the scheduler and the two actual servers has no speed limit.

LVS-DR can then in theory reach 20 Mbps of maximum egress bandwidth, because the actual servers' response packets go straight to the client without passing through the scheduler: the scheduler's egress bandwidth is irrelevant, only the servers' own matters.

With LVS-NAT, the cluster could use only 10 Mbps. So the more the volume of response packets exceeds that of request packets, the more worthwhile it is to cut the cost of forwarding requests, the more the overall scalability improves, and the more everything ultimately depends on WAN egress bandwidth.

In general, LVS-DR is well suited to building scalable load balancing systems, whether for web servers, file servers, or video servers, with excellent performance. The prerequisite is that you must purchase a series of legal IP addresses for the actual servers.

VI. IP tunneling (LVS-TUN)

Request forwarding based on IP tunneling: the scheduler encapsulates each IP packet it receives inside a new IP packet and forwards it to the actual server; the actual server's response can then reach the user directly.

Most Linux systems support this now, and it can be implemented with LVS, where it is called LVS-TUN. Unlike LVS-DR, the actual servers need not be in the same network segment as the scheduler: the scheduler forwards requests to them through an IP tunnel, so each actual server must also have a valid, routable IP address.
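The tunnel mechanism can likewise be sketched as wrapping one header around another. This is a conceptual model of IP-in-IP encapsulation with the example addresses from earlier sections, not a working tunnel:

```python
SCHEDULER_IP = "111.11.11.11"
REAL_SERVER_IP = "10.10.120.210"  # must be routable in real LVS-TUN

def encapsulate(packet: dict) -> dict:
    """LVS-TUN forwarding: the original packet, client address and
    all, rides unchanged inside a new outer IP header addressed to
    the real server."""
    return {"src_ip": SCHEDULER_IP, "dst_ip": REAL_SERVER_IP,
            "payload": packet}

def decapsulate(outer: dict) -> dict:
    """The real server unwraps the outer header and sees the client's
    original packet, so it can reply to the client directly."""
    return outer["payload"]

original = {"src_ip": "1.2.3.4", "dst_ip": SCHEDULER_IP, "data": "GET /"}
inner = decapsulate(encapsulate(original))
print(inner["src_ip"])  # the real server still sees the client's IP
```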
