How to troubleshoot packet loss on CentOS 7 under high concurrency

2025-04-05 Update, from: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article walks through how to troubleshoot packet loss on CentOS 7 in a high-concurrency scenario, from the symptoms to the root cause and the fix.

I. Background of the problem

Production runs on six Alibaba Cloud (Aliyun) servers behind an SLB load balancer. TPS was originally only about 4000, but today it rose to 7500, and the application began to fail domain name resolution occasionally. Pinging a domain name from the servers showed packet loss (not limited to private network domain names; even the baidu domain lost packets).

[root@prd1 ~]# ping www.baidu.com
PING www.a.shifen.com (180.101.49.12) 56(84) bytes of data.
64 bytes from 180.101.49.12 (180.101.49.12): icmp_seq=1 ttl=48 time=16.4 ms
64 bytes from 180.101.49.12 (180.101.49.12): icmp_seq=2 ttl=48 time=16.3 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

To rule out DNS, we pinged directly between two machines in the same zone of the same VPC. Packets were still dropped intermittently:

[root@prd2]# ping 172.31.6.108
PING 172.31.6.108 (172.31.6.108) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 172.31.6.108: icmp_seq=13 ttl=64 time=0.129 ms
64 bytes from 172.31.6.108: icmp_seq=14 ttl=64 time=0.137 ms

II. Cause of the problem

Further analysis showed that other ECS instances in the same VPC that did not run the application had no such problem, so the preliminary diagnosis was a problem on the CentOS hosts themselves.

Next I checked the SLB and Linux request logs and found no sign of a malicious DoS attack.

Finally, the kernel log (dmesg) revealed the problem:

[15283.099034] net_ratelimit: 18702 callbacks suppressed
[15283.099037] nf_conntrack: table full, dropping packet
[15283.099626] nf_conntrack: table full, dropping packet
[15283.099691] nf_conntrack: table full, dropping packet
[15283.102105] nf_conntrack: table full, dropping packet
[15283.102754] nf_conntrack: table full, dropping packet
[15283.103533] nf_conntrack: table full, dropping packet
[15283.103889] nf_conntrack: table full, dropping packet
[15283.104421] nf_conntrack: table full, dropping packet
[15283.106042] nf_conntrack: table full, dropping packet
[15283.106047] nf_conntrack: table full, dropping packet
[15288.100786] net_ratelimit: 22924 callbacks suppressed

The kernel log is flooded with one message: "nf_conntrack: table full, dropping packet".
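To get a rough sense of scale from such a log, the two message types can be tallied. The following is a minimal Python sketch; the function name `summarize_conntrack_drops` is hypothetical, and it assumes the log text has already been captured (for example from `dmesg`). Note that the suppressed count covers every rate-limited network message, so it is only an upper bound on hidden conntrack drops.

```python
import re

# Patterns for the two message types seen in the dmesg excerpt.
SUPPRESSED = re.compile(r"net_ratelimit: (\d+) callbacks suppressed")
TABLE_FULL = "nf_conntrack: table full, dropping packet"


def summarize_conntrack_drops(dmesg_text):
    """Return (logged_drops, suppressed_messages) for a dmesg excerpt."""
    logged = 0
    suppressed = 0
    for line in dmesg_text.splitlines():
        if TABLE_FULL in line:
            logged += 1
        match = SUPPRESSED.search(line)
        if match:
            # Messages the kernel dropped from the log in the last window.
            suppressed += int(match.group(1))
    return logged, suppressed
```

In the excerpt above, the suppressed counters (18702 and 22924) dwarf the handful of lines that were actually logged, which suggests drops were happening at a rate of thousands per second.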

III. net_ratelimit and nf_conntrack

1 net_ratelimit

Rate limiting is a Linux mechanism that guards against DoS attacks by preventing every message from being logged (which could exhaust storage space). When kernel code logs a message through printk(), it first consults this mechanism to decide whether the message is actually emitted.

The limit can be tuned through /proc/sys/kernel/printk_ratelimit and /proc/sys/kernel/printk_ratelimit_burst. The default configuration (RHEL 6) is 5 and 10, respectively. Messages from the network stack, such as the net_ratelimit lines in the log above, pass through an equivalent limiter with its own tunables, /proc/sys/net/core/message_cost and /proc/sys/net/core/message_burst, which have the same defaults.

In other words, the kernel allows 10 messages to be logged every 5 seconds. Beyond this limit it discards the messages and records "net_ratelimit: N callbacks suppressed".

[root@prd16 ~]# cat /proc/sys/kernel/printk_ratelimit
5
[root@prd16 ~]# cat /proc/sys/kernel/printk_ratelimit_burst
10

Corresponding kernel code: http://fxr.watson.org/fxr/source/net/core/utils.c?v=linux-2.6
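The behaviour described in this subsection (10 messages per 5 seconds, then a "callbacks suppressed" summary) can be illustrated with a small model. This is a simplified Python sketch of a fixed-window limiter, not a port of the kernel code; the class name `RateLimiter` and its method `allow` are hypothetical.

```python
class RateLimiter:
    """Simplified model of a fixed-window log rate limit (not kernel code)."""

    def __init__(self, interval=5.0, burst=10):
        self.interval = interval    # window length in seconds
        self.burst = burst          # messages allowed per window
        self.window_start = None
        self.printed = 0
        self.missed = 0

    def allow(self, now):
        """Return (allowed, report) for a message arriving at time `now`.

        `report` is the number of messages suppressed during the previous
        window, or None when there is nothing to report.
        """
        report = None
        if self.window_start is None:
            self.window_start = now
        if now > self.window_start + self.interval:
            # The window expired: report what was dropped and start over.
            if self.missed:
                report = self.missed
                self.missed = 0
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True, report
        self.missed += 1
        return False, report
```

Feeding this model 100 messages within the first window lets 10 through and counts 90 as missed; the first message of the next window is allowed and carries the report of 90, which corresponds to the "N callbacks suppressed" line.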

2 nf_conntrack

nf_conntrack is the Linux kernel module that tracks connection entries; NAT depends on it.

The nf_conntrack module records tracked connections (for example, established TCP connections) in a hash table. When the table is full, packets belonging to new connections are dropped and the kernel logs the "nf_conntrack: table full, dropping packet" error.

The module was introduced in kernel 2.6.15 (released on 2006-01-03) and supports both IPv4 and IPv6. It replaced the IPv4-only ip_conntrack and tracks connection state for use by other modules.

It is needed by every service that relies on NAT, such as firewalls and Docker. Take the nat and state modules of iptables as examples:

nat: rewrites the source or destination address of IP packets according to the forwarding rules, and relies on the conntrack records so that reply packets can be routed back to the requesting machine.

state: matches firewall filtering rules directly against the connection state recorded by conntrack (NEW, ESTABLISHED, RELATED, INVALID, and so on).

The important parameters of the nf_conntrack module are as follows.

nf_conntrack_buckets: the size of the hash table. It can be specified when the module is loaded or modified with the sysctl command. When system memory is greater than or equal to 4 GB, the default value is "65536".

nf_conntrack_max: the maximum number of nodes in the hash table, that is, the maximum number of connections the nf_conntrack module supports. When system memory is greater than or equal to 4 GB, the default value is "262144". For servers that handle a large number of connections, this default is relatively small.

nf_conntrack_tcp_timeout_time_wait: how long the nf_conntrack module keeps a TCP connection that is in the TIME_WAIT state. The default value is "120s".
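How close a host is to the limit can be checked by comparing the current entry count with nf_conntrack_max. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the counters are exposed under /proc/sys/net/netfilter, which is where they normally appear while the module is loaded.

```python
from pathlib import Path

# Usual procfs location of the conntrack counters (assumed to be present).
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count, maximum):
    """Return table usage as a fraction between 0 and 1."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def read_conntrack_usage(proc_dir=PROC_DIR):
    """Read the live counters; return None if the files are missing."""
    try:
        count = int((proc_dir / "nf_conntrack_count").read_text())
        maximum = int((proc_dir / "nf_conntrack_max").read_text())
    except FileNotFoundError:
        return None
    return conntrack_usage(count, maximum)
```

A value that stays near 1.0 during peak traffic would be consistent with the "table full" errors described above.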

IV. Solutions

Adjust the nf_conntrack parameters through the sysctl interface

The business side should confirm in advance the maximum number of nf_conntrack connections the application may use, then adjust the module parameters through sysctl with commands like the following.

sudo sysctl -w net.netfilter.nf_conntrack_max=1503232
sudo sysctl -w net.netfilter.nf_conntrack_buckets=375808 # on kernels other than 4.19, this option may not be modifiable at run time
sudo sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=60
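Both the defaults quoted earlier (262144 and 65536) and the values in these commands (1503232 and 375808) keep nf_conntrack_max at four times nf_conntrack_buckets. Assuming that ratio is the one to preserve, a sizing helper might look like the Python sketch below. The function names and the `headroom` factor are illustrative assumptions, not recommendations from the article.

```python
def conntrack_sizing(peak_connections, headroom=2.0, ratio=4):
    """Suggest nf_conntrack_max and nf_conntrack_buckets values.

    headroom and ratio are illustrative assumptions; ratio=4 mirrors the
    max/buckets relationship in the values quoted in this article.
    """
    if peak_connections <= 0:
        raise ValueError("peak_connections must be positive")
    target = int(peak_connections * headroom)
    buckets = -(-target // ratio)  # ceiling division
    return {
        "nf_conntrack_max": buckets * ratio,
        "nf_conntrack_buckets": buckets,
    }


def sysctl_commands(settings):
    """Render the settings as sysctl -w command lines."""
    return [
        f"sysctl -w net.netfilter.{key}={value}"
        for key, value in settings.items()
    ]
```

With an estimated peak of 751616 tracked connections and the default headroom of 2.0, the helper reproduces the two values used in the commands above.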

Exclude connections that do not need tracking through iptables

Add the "-j NOTRACK" action to iptables rules, as in the commands below, so that connections which do not need tracking bypass conntrack. The advantage of this approach is that it addresses the root cause: untracked connections take up no space in the hash table and therefore cannot trigger the error.

sudo iptables -t raw -A PREROUTING -p udp -j NOTRACK
sudo iptables -t raw -A PREROUTING -p tcp --dport 22 -j NOTRACK
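When several services need to be exempted, the rules can be generated rather than typed by hand. The Python sketch below is only an illustration: the function name `notrack_rules` is hypothetical, and covering the OUTPUT chain (for locally generated replies) is an assumption that goes beyond the PREROUTING-only commands above. Traffic exempted this way can no longer match state-based rules or be NATed, so the port list should be chosen with care.

```python
def notrack_rules(protocol, port=None, chains=("PREROUTING", "OUTPUT")):
    """Build iptables argument lists that exempt traffic from conntrack."""
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    rules = []
    for chain in chains:
        args = ["iptables", "-t", "raw", "-A", chain, "-p", protocol]
        if port is not None:
            # Inbound packets carry the service port as destination;
            # locally generated replies carry it as source.
            flag = "--dport" if chain == "PREROUTING" else "--sport"
            args += [flag, str(port)]
        args += ["-j", "NOTRACK"]
        rules.append(args)
    return rules
```

Calling `notrack_rules("tcp", 22, chains=("PREROUTING",))` yields the argument list for the SSH command shown above; each list can be joined with spaces for review or passed to `subprocess.run` with root privileges.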

That concludes this walk-through of troubleshooting packet loss on CentOS 7 under high concurrency. If you run into similar symptoms, the analysis above can serve as a reference.
