2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
What causes instability in a DB server's network card? Many inexperienced administrators are at a loss when they run into it, so this article summarizes the causes of the problem and its solution. I hope it helps you solve the issue.
During a stress test, we found that the DB server's network card was very unstable. Barely ten minutes into the test, the server's responses became very slow, pings frequently lost packets, and the SSH connection was intermittent. At first I assumed the DB server had simply stopped responding under high concurrency, so we checked CPU, memory, and disk I/O. None of them had reached a high value; in fact they were well below our alert thresholds, and monitoring likewise showed the DB server still had plenty of spare resources. Strange indeed. So what was causing the network card's instability?
After talking with the engineers involved, we learned that this DB server is one of a two-node hot-standby pair, and that two bonded pairs of gigabit NICs had been set up on it just a few days earlier. According to the engineer, a stress test run before the bonding showed no such problem. Could something be wrong with the bonding configuration? I decided to examine the gigabit NIC bonding in detail.
The investigation proceeded as follows:
1. Check the ifcfg-bond0 and ifcfg-bond1 files
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.58.11.11
NETMASK=255.255.255.0
GATEWAY=10.58.121.254
USERCTL=no
# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.10.10.18
NETMASK=255.255.255.0
GATEWAY=10.58.121.254
USERCTL=no
Analysis: a perfectly standard configuration, nothing wrong here. Note that the IP address, netmask, and gateway are not assigned to any individual NIC; they are assigned to the virtual bonding interface instead.
2. Check the ifcfg-eth0, ifcfg-eth2, ifcfg-eth3 and ifcfg-eth4 files
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
Analysis: according to these configuration files, eth0 and eth3 are bound to bond0, and eth2 and eth4 are bound to bond1.
(Note: to set gigabit full duplex on a NIC temporarily, you can run:
ethtool -s eth0 speed 1000 duplex full autoneg on
ethtool -s eth2 speed 1000 duplex full autoneg on)
3. Check the modprobe.conf configuration file
# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth2 bnx2
alias eth3 bnx2
alias eth4 bnx2
alias scsi_hostadapter megaraid_sas
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 lpfc
alias bond0 bonding
options bond0 miimon=100 mode=0
alias bond1 bonding
options bond1 miimon=100 mode=1
# BEGINPP
include /etc/modprobe.conf.pp
# ENDPP
Analysis: the following lines were added to this file:
alias bond0 bonding
options bond0 miimon=100 mode=0
alias bond1 bonding
options bond1 miimon=100 mode=1
Their main purpose is to have the system load the bonding module at startup and expose the virtual network interfaces bond0 and bond1.
In addition, miimon enables link monitoring. For example, with miimon=100 the system checks the link status every 100 ms, and if one link fails, traffic is moved to the other. The mode value selects the working mode; of the four modes 0, 1, 2 and 3, two are commonly used.
mode=0 (balance-rr, round-robin) provides load balancing; both NICs carry traffic.
mode=1 (active-backup) provides fault tolerance; the bond works in active/standby fashion, so by default only one NIC carries traffic while the other stands by.
Note: miimon only monitors the local link, that is, whether the link from the host to the switch is up. If a link beyond the switch goes down while the switch itself stays up, bonding will keep using the link as if nothing were wrong.
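For cases where that limitation matters, the bonding driver also offers ARP-based monitoring as an alternative to miimon: it periodically sends ARP requests to one or more targets and fails over when they stop answering, which catches failures beyond the local switch port. A hedged sketch of what the modprobe.conf options might look like; the target IP here is a placeholder, not part of this setup:

```
alias bond1 bonding
options bond1 mode=1 arp_interval=1000 arp_ip_target=10.10.10.1
```

miimon and ARP monitoring are mutually exclusive on a given bond, so this would replace, not supplement, the miimon=100 option.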
There is nothing wrong with the configuration of this part.
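The runtime state of each bond can be cross-checked against these files by reading /proc/net/bonding/bondX, which reports the active mode and the slaves actually enslaved. A minimal sketch; the dump below is illustrative sample data, not output from this server:

```shell
#!/bin/sh
# Parse a /proc/net/bonding/bondX dump and report the bonding mode
# and the slave interfaces it actually contains.
# On a live system: parse_bond < /proc/net/bonding/bond0
parse_bond() {
    awk '
        /^Bonding Mode:/    { sub(/^Bonding Mode: /, ""); mode = $0 }
        /^Slave Interface:/ { slaves = slaves " " $3 }
        END { printf "mode=%s slaves=%s\n", mode, slaves }
    '
}

# Sample dump, so the sketch is self-contained.
parse_bond <<'EOF'
Ethernet Channel Bonding Driver: v3.4.0
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Slave Interface: eth0
MII Status: up
Slave Interface: eth3
MII Status: up
EOF
```

On the misconfigured server, this check would have shown bond0 holding eth0 and eth2 rather than the intended eth0 and eth3.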
So far we had not found the problem, but there is one more spot that is easy to overlook: the rc.local file. To make the NIC bonding take effect on every boot, rc.local is usually edited accordingly, so we should check this file too.
4. Check the rc.local file
# cat /etc/rc.d/rc.local
touch /var/lock/subsys/local
ifenslave bond0 eth0 eth2
ifenslave bond1 eth3 eth4
Analysis: this setting loads the bonding configuration automatically at boot.
Note: here eth0 and eth2 are enslaved to bond0, and eth3 and eth4 to bond1. But look back at the second step: there, eth0 and eth3 are bound to bond0, and eth2 and eth4 to bond1. This mismatch appears to be the culprit. So what symptoms would such a misconfiguration cause?
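A mismatch between the MASTER= settings in the ifcfg-ethX files and the ifenslave lines in rc.local can be caught mechanically rather than by eye. A small cross-checking sketch, run here on canned data mirroring this server's broken setup rather than on the live files:

```shell
#!/bin/sh
# Compare slave-to-bond assignments from two sources and report mismatches.
# Input lines have the form: "<source> <slave> <bond>".
check_mismatch() {
    awk '
        $1 == "ifcfg"     { want[$2] = $3 }
        $1 == "ifenslave" { got[$2]  = $3 }
        END {
            for (s in want)
                if (got[s] != want[s])
                    printf "%s: ifcfg says %s, rc.local says %s\n", s, want[s], got[s]
        }
    ' | sort
}

# Canned data reproducing the faulty configuration found above.
check_mismatch <<'EOF'
ifcfg eth0 bond0
ifcfg eth2 bond1
ifcfg eth3 bond0
ifcfg eth4 bond1
ifenslave eth0 bond0
ifenslave eth2 bond0
ifenslave eth3 bond1
ifenslave eth4 bond1
EOF
```

With this input the sketch flags eth2 and eth3, exactly the two NICs that were swapped in rc.local.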
First, recall how NIC bonding works. Normally an Ethernet NIC accepts only frames whose destination MAC address is its own and filters out all others, to reduce the load on the driver, that is, on software. But Ethernet NICs also support promiscuous mode, in which they receive every frame on the wire; many programs, such as sniffers and tcpdump, run in this mode. Bonding runs the slave NICs this way too: the driver rewrites both NICs' MAC addresses to the same value, so frames destined for that MAC are accepted on either NIC and handed to the bond driver for processing.
Now, in the rc.local file we checked, a careless slip by the system engineer had misconfigured the NIC bonding. This small mistake causes one IP address to correspond to two different MAC addresses, which naturally produces network delay and instability, much like ARP spoofing. When several different MACs map to the same IP, the ARP entry for that IP on every machine in the network, routers included, keeps changing, and packets are either lost or sent to the wrong MAC.
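The symptom just described can be spotted from any neighboring host by scanning its ARP history for an IP that has been seen with more than one MAC. A minimal awk sketch, run here against a canned "IP MAC" listing (as could be derived from repeated `arp -n` or `ip neigh` samples) rather than a live table:

```shell
#!/bin/sh
# Flag IP addresses that appear with more than one MAC address
# in an "IP MAC" listing.
flag_conflicts() {
    awk '
        {
            ip = $1; mac = $2
            if (seen[ip] != "" && seen[ip] != mac)
                printf "conflict: %s seen as %s and %s\n", ip, seen[ip], mac
            seen[ip] = mac
        }
    '
}

# Sample data: 10.58.11.11 flip-flops between two MACs.
flag_conflicts <<'EOF'
10.58.11.11 D4:AE:52:7F:D1:74
10.58.11.12 D4:AE:52:7F:AA:01
10.58.11.11 D4:AE:52:7F:D1:76
EOF
```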
We can confirm this by checking each NIC's MAC address:
eth0      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:74
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:358839038 errors:0 dropped:0 overruns:0 frame:0
          TX packets:445740732 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:84060158481 (78.2 GiB)  TX bytes:324117093205 (301.8 GiB)
          Interrupt:178 Memory:c6000000-c6012800
eth2      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:76
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1319022534 errors:0 dropped:0 overruns:0 frame:0
          TX packets:827575644 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:402801656790 (375.1 GiB)  TX bytes:249765452577 (232.6 GiB)
          Interrupt:186 Memory:c8000000-c8012800
eth3      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:74
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:368142910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:445816695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:88487806059 (82.4 GiB)  TX bytes:324236716714 (301.9 GiB)
          Interrupt:194 Memory:ca000000-ca012800
eth4      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:76
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1311065414 errors:0 dropped:0 overruns:0 frame:0
          TX packets:827581593 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:400383501186 (372.8 GiB)  TX bytes:249850192137 (232.6 GiB)
          Interrupt:202 Memory:cc000000-cc012800
You can see that eth0 and eth3 share the same MAC, as do eth2 and eth4.
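Rather than eyeballing the HWaddr lines, the pairing can be derived mechanically by grouping interfaces by MAC: interfaces sharing a MAC are slaves of the same bond. A small sketch, fed interface/MAC pairs taken from the output above instead of a live query:

```shell
#!/bin/sh
# Group interface names by MAC address; interfaces sharing a MAC
# are slaves of the same bond.
group_by_mac() {
    awk '
        { ifs[$2] = ifs[$2] " " $1 }
        END { for (m in ifs) printf "%s:%s\n", m, ifs[m] }
    ' | sort
}

group_by_mac <<'EOF'
eth0 D4:AE:52:7F:D1:74
eth2 D4:AE:52:7F:D1:76
eth3 D4:AE:52:7F:D1:74
eth4 D4:AE:52:7F:D1:76
EOF
```

This prints one line per MAC, listing eth0 with eth3 and eth2 with eth4, which matches the intended bond0/bond1 pairing from the ifcfg files.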
Having found the cause, we immediately edited the rc.local file back to the correct configuration:
ifenslave bond0 eth0 eth3
ifenslave bond1 eth2 eth4
Then we restarted the server and reran the stress test: everything was fine.
Bonding two NICs under Linux is a fairly specialized operation. We must not only understand how it works but also be careful during deployment and implementation; a single slip can cause network instability and bring a node down.
After reading the above, have you grasped the reasons for the instability of the DB server's network card? If you want to learn more skills or go deeper into the topic, you are welcome to follow the industry information channel. Thank you for reading!