2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
What causes instability in a DB server's network card? Many inexperienced administrators are at a loss when they run into it, so this article summarizes the causes of the problem and its solution. I hope it helps you solve the issue.
During a stress test, we found that the DB server's network card was very unstable. Barely ten minutes into the test, the server's responses became very slow, pings frequently lost packets, and the SSH connection was intermittent. At first I assumed the DB server had simply stopped responding under high concurrency, so we checked CPU, memory, and disk I/O. None of them had reached a high value; in fact they were well below our alert thresholds, and monitoring likewise showed the DB server still had plenty of spare resources. Strange indeed. So what was causing the network card's instability?
After talking with the engineers involved, we learned that this DB server is one of a two-node hot-standby pair, and that two bonded pairs of gigabit NICs had been set up on it just a few days earlier. According to the engineer, a stress test run before the bonding showed no such problem. Could something be wrong with the bonding configuration? I decided to examine the gigabit NIC bonding in detail.
The investigation proceeded as follows:
1. Check the ifcfg-bond0 and ifcfg-bond1 files
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.58.11.11
NETMASK=255.255.255.0
GATEWAY=10.58.121.254
USERCTL=no
# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.10.10.18
NETMASK=255.255.255.0
GATEWAY=10.58.121.254
USERCTL=no
Analysis: a perfectly standard configuration, nothing wrong here. Note that the IP address, netmask, and gateway are not assigned to any individual NIC; they are assigned to the virtual bonding interface instead.
2. Check the ifcfg-eth0, ifcfg-eth2, ifcfg-eth3 and ifcfg-eth4 files
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
ETHTOOL_OPTS="speed 1000 duplex full autoneg on"
Analysis: according to these configuration files, eth0 and eth3 are bound to bond0, and eth2 and eth4 are bound to bond1.
(Note: to set gigabit full duplex on a NIC temporarily, you can run:
ethtool -s eth0 speed 1000 duplex full autoneg on
ethtool -s eth2 speed 1000 duplex full autoneg on)
3. Check the modprobe.conf configuration file
# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth2 bnx2
alias eth3 bnx2
alias eth4 bnx2
alias scsi_hostadapter megaraid_sas
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 lpfc
alias bond0 bonding
options bond0 miimon=100 mode=0
alias bond1 bonding
options bond1 miimon=100 mode=1
# BEGINPP
include /etc/modprobe.conf.pp
# ENDPP
Analysis: the following lines were added to this file:
alias bond0 bonding
options bond0 miimon=100 mode=0
alias bond1 bonding
options bond1 miimon=100 mode=1
Their main purpose is to have the system load the bonding module at startup and expose the virtual network interfaces bond0 and bond1.
In addition, miimon enables link monitoring. For example, with miimon=100 the system checks the link status every 100 ms, and if one link fails, traffic is moved to the other. The mode value selects the working mode; of the four modes 0, 1, 2 and 3, two are commonly used.
mode=0 (balance-rr, round-robin) provides load balancing; both NICs carry traffic.
mode=1 (active-backup) provides fault tolerance; the bond works in active/standby fashion, so by default only one NIC carries traffic while the other stands by.
Note: miimon only monitors the local link, that is, whether the link from the host to the switch is up. If a link beyond the switch goes down while the switch itself stays up, bonding will keep using the link as if nothing were wrong.
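For cases where that limitation matters, the bonding driver also offers ARP-based monitoring as an alternative to miimon: it periodically sends ARP requests to one or more targets and fails over when they stop answering, which catches failures beyond the local switch port. A hedged sketch of what the modprobe.conf options might look like; the target IP here is a placeholder, not part of this setup:

```
alias bond1 bonding
options bond1 mode=1 arp_interval=1000 arp_ip_target=10.10.10.1
```

miimon and ARP monitoring are mutually exclusive on a given bond, so this would replace, not supplement, the miimon=100 option.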
There is nothing wrong with the configuration of this part.
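The runtime state of each bond can be cross-checked against these files by reading /proc/net/bonding/bondX, which reports the active mode and the slaves actually enslaved. A minimal sketch; the dump below is illustrative sample data, not output from this server:

```shell
#!/bin/sh
# Parse a /proc/net/bonding/bondX dump and report the bonding mode
# and the slave interfaces it actually contains.
# On a live system: parse_bond < /proc/net/bonding/bond0
parse_bond() {
    awk '
        /^Bonding Mode:/    { sub(/^Bonding Mode: /, ""); mode = $0 }
        /^Slave Interface:/ { slaves = slaves " " $3 }
        END { printf "mode=%s slaves=%s\n", mode, slaves }
    '
}

# Sample dump, so the sketch is self-contained.
parse_bond <<'EOF'
Ethernet Channel Bonding Driver: v3.4.0
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Slave Interface: eth0
MII Status: up
Slave Interface: eth3
MII Status: up
EOF
```

On the misconfigured server, this check would have shown bond0 holding eth0 and eth2 rather than the intended eth0 and eth3.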
So far we had not found the problem, but there is one more spot that is easy to overlook: the rc.local file. To make the NIC bonding take effect on every boot, rc.local is usually edited accordingly, so we should check this file too.
4. Check the rc.local file
# cat /etc/rc.d/rc.local
touch /var/lock/subsys/local
ifenslave bond0 eth0 eth2
ifenslave bond1 eth3 eth4
Analysis: this setting loads the bonding configuration automatically at boot.
Note: here eth0 and eth2 are enslaved to bond0, and eth3 and eth4 to bond1. But look back at the second step: there, eth0 and eth3 are bound to bond0, and eth2 and eth4 to bond1. This mismatch appears to be the culprit. So what symptoms would such a misconfiguration cause?
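A mismatch between the MASTER= settings in the ifcfg-ethX files and the ifenslave lines in rc.local can be caught mechanically rather than by eye. A small cross-checking sketch, run here on canned data mirroring this server's broken setup rather than on the live files:

```shell
#!/bin/sh
# Compare slave-to-bond assignments from two sources and report mismatches.
# Input lines have the form: "<source> <slave> <bond>".
check_mismatch() {
    awk '
        $1 == "ifcfg"     { want[$2] = $3 }
        $1 == "ifenslave" { got[$2]  = $3 }
        END {
            for (s in want)
                if (got[s] != want[s])
                    printf "%s: ifcfg says %s, rc.local says %s\n", s, want[s], got[s]
        }
    ' | sort
}

# Canned data reproducing the faulty configuration found above.
check_mismatch <<'EOF'
ifcfg eth0 bond0
ifcfg eth2 bond1
ifcfg eth3 bond0
ifcfg eth4 bond1
ifenslave eth0 bond0
ifenslave eth2 bond0
ifenslave eth3 bond1
ifenslave eth4 bond1
EOF
```

With this input the sketch flags eth2 and eth3, exactly the two NICs that were swapped in rc.local.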
First, recall how NIC bonding works. Normally an Ethernet NIC accepts only frames whose destination MAC address is its own and filters out all others, to reduce the load on the driver, that is, on software. But Ethernet NICs also support promiscuous mode, in which they receive every frame on the wire; many programs, such as sniffers and tcpdump, run in this mode. Bonding runs the slave NICs this way too: the driver rewrites both NICs' MAC addresses to the same value, so frames destined for that MAC are accepted on either NIC and handed to the bond driver for processing.
Now, in the rc.local file we checked, a careless slip by the system engineer had misconfigured the NIC bonding. This small mistake causes one IP address to correspond to two different MAC addresses, which naturally produces network delay and instability, much like ARP spoofing. When several different MACs map to the same IP, the ARP entry for that IP on every machine in the network, routers included, keeps changing, and packets are either lost or sent to the wrong MAC.
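The symptom just described can be spotted from any neighboring host by scanning its ARP history for an IP that has been seen with more than one MAC. A minimal awk sketch, run here against a canned "IP MAC" listing (as could be derived from repeated `arp -n` or `ip neigh` samples) rather than a live table:

```shell
#!/bin/sh
# Flag IP addresses that appear with more than one MAC address
# in an "IP MAC" listing.
flag_conflicts() {
    awk '
        {
            ip = $1; mac = $2
            if (seen[ip] != "" && seen[ip] != mac)
                printf "conflict: %s seen as %s and %s\n", ip, seen[ip], mac
            seen[ip] = mac
        }
    '
}

# Sample data: 10.58.11.11 flip-flops between two MACs.
flag_conflicts <<'EOF'
10.58.11.11 D4:AE:52:7F:D1:74
10.58.11.12 D4:AE:52:7F:AA:01
10.58.11.11 D4:AE:52:7F:D1:76
EOF
```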
We can confirm this by checking each NIC's MAC address:
eth0      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:74
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:358839038 errors:0 dropped:0 overruns:0 frame:0
          TX packets:445740732 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:84060158481 (78.2 GiB)  TX bytes:324117093205 (301.8 GiB)
          Interrupt:178 Memory:c6000000-c6012800
eth2      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:76
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1319022534 errors:0 dropped:0 overruns:0 frame:0
          TX packets:827575644 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:402801656790 (375.1 GiB)  TX bytes:249765452577 (232.6 GiB)
          Interrupt:186 Memory:c8000000-c8012800
eth3      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:74
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:368142910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:445816695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:88487806059 (82.4 GiB)  TX bytes:324236716714 (301.9 GiB)
          Interrupt:194 Memory:ca000000-ca012800
eth4      Link encap:Ethernet  HWaddr D4:AE:52:7F:D1:76
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1311065414 errors:0 dropped:0 overruns:0 frame:0
          TX packets:827581593 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:400383501186 (372.8 GiB)  TX bytes:249850192137 (232.6 GiB)
          Interrupt:202 Memory:cc000000-cc012800
You can see that eth0 and eth3 share the same MAC, as do eth2 and eth4.
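Rather than eyeballing the HWaddr lines, the pairing can be derived mechanically by grouping interfaces by MAC: interfaces sharing a MAC are slaves of the same bond. A small sketch, fed interface/MAC pairs taken from the output above instead of a live query:

```shell
#!/bin/sh
# Group interface names by MAC address; interfaces sharing a MAC
# are slaves of the same bond.
group_by_mac() {
    awk '
        { ifs[$2] = ifs[$2] " " $1 }
        END { for (m in ifs) printf "%s:%s\n", m, ifs[m] }
    ' | sort
}

group_by_mac <<'EOF'
eth0 D4:AE:52:7F:D1:74
eth2 D4:AE:52:7F:D1:76
eth3 D4:AE:52:7F:D1:74
eth4 D4:AE:52:7F:D1:76
EOF
```

This prints one line per MAC, listing eth0 with eth3 and eth2 with eth4, which matches the intended bond0/bond1 pairing from the ifcfg files.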
Having found the cause, we immediately edited the rc.local file back to the correct configuration:
ifenslave bond0 eth0 eth3
ifenslave bond1 eth2 eth4
Then we restarted the server and reran the stress test: everything was fine.
Bonding two NICs under Linux is a fairly specialized operation. We must not only understand how it works but also be careful during deployment and implementation; a single slip can cause network instability and bring a node down.
After reading the above, have you grasped the reasons for the instability of the DB server's network card? If you want to learn more skills or go deeper into the topic, you are welcome to follow the industry information channel. Thank you for reading!