

Switch CPU load above 90% (2)





1. Background

I had been with this company for a little more than two months when I ran into another "traffic accident": the switch CPU sat at 90% or above, and colleagues accessing the public network and our IDC kept running into packet loss. The company uses Cisco equipment: the access layer is a mix of 2960, 2950, and 3560 switches, and the core is a 4506. The firewall is a Juniper box and the egress router runs RouterOS.

2. Case walkthrough

[Figure: partial network topology]

The figure above shows part of the network topology. Because this is an office production network carrying internal server data, I am not allowed to publish the full diagram, but that does not affect the discussion of the problem at all.

When colleagues first reported that the network was slow, I used segmented isolation, testing toward external addresses step by step, and finally determined that the problem lay between our internal network and the gateway. That made troubleshooting awkward, because not everyone had trouble reaching the gateway; in fact, most people had no problem at all. My first guess was a bad link between one of the access switches and the core switch. If that were the case, it would have been painful: the office network was built back in 1996, so cable aging was entirely possible, and replacing cable runs is a lot of work. On closer observation, however, the PCs losing packets were not all on the same switch but were spread across many of them. That ruled out cable aging, since many runs would not all degrade at the same time.

The problem was getting harder. I considered whether some new business had recently driven a sudden increase in office traffic, but there was no new business; everything was running as usual. So I turned to our Cacti monitoring and looked at the traffic graphs for the core switch, and found that the port connecting to the firewall was carrying very heavy traffic, and that firewall is the box all of our Internet-bound traffic goes through. It looked as if the problem sat on the link between the firewall and the switch. Before this I had also captured packets with Wireshark to inspect the internal traffic, and apart from a large amount of UDP there was nothing abnormal.

I then examined the two links between the firewall and the switch. The firewall itself has gigabit interfaces, but the switch ports are mostly 100 Mbit/s; its few gigabit ports were basically all in use. One of the firewall-to-switch links was gigabit and the other was only 100 Mbit/s, which pointed to traffic congestion. The flow works like this: the intranet gateway sits on the firewall, traffic goes from the layer-2 switch up to the firewall, and then back from the firewall through the switch to the router. Because the link into the firewall is gigabit, a large amount of traffic gets through; but when the firewall forwards that traffic back, the switch receives it on a 100 Mbit/s port, driving that interface to 100% utilization, and the switch then has to burn CPU dealing with it, so the switch CPU naturally climbs (a sketch of the commands for spotting this kind of port saturation appears at the end of this case). I moved the firewall connection to a free gigabit port on the switch, the CPU came down, and the packet loss disappeared.

But it was not over yet. When I looked at the CPU again, utilization was still high:

[Figure: switch CPU utilization still high]

Looking at the process list, I found that Cat4k Mgmt LoPri was very high. Here HiPri denotes the high-priority process and LoPri the low-priority process. LoPri climbs when a process exceeds the CPU target it is given under HiPri; the excess work is then handed over to LoPri to be processed at low priority.
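As a rough sketch of how this view is obtained (the prompt matches the one used later in the article; exact output details vary by IOS release), the per-process CPU figures come from show processes cpu:

    ! overall CPU utilization plus the per-process list;
    ! the Cat4k Mgmt HiPri and Cat4k Mgmt LoPri entries appear here
    intra# show processes cpu
    ! optionally narrow the output to just those two processes
    intra# show processes cpu | include Cat4k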

[Figure: show processes cpu output with Cat4k Mgmt LoPri high]

I then looked at the platform-level CPU processes with show platform health, which shows which process is eating the CPU:

    intra# sh platform health
                         %CPU   %CPU    RunTimeMax   Priority  Average %CPU  Total
                         Target Actual Target Actual   Fg   Bg 5Sec Min Hour  CPU
    K2PortMan Review      2.00   2.81     15     11   100  500    2   2    2 8242:09
    Gigaport0 Review      0.40   0.00      4      0   100  500    0   0    0    0:00
    Gigaport1 Review      0.40   0.00      4      0   100  500    0   0    0    0:00
    Gigaport2 Review      0.40   0.00      4      0   100  500    0   0    0    0:00
    Gigaport3 Review      0.40   0.00      4      0   100  500    0   0    0    0:00
    K2FibPerVlanPuntMan   2.00   0.00     15      2   100  500    0   0    0    0:00
    K2FibFlowCache flow   2.00   0.02     10      8   100  500    0   0    0  195:34
    K2FibFlowCache flow   2.00  54.00     10      8   100  500   58  65   45 41846:36
    K2FibFlowCache adj r  2.00   0.09     10      4   100  500    0   0    0  280:52

For the other processes the Actual stays around or below the Target, but K2FibFlowCache flow is clearly abnormal: an Actual of 54.00 against a Target of 2.00. According to the explanation I found on Cisco's website, this value goes high when PBR (policy-based routing) is the culprit. Our core switch is indeed configured with PBR to handle a special requirement, so I removed the PBR and checked K2FibFlowCache flow again:
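The article does not show the PBR configuration itself, but on this kind of platform PBR is applied per interface or SVI with ip policy route-map, so removing it looks roughly like the sketch below; the VLAN interface and the route-map name are hypothetical placeholders, not the actual configuration:

    intra# configure terminal
    ! "Vlan100" and "PBR-SPECIAL" stand in for the real interface and route-map
    intra(config)# interface Vlan100
    intra(config-if)# no ip policy route-map PBR-SPECIAL
    intra(config-if)# end
    ! re-check the process that was misbehaving
    intra# show platform health | include K2FibFlowCache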

[Figure: show platform health output after removing PBR]

The value dropped immediately, and looking at the CPU utilization again, it had also come back down to normal. [Figure: switch CPU utilization after removing PBR]
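To make the case easier to reproduce, here is a rough sketch of the checks used along the way; the interface name is a hypothetical stand-in for the 100 Mbit/s port facing the firewall, and the exact output varies by platform:

    ! switch load and the top processes
    intra# show processes cpu
    intra# show platform health
    ! speed/duplex, traffic rate and drops on the suspect port
    intra# show interfaces FastEthernet3/1 | include duplex|rate|drops
    intra# show interfaces FastEthernet3/1 counters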

3. Summary and conclusions

1. A rise in switch CPU can be caused by many factors, which makes it relatively difficult to troubleshoot.

2. When troubleshooting a sudden CPU spike, check several angles, above all whether there have been any recent business or configuration changes. If there are none, the cause may be a misoperation; on older hardware it may even be a hardware failure.

3. Generally speaking, increased traffic has a large impact on switch CPU, for example the traffic forwarded on switch interfaces, × × traffic, and so on.

4. Cisco's official website also offers many solutions for high CPU; they should be combined with other useful resources, such as the traffic monitoring tool Cacti used in this example.
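Cacti builds these traffic graphs by polling the switch over SNMP. As a minimal sketch (the community string and the server address below are placeholders, not values from this network), enabling read-only SNMP access on the switch would look roughly like this:

    intra# configure terminal
    ! ACL 10 restricts SNMP queries to the Cacti server (placeholder address)
    intra(config)# access-list 10 permit 192.0.2.10
    ! read-only community string, limited by ACL 10
    intra(config)# snmp-server community cactiRO RO 10
    intra(config)# end

On the Cacti side, the switch is then added as a device with the same community string, and the interface traffic graphs are built from the standard interface counters.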
