2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains the common server failures in detail. The editor finds the material very practical and shares it here for reference; hopefully you will get something out of it.
1. Common server failures and symptoms
Main reasons why the server cannot be started:
① Mains power or power cord failure (power outage or poor contact)
② Power supply or power module failure
③ Memory failure (usually accompanied by an alarm)
④ CPU failure (usually accompanied by an alarm)
⑤ Motherboard failure
⑥ Interrupt conflicts caused by other add-in cards
2. The server cannot be started
① Check whether the power cord and the various I/O cables are connected properly.
② Check whether the motherboard is receiving power after the power cord is connected.
③ Reduce the server to a minimum configuration (a single CPU, minimum memory, only a monitor and keyboard attached) and short the motherboard power-switch jumper directly to see whether the server starts.
④ Check the power supply: unplug all power connectors and short the green and black wires of the motherboard power connector to see whether the power supply turns on.
⑤ If the power supply is confirmed to be normal, troubleshoot by the replacement method: in the minimum configuration, swap the parts (memory, CPU, motherboard) one at a time, starting with the one that is easiest to replace.
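The order of these checks can be sketched as a small decision function. This is only an illustration: the observation names (`cables_ok`, `boots_in_min_config`, `psu_starts_alone`) are hypothetical labels, not part of any real tool, and the advice to re-add parts one at a time when the minimum configuration boots is an assumption implied by the steps rather than stated in them.

```python
# Actions, in the order the steps above suggest (wording is illustrative).
RESEAT = "reseat the power cord and I/O cables"
READD = "re-add the removed parts one at a time"
TEST_PSU = "test the power supply alone (short the green and black wires)"
REPLACE_PSU = "replace the power supply"
SWAP = "swap memory, CPU, then motherboard in the minimum configuration"


def next_action(obs: dict) -> str:
    """Return the next action for a server that will not start.

    `obs` maps hypothetical observation names to True/False.
    A missing key means the check has not been done yet.
    """
    if not obs.get("cables_ok", False):
        return RESEAT                      # steps 1 and 2
    if obs.get("boots_in_min_config", False):
        return READD                       # step 3 (assumed follow-up)
    psu = obs.get("psu_starts_alone")      # None means "not tested yet"
    if psu is None:
        return TEST_PSU                    # step 4
    if not psu:
        return REPLACE_PSU
    return SWAP                            # step 5


print(next_action({"cables_ok": True, "psu_starts_alone": True}))
```

The function returns one action at a time, so it can be called again after each check is recorded in `obs`.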
3. The system restarts frequently
Reasons for frequent system restarts:
① Power supply failure (confirm and fix by the replacement method)
② Memory failure (can be detected from BIOS error reports)
③ Excessive data traffic on the network port (workload too heavy)
④ Software failure (resolve by updating or reinstalling the operating system)
4. Diagnosing and handling server crashes
Server crashes are hard to diagnose. The causes generally fall into two categories: software and hardware.
Category one: software faults
① First check the operating system's system log, which can reveal some of the causes of the crash.
② Computer virus infection.
③ A bug or vulnerability in system software. Conclude this only after the hardware has been confirmed fault-free; resolving it requires help from the software vendor.
④ Improper use of software or excessive system load. Ask the customer to reduce the server's workload appropriately and see whether the problem goes away.
Category two: hardware faults
① Hardware conflict
② Power supply failure or insufficient power supply (judge by adding up the power drawn by all loads on the server power supply and comparing the total with its rating)
③ Hard drive failure (scan the disk surface for bad sectors)
④ Memory failure (judge from error reports in the motherboard BIOS and error messages from the operating system)
⑤ Motherboard failure (confirm by the replacement method)
⑥ CPU failure (confirm by the replacement method)
⑦ Add-in card failure (SCSI/RAID cards or other PCI devices can also cause the system to crash; judge and handle by the replacement method)
Note: after a crash fault has been dealt with, run stress tests on the machine for a period of time to check that the fault is completely resolved.
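The power check in hardware item ② is simple arithmetic: add up the load and compare it with the power supply's rating. A minimal sketch follows; the component names, the wattages, and the 25% safety margin are illustrative assumptions, not values from this article or from any vendor.

```python
from typing import Mapping


def power_headroom(psu_rated_watts: float,
                   load_watts: Mapping[str, float],
                   margin: float = 0.25) -> float:
    """Return the watts left after the total load plus a safety margin.

    A negative result suggests the power supply is undersized.
    `margin` is an assumed safety factor, not a standard value.
    """
    total = sum(load_watts.values())
    return psu_rated_watts - total * (1 + margin)


# Illustrative numbers only.
loads = {"cpu": 200.0, "disks": 60.0, "raid_card": 20.0, "fans": 20.0}
print(power_headroom(500.0, loads))
```

Real component draw should come from the hardware data sheets rather than from guessed figures.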
5. The hard disk cannot be found when installing the operating system
Causes of failure:
① No physical hard disk is present
② Hard disk cable connection problem
③ The hard disk controller driver is not installed, or the driver does not match
6. How to get the driver
Use the CD that shipped with the machine to create the corresponding driver disk.
7. The hard disk controller driver still cannot be loaded even with the correct driver
Check whether HostRAID is enabled.
8. After a newly purchased hard disk is installed, the machine fails its self-test
① Remove the new hard drive and see whether the machine passes the self-test.
② Check whether the ID of the newly added hard drive is the same as the ID of an existing hard disk. If two drives share the same ID, the self-test fails.
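The ID conflict in step ② can be checked on paper before powering on. The sketch below assumes you have written down the ID jumpered on each drive; the drive labels are hypothetical.

```python
from collections import Counter
from typing import List, Mapping


def duplicate_ids(id_by_drive: Mapping[str, int]) -> List[int]:
    """Return the SCSI IDs that are assigned to more than one drive."""
    counts = Counter(id_by_drive.values())
    return sorted(scsi_id for scsi_id, n in counts.items() if n > 1)


# Hypothetical labels: the new drive was left on the same ID as an old one.
print(duplicate_ids({"old_disk_0": 0, "old_disk_1": 1, "new_disk": 0}))
```

An empty list means no two drives share an ID.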
9. How to format a SCSI hard disk
1. With an operating system: format using the disk management tools.
2. Without an operating system: format in the SCSI controller's management interface.
3. Taking an ADAPTEC RAID card as an example: when the CTRL+A prompt appears during boot, press CTRL+A to enter.
① Select channel A.
② Select SCSI UTILITY to scan for hard drives, then select the hard drive to work on.
③ Select FORMAT to fully format the hard drive.
④ Select VERIFY to check the hard disk for bad sectors.
Note: do not interrupt the process or cut the power while formatting the hard disk; otherwise the disk will be damaged.
10. On an Aisino-series machine with a RAID card, what should be done when one hard drive stops working properly and the RAID card raises an alarm, but the system still runs normally?
Replace the faulty drive with a new hard drive. Make sure its capacity is greater than or equal to that of the faulty drive; it is best to use the same model of hard drive.
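The replacement rule above (capacity at least equal, same model preferred) can be written as a quick check. This is a sketch only; the function name, messages, and model strings are hypothetical.

```python
from typing import Tuple


def check_replacement(failed_gb: float, new_gb: float,
                      failed_model: str = "",
                      new_model: str = "") -> Tuple[bool, str]:
    """Return (usable, note) for a candidate replacement drive."""
    if new_gb < failed_gb:
        return (False, "capacity is smaller than the failed drive")
    if failed_model and new_model and failed_model != new_model:
        return (True, "usable, but the same model is preferred")
    return (True, "ok")


# Hypothetical model names and sizes.
print(check_replacement(73.0, 146.0, "MODEL-A", "MODEL-B"))
```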
Common faults related to the RAID card
The first category: the RAID card itself has a problem.
① Typical symptoms: RAID information is lost, hard disks frequently go offline, REBUILD cannot be performed, and hard disks cannot be detected at POST or take a long time to be detected.
Typical fault A: after RAID 1 was configured and the operating system installed, everything was normal, but on the second reboot an alarm sounded. Inspection showed that one hard disk had dropped out of the array. After a REBUILD the system returned to normal, but the fault came back on the next reboot. A hard disk failure was suspected, yet verifying the hard disk found no problem. Replacing the RAID card finally solved the problem.
Typical fault B: the machine often crashed, and startup was sometimes very slow. The system log showed an error at system startup: the device /devices/scsi/port0 did not respond within the transfer timeout period. The machine returned to normal after the RAID card was replaced.
The second category: the hard disk itself has a problem.
① Typical symptoms: the hard disk goes offline and its status in the RAID array is DEAD, or a REBUILD cannot continue past a certain point.
Typical fault: after a hard disk went offline, the REBUILD stopped at 20% with an error saying it could not continue. Once the offline hard drive, the drive cage, and the SCSI cable were all confirmed to work properly, the online hard drive was checked and bad sectors were found on it. After that drive was repaired and the REBUILD was redone, the array returned to normal.
The third category: contact problems in the hard drive cage or module
① Problems of this kind usually show up as the RAID card failing to detect the hard drive at all. They are relatively simple, but a few points need attention when handling machines with a hard drive cage.
Typical fault: the hard drive could not be detected by the SCSI card. Connecting the RAID cable to the motherboard's ULTRA160 interface made no difference. Replacing the hard drive cage (excluding the backplane behind it) made no difference, and neither did replacing the hard drive. Finally, the backplane behind the hard drive cage (the non-hot-swappable part) was removed, and one pin on its 80-pin connector was found to be bent. After the pin was straightened, the machine returned to normal.
11. Why can't the ID of a SCSI hard drive used in a server be set to 7?
On a SCSI bus, ID 7 is taken by the hard disk controller by default, so a hard disk's ID cannot be set to 7.
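The rule can be expressed as a small validity check. The sketch assumes the usual parallel SCSI ranges (IDs 0 to 7 on a narrow bus, 0 to 15 on a wide bus) and that the controller keeps its default ID of 7; the function name is hypothetical.

```python
CONTROLLER_ID = 7  # default ID of the SCSI controller, per the text above


def is_valid_drive_id(scsi_id: int, wide_bus: bool = True,
                      controller_id: int = CONTROLLER_ID) -> bool:
    """Return True if `scsi_id` may be assigned to a hard drive.

    Assumes IDs 0-15 on a wide bus and 0-7 on a narrow bus, with
    one ID taken by the controller.
    """
    max_id = 15 if wide_bus else 7
    return 0 <= scsi_id <= max_id and scsi_id != controller_id


print([i for i in range(16) if is_valid_drive_id(i)])
```

If the controller has been reconfigured to a different ID, pass that value as `controller_id`.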
12. Why can't the machine pass POST?
Solution:
① Power off the machine, open the chassis, and move the jumper cap on the "CMOS CLEAR" jumper so that it shorts the other two pins (see the motherboard manual).
② Power on the machine and let it run its self-test. When the self-test stops and reports that CMOS has been cleared, power the machine off and restore the jumper to its original position.
③ Reboot the machine.
13. Physical memory slot error
Solution:
Power on, press F2 to enter "SETUP", go to "ADVANCED", then "MEMORY CONFIGURATION" and press Enter, then select "CLEAR DIMM ERRORS" and press Enter.
This concludes the article on "what are the common server failures?". Hopefully the content above is helpful to you.
© 2024 shulou.com SLNews company. All rights reserved.