2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
I recently migrated a cluster to a new machine room. A dedicated line was set up between the old room and the new one so that the cluster could be migrated without downtime, which meant it ran across both rooms for a while, and more than 100 new servers were added at the same time. I ran into several problems along the way and am recording them here.
The machines in the old cluster run CentOS 6, and the machines added in the new room run CentOS 7.
1. Packet loss problem
Once the cluster spanned both rooms, the DataNodes logged many Slow BlockReceiver warnings:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 630ms(threshold=300ms)
After investigation, the main cause of this warning turned out to be the MTU setting of the network card. Hadoop recommends raising the network card MTU from 1500 to 9000 so that jumbo frames can be received. After the MTU was adjusted, a few of these warnings still appeared occasionally, but far less often. As I recall, the switches have to be changed as well; changing only the servers does not work.
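To compare the warning rate before and after an MTU change, the DataNode log can be scanned for the message shown above. The following is a minimal sketch; the function name `slow_receiver_delays` is hypothetical, and the pattern is derived only from the single log line quoted here, so other Hadoop versions may format the message differently.

```python
import re

# Pattern built from the log line quoted above; the optional whitespace before
# "(threshold=" is an assumption to tolerate small formatting differences.
SLOW_RECEIVER = re.compile(
    r"Slow BlockReceiver write packet to mirror took (\d+)ms\s*\(threshold=(\d+)ms\)"
)


def slow_receiver_delays(lines):
    """Return the reported delay in milliseconds for every matching log line."""
    delays = []
    for line in lines:
        match = SLOW_RECEIVER.search(line)
        if match:
            delays.append(int(match.group(1)))
    return delays
```

Running it over the same length of log before and after the change gives a count (`len(...)`) and a worst case (`max(...)`) that can be compared directly.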
2. The df command hangs on CentOS 7 and cannot be exited
Running df on CentOS 7 hangs, and Ctrl-C does not get out of it. Because our NodeManager health check script contains a df command, the NodeManager health check got stuck as well, and eventually all of the CPU was eaten up, so compute tasks could not run normally. The kill command cannot kill a hung df process, and an strace attached to df does not exit either; strace itself has to be killed with kill -9.
stat("/sys/fs/cgroup/memory", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/kernel/config", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/proc/sys/fs/binfmt_misc",
In the end, df is stuck on the stat call for binfmt_misc.
After investigation, this turned out to be a bug in CentOS 7 systemd, 1534701. We probably triggered it because systemd-related components were updated as dependencies during the Hadoop installation, but the machines were not restarted, so the new systemd had not taken effect. After a restart, the fault was resolved.
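Independently of the restart fix, a health check can be written so that one hung command cannot block it forever. The sketch below is not what we ran; it is one possible guard, and the name `probe_with_deadline` and the 5-second default are assumptions. It deliberately avoids `subprocess.run(..., timeout=...)`, because that call waits for the child after killing it, which would hang again if the process ignores the kill, as df did here.

```python
import subprocess


def probe_with_deadline(cmd, deadline_s=5.0):
    """Return True if cmd exits with status 0 within deadline_s seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=deadline_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck inside the kernel may ignore the
        # signal, so we never wait on it without a timeout.
        proc.kill()
        try:
            proc.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return False
```

A health script could call `probe_with_deadline(["df", "-h"])` and report the node as unhealthy when it returns False, instead of piling up stuck df processes.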
3. Heavy traffic on the dedicated line slows down running tasks
A combined analysis with tcpdump and nmap showed a large number of ARP packets. The likely cause is that the cluster uses a class B address range that was not divided into VLANs, so the hosts in the two rooms announced themselves to each other over ARP and caused a broadcast storm. The follow-up fix is to have the O&M team re-plan the VLANs.
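To see which hosts send the most ARP requests, a saved tcpdump text capture can be tallied by sender. This is a minimal sketch, not the analysis actually performed; the function name `top_arp_senders` is hypothetical, and the regular expression assumes tcpdump's usual one-line text form for ARP requests ("ARP, Request who-has X tell Y, length N"), which may differ between versions and options.

```python
import re
from collections import Counter

# Assumed tcpdump text form: "... ARP, Request who-has <target> tell <sender>, length N"
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+).*? tell ([^,\s]+)")


def top_arp_senders(lines, n=10):
    """Count ARP requests per sending host and return the n busiest senders."""
    senders = Counter()
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match:
            senders[match.group(2)] += 1
    return senders.most_common(n)
```

If a handful of senders dominate, the problem is probably individual hosts; if requests are spread evenly across the whole address range, an oversized broadcast domain is the more likely cause.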
These failures are basically not problems in Hadoop itself. It is the same as in an earlier note, where the network card of one machine out of several hundred dropped to 10 Mbps and slowed down the whole cluster. Problems like these have to be discovered and investigated by Hadoop O&M, who then notify the other departments. Hadoop O&M should therefore act as a bridge between the data R&D department and the O&M department: quickly locate whether a problem lies in Hadoop, the data applications, the operating system, or the hardware, and then arrange for the relevant people to fix it. The faster the problem is located, the more cost is saved, and both time and money count as cost. For example, our company's cross-room dedicated line is said to cost ten thousand a day, and if a customer's data is not finished within the agreed time, the compensation costs even more.
When the cross-room migration is finished, I may write it up in detail.