2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Recently, our company's Hadoop cluster was not shut down before a power outage. Data was lost, and the namenode was corrupted and would not start, so I tried to recover it.
Method 1: use hadoop namenode -importCheckpoint
1. Delete the name directory:
[hadoop@node1 hdfs]$ rm -rf name
2. Stop the cluster and copy the namesecondary directory from the secondarynamenode to the namenode host, into the directory that holds dfs.name.dir (/app/user/hdfs here):
[hadoop@node2 hdfs]$ scp -r namesecondary node1:/app/user/hdfs/
fsimage 100% 157 0.2KB/s 00:00
fstime 100% 8 0.0KB/s 00:00
fsimage 100% 2410 2.4KB/s 00:00
VERSION 100% 101 0.1KB/s 00:00
edits 100% 4 0.0KB/s 00:00
fstime 100% 8 0.0KB/s 00:00
fsimage 100% 2410 2.4KB/s 00:00
VERSION 100% 101 0.1KB/s 00:00
edits 100% 4 0.0KB/s 00:00
3. Run hadoop namenode -importCheckpoint on the namenode, then restart the cluster:
[hadoop@node1 hdfs]$ hadoop namenode -importCheckpoint
13/11/14 07:24:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node1/192.168.1.151
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
13/11/14 07:24:20 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
13/11/14 07:24:20 INFO namenode.NameNode: Namenode up at: node1.com/192.168.1.151:9000
13/11/14 07:24:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
13/11/14 07:24:20 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
13/11/14 07:24:21 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
13/11/14 07:24:21 INFO namenode.FSNamesystem: supergroup=supergroup
13/11/14 07:24:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/11/14 07:24:21 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
13/11/14 07:24:21 INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
13/11/14 07:24:21 INFO common.Storage: Storage directory /app/user/hdfs/name is not formatted.
13/11/14 07:24:21 INFO common.Storage: Formatting ...
13/11/14 07:24:21 INFO common.Storage: Number of files = 26
13/11/14 07:24:21 INFO common.Storage: Number of files under construction = 0
13/11/14 07:24:21 INFO common.Storage: Image file of size 2410 loaded in 0 seconds.
13/11/14 07:24:21 INFO common.Storage: Edits file /app/user/hdfs/namesecondary/current/edits of size 4 edits # 0 loaded in 0 seconds.
13/11/14 07:24:21 INFO common.Storage: Image file of size 2410 saved in 0 seconds.
13/11/14 07:24:21 INFO common.Storage: Image file of size 2410 saved in 0 seconds.
13/11/14 07:24:21 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
13/11/14 07:24:21 INFO namenode.FSNamesystem: Finished loading FSImage in 252 msecs
13/11/14 07:24:21 INFO hdfs.StateChange: STATE* Safe mode ON. The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
13/11/14 07:24:21 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/11/14 07:24:21 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
13/11/14 07:24:21 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
13/11/14 07:24:21 INFO http.HttpServer: Jetty bound to port 50070
13/11/14 07:24:21 INFO mortbay.log: jetty-6.1.14
13/11/14 07:24:21 INFO mortbay.log: Started SelectChannelConnector@node1.com:50070
13/11/14 07:24:21 INFO namenode.NameNode: Web-server up at: node1.com:50070
13/11/14 07:24:21 INFO ipc.Server: IPC Server Responder: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server listener on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 0 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 1 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 2 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 3 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 4 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 5 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 6 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 9 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 7 on 9000: starting
13/11/14 07:24:21 INFO ipc.Server: IPC Server handler 8 on 9000: starting
13/11/14 07:37:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.1.151
************************************************************/
[hadoop@node1 current]$ start-all.sh
starting namenode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-namenode-node1.out
192.168.1.152: starting datanode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-datanode-node2.out
192.168.1.153: starting datanode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-datanode-node3.out
192.168.1.152: starting secondarynamenode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-node2.out
starting jobtracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-jobtracker-node1.out
192.168.1.152: starting tasktracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-tasktracker-node2.out
192.168.1.153: starting tasktracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-tasktracker-node3.out
[hadoop@node1 current]$ jps
1027 JobTracker
1121 Jps
879 NameNode
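Before running the import in step 3, it may help to confirm that the two directories are in the state the command expects: the copied checkpoint contains an image, and the name directory does not, since the namenode refuses to import over an existing image. The following is only a sketch; check_import_ready is a hypothetical helper name, and the current/fsimage layout is assumed from the directory listings in this article.

```shell
# Hypothetical helper: check the preconditions for `hadoop namenode -importCheckpoint`.
# $1 = name directory (dfs.name.dir), $2 = copied checkpoint directory
check_import_ready() {
    name_dir=$1
    ckpt_dir=$2

    # The checkpoint copied from the secondarynamenode must contain an image.
    if [ ! -f "$ckpt_dir/current/fsimage" ]; then
        echo "missing checkpoint image: $ckpt_dir/current/fsimage" >&2
        return 1
    fi

    # The import fails if the name directory already holds an image.
    if [ -f "$name_dir/current/fsimage" ]; then
        echo "name directory already holds an image: $name_dir" >&2
        return 2
    fi

    echo "ok: $ckpt_dir can be imported into $name_dir"
}
```

With the paths used above, this could be run as check_import_ready /app/user/hdfs/name /app/user/hdfs/namesecondary && hadoop namenode -importCheckpoint.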
Summary:
Note: any changes made between the secondarynamenode's last checkpoint and the time of the failure are lost from the restored namenode, so weigh the value of the fs.checkpoint.period parameter carefully when you set it. Also back up the contents of the secondarynamenode regularly, because the secondarynamenode is itself a single point of failure.
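One way to act on the backup advice is to archive the checkpoint directory under a timestamped name on a schedule. This is a minimal sketch, not a tested production script; backup_checkpoint is a hypothetical helper, and the destination directory in the usage line is an assumption.

```shell
# Hypothetical helper: archive a checkpoint directory under a timestamped name.
# $1 = directory to back up, $2 = directory that receives the archives
backup_checkpoint() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"

    mkdir -p "$dest" || return 1
    # -C keeps the archived paths relative to the parent of the source directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Print the archive path so a caller (for example a cron job) can log it.
    echo "$archive"
}
```

For example, backup_checkpoint /app/user/hdfs/namesecondary /backup/hdfs on the secondarynamenode host, where /backup/hdfs is a made-up path that should ideally live on a different machine or disk.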
Additional note: if you restore the namenode onto a new node, pay attention to the following:
1. The new node's Linux environment, directory structure, environment variables, and other configuration must be exactly the same as on the original namenode, including every configuration file in the conf directory.
2. The new namenode's hostname should be the same as the original's. If the hostname changes, you need to update the hosts files on every datanode and on the secondarynamenode, and reconfigure the following properties:
fs.default.name in the core-site.xml file
dfs.http.address in the hdfs-site.xml file (on the secondarynamenode node)
mapred.job.tracker in the mapred-site.xml file (if the jobtracker and namenode are on the same machine, which is usually the case)
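After a hostname change, the three properties listed above can be checked quickly by printing their current values on each node. The sketch below assumes the usual layout where each name and value element sits on its own line; get_hadoop_prop is a hypothetical helper, and it is a text match, not a real XML parser.

```shell
# Hypothetical helper: print the <value> of one property in a Hadoop *-site.xml file.
# Assumes <name> and <value> are on separate lines inside each <property> block.
get_hadoop_prop() {
    file=$1
    prop=$2
    # Within the block that starts at the matching <name>, print the <value> text.
    sed -n "/<name>$prop<\/name>/,/<\/property>/ s:.*<value>\(.*\)</value>.*:\1:p" "$file" | head -n 1
}
```

For example, get_hadoop_prop conf/core-site.xml fs.default.name prints the configured namenode address, which can then be compared against the new hostname; the conf/ path here is an assumption about where the configuration lives.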
Method 2: use namespaceID
1. Stop the cluster and format the namenode:
[hadoop@node1 name]$ stop-all.sh
stopping jobtracker
192.168.1.152: stopping tasktracker
192.168.1.153: stopping tasktracker
no namenode to stop
192.168.1.152: stopping datanode
192.168.1.153: stopping datanode
192.168.1.152: stopping secondarynamenode
[hadoop@node1 name]$ hadoop namenode -format
13/11/14 06:21:37 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node1/192.168.1.151
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /app/user/hdfs/name ? (Y or N) Y
13/11/14 06:21:39 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
13/11/14 06:21:39 INFO namenode.FSNamesystem: supergroup=supergroup
13/11/14 06:21:39 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/11/14 06:21:39 INFO common.Storage: Image file of size 96 saved in 0 seconds.
13/11/14 06:21:39 INFO common.Storage: Storage directory /app/user/hdfs/name has been successfully formatted.
13/11/14 06:21:39 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.1.151
************************************************************/
2. Get the namespaceID that the namenode had before formatting from any datanode, and change the namenode's namespaceID to match it:
Datanode VERSION file:
# Thu Nov ... CST ...
namespaceID=...
storageID=DS-...
cTime=...
storageType=...
layoutVersion=...
Namenode VERSION file after modifying namespaceID:
# Thu Nov ... CST ...
namespaceID=...
cTime=...
storageType=...
layoutVersion=...
3. Delete the fsimage file of the new namenode:
[hadoop@node1 current]$ ll
total 16
-rw-rw-r-- 1 hadoop hadoop 4 Nov 14 06:21 edits
-rw-rw-r-- 1 hadoop hadoop 96 Nov 14 06:21 fsimage
-rw-rw-r-- 1 hadoop hadoop 8 Nov 14 06:21 fstime
-rw-rw-r-- 1 hadoop hadoop 101 Nov 14 06:22 VERSION
[hadoop@node1 current]$ rm fsimage
4. Copy fsimage from the secondarynamenode to the current directory of the namenode:
[hadoop@node2 current]$ ll
total 16
-rw-rw-r-- 1 hadoop hadoop 4 Nov 14 05:38 edits
-rw-rw-r-- 1 hadoop hadoop 2410 Nov 14 05:38 fsimage
-rw-rw-r-- 1 hadoop hadoop 8 Nov 14 05:38 fstime
-rw-rw-r-- 1 hadoop hadoop 101 Nov 14 05:38 VERSION
[hadoop@node2 current]$ scp fsimage node1:/app/user/hdfs/name/current
The authenticity of host 'node1 (192.168.1.151)' can't be established.
RSA key fingerprint is ca:9a:7e:19:ee:a1:35:44:7e:9d:d4:09:5c:fc:c5:0a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.1.151' (RSA) to the list of known hosts.
fsimage 100% 2410 2.4KB/s 00:00
5. Restart the cluster:
[hadoop@node1 current]$ start-all.sh
starting namenode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-namenode-node1.out
192.168.1.152: starting datanode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-datanode-node2.out
192.168.1.153: starting datanode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-datanode-node3.out
192.168.1.152: starting secondarynamenode, logging to /app/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-node2.out
starting jobtracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-jobtracker-node1.out
192.168.1.152: starting tasktracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-tasktracker-node2.out
192.168.1.153: starting tasktracker, logging to /app/hadoop/bin/../logs/hadoop-hadoop-tasktracker-node3.out
[hadoop@node1 current]$ jps
32486 Jps
32419 JobTracker
32271 NameNode
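The namespaceID edit in step 2 can be done by hand in an editor, or scripted along the following lines. This is a sketch under the assumption that VERSION files are plain key=value text; sync_namespace_id is a hypothetical helper, and it should only be run while the cluster is stopped.

```shell
# Hypothetical helper: copy namespaceID from a datanode VERSION file
# into a namenode VERSION file.
# $1 = datanode VERSION file, $2 = namenode VERSION file
sync_namespace_id() {
    dn_version=$1
    nn_version=$2

    # Take the value that follows "namespaceID=" in the datanode file.
    id=$(sed -n 's/^namespaceID=//p' "$dn_version" | head -n 1)
    if [ -z "$id" ]; then
        echo "no namespaceID found in $dn_version" >&2
        return 1
    fi

    # Keep a copy of the original, then rewrite only the namespaceID line.
    cp "$nn_version" "$nn_version.bak" || return 1
    sed "s/^namespaceID=.*/namespaceID=$id/" "$nn_version.bak" > "$nn_version" || return 1
    echo "$id"
}
```

On this cluster the second argument would be /app/user/hdfs/name/current/VERSION; the first is the current/VERSION file under whatever data directory the datanode uses.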
In the second method, you first format the namenode and then restore the previous data from the backup.
I followed these steps in my own recovery as well, but the backup data turned out to be from September of last year, and it was also corrupted.
Unfortunately, all the data is gone. So be sure to back up the namenode and secondarynamenode manually on a regular basis, because the system's own backup is also a single point, which is very unreliable. After several days of trying, the data still has not been recovered. It is a lesson learned.