How to use Arthas to troubleshoot slow EditLog loading when starting the StandbyNameNode

Updated 2025-04-10. From: SLTechnology News&Howtos (shulou)

This article shows how to use Arthas to track down why a StandbyNameNode is slow to load the EditLog at startup. The method is simple, fast, and practical.

The company set up a new HDFS cluster with NameNode HA, but something odd happened when starting the StandbyNameNode: even though the cluster was empty, loading the EditLog was very slow, and each restart took almost 20 to 30 minutes.

To make the problem easier to follow, here is the startup process of the StandbyNameNode (hereinafter SNN):

1. If there is no FSImage locally when the SNN starts, it pulls the FSImage from the ANN (ActiveNameNode).

2. If there is an FSImage locally, it pulls the missing EditLog segments from the JournalNodes, based on the transaction ID, and merges them locally.

The problem is in step 2: every pull of the EditLog from a JournalNode takes a fixed 15 s. An empty cluster has almost no operations, so the EditLog is small and pulling it should not take 15 seconds. The logs are as follows (excerpted for readability):

2020-11-04 18:27:[...] INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://cbdp-online1.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213656&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true' to transaction ID 184269

2020-11-04 18:27:42,582 INFO namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(289)) - replaying edit log: 1/44 transactions completed. (2%)

2020-11-04 18:27:[...] INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits) - Edits file http://cbdp-online1.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213656&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true, http://cbdp-online2.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213656&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true, http://cbdp-online3.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213656&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true of size 5981 edits # 44 loaded in 15 seconds

2020-11-04 18:27:42,583 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://cbdp-online1.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213700&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true' to transaction ID 184269

2020-11-04 18:27:[...] INFO namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(289)) - replaying edit log: 1/53 transactions completed. (2%)

2020-11-04 18:27:57,589 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits) - Edits file http://cbdp-online1.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213700&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true, http://cbdp-online2.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213700&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true, http://cbdp-online3.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213700&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true of size 7088 edits # 53 loaded in 15 seconds

1. First, use the logs to roughly locate the code, then trace the time-consuming method:

    trace org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader loadFSEdits

2. The result above only identifies the rough block of code that is slow; it cannot pinpoint the method that actually takes the time. Pinpointing it requires expanding the call tree layer by layer, including callbacks and native functions. To locate the code more conveniently, first run the profiler and look at the time-consuming function calls:

    profiler start
    profiler stop

3. Continue tracing down the call chain:

    trace org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1 run

4. Because the call chain goes into JDK methods, first enable enhancement of JDK classes, then trace with --skipJDKMethods false. The condition '#cost > 10000' keeps only calls that take longer than 10000 ms:

    options unsafe true
    trace --skipJDKMethods false sun.net.www.http.HttpClient parseHTTPHeader
    trace --skipJDKMethods false java.net.SocketInputStream socketRead '#cost > 10000'

5. Finally, confirm the code execution path by printing the call stack:

    stack *SocketInputStream socketRead '#cost > 10000'

This shows that the StandbyNameNode is blocked while reading data from the network. At this point the trace has reached a native function, and there is no effective way to analyze further at the Java level.

Then I looked at the StandbyNameNode log again:

2020-11-04 18:27:42,583 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://cbdp-online1.sdns.financial.cloud:8480/getJournal?jid=hdfs-ha&segmentTxId=213700&storageInfo=-64%3A272699407%3A1603893889358%3ACID-aa8ec1b5-a501-4195-9299-e14abefbdc11&inProgressOk=true' to transaction ID 184269

At the same time, I remembered the idea put forward by @Heiyan: the request may be blocked on the JournalNode side while the EditLog file is being read.

6. Trace the time-consuming code on the JournalNode side:

    trace --skipJDKMethods false org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet doGet '#cost > 10000'

The trace showed that the call to java.net.InetSocketAddress.getHostName took 15 seconds. The culprit was found.
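To check the same call outside Hadoop, a minimal sketch like the following can be run on a JournalNode host. It is only an illustration: the class name ReverseLookupTimer and the helper timeMillis are made up here, and the address 0.0.0.0 with port 9868 is assumed to be the SecondaryNameNode default discussed in the conclusion. How long the call takes depends entirely on the local DNS and hosts configuration.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

public class ReverseLookupTimer {

    /** Runs the lookup once and returns how long it took, in milliseconds. */
    static long timeMillis(Supplier<String> lookup) {
        long start = System.nanoTime();
        lookup.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumption: this is the address the JournalNode tries to resolve.
        InetSocketAddress addr = new InetSocketAddress("0.0.0.0", 9868);
        // Only the IP literal is known, so getHostName() performs a reverse lookup.
        long elapsed = timeMillis(addr::getHostName);
        System.out.println("getHostName() took " + elapsed + " ms");
    }
}
```

Running it before and after changing the hosts file gives a quick comparison without restarting any Hadoop process.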

Conclusion:

Analysis shows that when Kerberos is enabled, the JournalNode enters the isValidRequestor method each time it serves a request for the EditLog. There it resolves the host name of the SecondaryNameNode and looks up the corresponding principal.
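The following sketch illustrates why building a principal needs a host name at all. It is not Hadoop's code: RequestorPrincipalSketch, expandPrincipal, and the principal pattern nn/_HOST@EXAMPLE.COM are hypothetical, and the sketch only assumes the common convention that a _HOST placeholder in a Kerberos principal is replaced by the lower-cased host name of the address.

```java
import java.net.InetSocketAddress;
import java.util.Locale;
import java.util.function.Function;

public class RequestorPrincipalSketch {

    /**
     * Replaces the _HOST placeholder in a principal pattern with the
     * lower-cased host name that the resolver returns for the address.
     */
    static String expandPrincipal(String pattern, InetSocketAddress addr,
                                  Function<InetSocketAddress, String> resolver) {
        String host = resolver.apply(addr).toLowerCase(Locale.ROOT);
        return pattern.replace("_HOST", host);
    }

    public static void main(String[] args) {
        // Hypothetical pattern; the realm is a placeholder.
        String pattern = "nn/_HOST@EXAMPLE.COM";
        // Assumption: the SecondaryNameNode address was left at 0.0.0.0:9868.
        InetSocketAddress secondary = new InetSocketAddress("0.0.0.0", 9868);
        // The resolver call is where a slow reverse DNS lookup would block.
        String principal = expandPrincipal(pattern, secondary, InetSocketAddress::getHostName);
        System.out.println(principal);
    }
}
```

Passing the resolver as a parameter keeps the slow step visible and makes it easy to swap in a stub when experimenting.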

The DNS service cannot resolve the default SecondaryNameNode address, 0.0.0.0:9868; that is, it cannot find a host name for 0.0.0.0, and the lookup only returns after a 15 s timeout. As a result, every time the SNN fetches the EditLog from a JournalNode through URLLog, it waits an extra 15 s, which is why the SNN loads the EditLog so slowly.
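To see how a fixed 15 s per fetch adds up, a small sketch can sum the "loaded in N seconds" values from the SNN log. The class EditLoadDelay and the method totalSeconds are illustrative names, and the two input lines are shortened copies of the messages quoted earlier. The segment estimate assumes the whole startup time is spent in these delays, which is a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EditLoadDelay {

    // Matches the tail of the "Edits file ... loaded in N seconds" message.
    private static final Pattern LOADED =
            Pattern.compile("edits # (\\d+) loaded in (\\d+) seconds");

    /** Sums the per-segment load times reported in the given log lines. */
    static long totalSeconds(List<String> logLines) {
        long total = 0;
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line);
            if (m.find()) {
                total += Long.parseLong(m.group(2));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "INFO namenode.FSImage - Edits file ... of size 5981 edits # 44 loaded in 15 seconds",
                "INFO namenode.FSImage - Edits file ... of size 7088 edits # 53 loaded in 15 seconds");
        // Two segments at 15 s each: prints "Total load time: 30 s".
        System.out.println("Total load time: " + totalSeconds(lines) + " s");
        // A 20-minute start at 15 s per segment: prints "Segments in 20 min: 80".
        System.out.println("Segments in 20 min: " + (20 * 60 / 15));
    }
}
```

Under that assumption, a 20 to 30 minute startup corresponds to roughly 80 to 120 segments, each paying the same timeout.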

To verify this conjecture, we added the entry 0.0.0.0 0.0.0.0 to the hosts file on each JournalNode and restarted the SNN. Startup was about 20 times faster.
