Example Analysis of HDFS Client Read and Write Timeouts

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01

This article walks through an example analysis of HDFS client read and write timeouts. It covers which settings control those timeouts and how the effective values are derived.

Background

A while ago, disk utilization on our Hadoop cluster was fairly high, with some disks more than 70% full. The DataNode read and write load on those servers was also high, and some data synchronization tasks failed with read and write timeouts. The scenario and the exceptions closely match those described in this blog.

DFSClient reads and writes data by first fetching metadata from the NameNode and then interacting with the DataNodes, so a timeout can involve either service. The sections below analyze the timeouts for the client's interaction with each of them.

Client and NameNode timeout

The operation timeout between the client and NameNode is controlled by the following two configurations:

ipc.client.ping: the default is true. When true, the client waits for the server's response for as long as it takes and sends ping messages periodically so that the connection is not dropped because of a TCP timeout. When false, the client uses the value of ipc.ping.interval as the timeout; if no response arrives within that time, the call times out.

ipc.ping.interval: in milliseconds; the default is 60000 (1 minute). When ipc.client.ping is true, it is the interval at which ping messages are sent. When ipc.client.ping is false, it is the connection timeout.

When the NameNode is under full load and the CPU of its host is exhausted, the NameNode cannot respond. HDFS clients that open new connections can switch to another NameNode and work normally, but clients that are already connected to the overloaded NameNode may hang and be unable to proceed.

The RPC connection between the HDFS client and the NameNode has a keep-alive mechanism: the connection does not time out, and the client keeps waiting for the server's response. As a result, operations on an already connected HDFS client hang.

For HDFS clients that are already stuck, you can do the following:

Wait for the NameNode to respond. Once CPU utilization on the NameNode's host drops and the NameNode gets CPU resources again, the HDFS client will receive its response.

If you cannot wait, restart the application process that hosts the HDFS client so that the client reconnects to an idle NameNode.

To avoid this problem in the scenario above, set the following in the client configuration file core-site.xml:

Set ipc.client.ping to false, so that the client uses the value of ipc.ping.interval as the timeout. If no response arrives within that time, the call times out.

Set ipc.ping.interval to a large value so that calls do not time out while the service is merely busy. The recommended value is 900000 (in ms).
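The interplay of the two settings can be sketched in plain Java. This is a simplified model of the behavior described above, not Hadoop's own code: java.util.Properties stands in for Hadoop's Configuration class, and the class name IpcTimeoutSketch, the helper effectiveTimeoutMs, and the -1 sentinel for "no timeout" are assumptions made for illustration.

```java
import java.util.Properties;

/**
 * Simplified model of how ipc.client.ping and ipc.ping.interval combine.
 * Not Hadoop code: Properties replaces Hadoop's Configuration, and all
 * names in this class are hypothetical.
 */
final class IpcTimeoutSketch {
    static final String PING_KEY = "ipc.client.ping";
    static final String PING_INTERVAL_KEY = "ipc.ping.interval";
    // Sentinel chosen for this sketch to mean "wait without a timeout".
    static final int NO_TIMEOUT = -1;

    /** Returns the timeout in ms, or NO_TIMEOUT when ping is enabled. */
    static int effectiveTimeoutMs(Properties conf) {
        // Defaults as stated in the article: ping on, interval 1 minute.
        boolean ping = Boolean.parseBoolean(conf.getProperty(PING_KEY, "true"));
        int intervalMs = Integer.parseInt(conf.getProperty(PING_INTERVAL_KEY, "60000"));
        // With ping on, the interval only paces keep-alive pings.
        // With ping off, the same value becomes the timeout.
        return ping ? NO_TIMEOUT : intervalMs;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println("defaults: " + effectiveTimeoutMs(defaults));

        Properties tuned = new Properties();
        tuned.setProperty(PING_KEY, "false");
        tuned.setProperty(PING_INTERVAL_KEY, "900000");
        System.out.println("tuned: " + effectiveTimeoutMs(tuned));
    }
}
```

With the default settings the sketch reports no timeout, which matches the hanging client described earlier. With the recommended settings it reports 900000 ms, so a call to an unresponsive NameNode would fail after 15 minutes instead of waiting indefinitely.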

Client and DataNode read and write timeout

The read and write timeout of DataNode is controlled by the following two configurations:

Read timeout: dfs.client.socket-timeout. The default is 1 minute.

Write timeout: dfs.datanode.socket.write.timeout. The default is 8 minutes.

Both settings are configured on the HDFS client. Their default values are defined in the org.apache.hadoop.hdfs.server.common.HdfsServerConstants class:

// Timeouts for communicating with DataNode for streaming writes/reads
// (DataNode read and write timeouts)
public static final int READ_TIMEOUT = 60 * 1000;
public static final int READ_TIMEOUT_EXTENSION = 5 * 1000;
public static final int WRITE_TIMEOUT = 8 * 60 * 1000;
public static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000; // for write pipeline

The effective read and write timeouts also depend on the number of DataNodes in the pipeline. The configured timeout is not multiplied by the node count; instead, a per-node extension (5 seconds by default) is multiplied by the number of nodes and added to the configured timeout. The logic is in the org.apache.hadoop.hdfs.DFSClient class:

/**
 * Return the timeout that clients should use when writing to datanodes.
 * @param numNodes the number of nodes in the pipeline.
 */
int getDatanodeWriteTimeout(int numNodes) {
  return (dfsClientConf.confTime > 0) ?
      (dfsClientConf.confTime + HdfsServerConstants.WRITE_TIMEOUT_EXTENSION * numNodes) : 0;
}

int getDatanodeReadTimeout(int numNodes) {
  return dfsClientConf.socketTimeout > 0 ?
      (HdfsServerConstants.READ_TIMEOUT_EXTENSION * numNodes + dfsClientConf.socketTimeout) : 0;
}
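As a worked example, the two formulas can be restated in a stand-alone class and evaluated for a concrete pipeline. This is a sketch, not the DFSClient source: the class name DataNodeTimeoutSketch and its method names are hypothetical, the constants are copied from the defaults quoted above, and the three-node pipeline is an assumption chosen for illustration.

```java
/**
 * Stand-alone restatement of the DataNode timeout formulas.
 * Class and method names are hypothetical; the constants mirror the
 * defaults quoted from HdfsServerConstants.
 */
final class DataNodeTimeoutSketch {
    static final int READ_TIMEOUT = 60 * 1000;
    static final int READ_TIMEOUT_EXTENSION = 5 * 1000;
    static final int WRITE_TIMEOUT = 8 * 60 * 1000;
    static final int WRITE_TIMEOUT_EXTENSION = 5 * 1000;

    /** Configured read timeout plus one extension per pipeline node. */
    static int readTimeoutMs(int socketTimeoutMs, int numNodes) {
        return socketTimeoutMs > 0
                ? READ_TIMEOUT_EXTENSION * numNodes + socketTimeoutMs
                : 0;
    }

    /** Configured write timeout plus one extension per pipeline node. */
    static int writeTimeoutMs(int confTimeMs, int numNodes) {
        return confTimeMs > 0
                ? confTimeMs + WRITE_TIMEOUT_EXTENSION * numNodes
                : 0;
    }

    public static void main(String[] args) {
        int numNodes = 3; // assumed pipeline length for this example
        // 60000 + 5000 * 3 = 75000
        System.out.println("read:  " + readTimeoutMs(READ_TIMEOUT, numNodes));
        // 480000 + 5000 * 3 = 495000
        System.out.println("write: " + writeTimeoutMs(WRITE_TIMEOUT, numNodes));
    }
}
```

For a three-node pipeline with default settings, this gives a read timeout of 75000 ms and a write timeout of 495000 ms. A configured value of 0 or less makes either formula return 0 regardless of the node count.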
