
How to tune Hadoop parameters

2025-01-30 Update From: SLTechnology News&Howtos


This article looks at how to tune Hadoop parameters, a topic that raises questions for many people in day-to-day operations. It collects simple, practical settings from a range of sources, grouped by configuration file.

I. hdfs-site.xml configuration file

1. dfs.blocksize

Parameter: HDFS file block size

Description: the default block size for new files, in bytes. The default is 134217728 bytes (128 MB).

You can give the size with one of the following case-insensitive suffixes: k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa), for example 128k, 512m, or 1g.

Alternatively, provide the full size in bytes.
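
As a minimal sketch of how this might look in hdfs-site.xml, the fragment below sets the block size to 256 MB using the suffix form. The value 256m is an assumption chosen only for illustration, not a recommendation from this article. The property goes inside the file's <configuration> element.

```xml
<!-- hdfs-site.xml: illustrative value only, not a recommendation -->
<property>
  <name>dfs.blocksize</name>
  <!-- 256m = 256 * 1024 * 1024 = 268435456 bytes -->
  <value>256m</value>
</property>
```

Because this is the default for new files, files that already exist keep the block size they were written with.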

2. dfs.namenode.handler.count

Parameter: number of NameNode server threads

Description: the NameNode has a pool of worker threads that handle remote procedure calls from clients and from cluster daemons. More handlers mean a larger pool for serving concurrent heartbeats from the DataNodes and concurrent metadata operations from clients. For large clusters, or clusters with many clients, you usually need to raise dfs.namenode.handler.count above its default of 10. A general rule is to set it to 20 times the natural logarithm of the cluster size, that is, 20 × ln(N), where N is the cluster size.
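
To make the rule of thumb concrete, assume a hypothetical cluster of 200 nodes (a figure chosen only for this example). Then 20 × ln(200) ≈ 20 × 5.30 ≈ 106, which could be configured roughly as follows:

```xml
<!-- hdfs-site.xml: value derived for a hypothetical 200-node cluster -->
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- 20 * ln(200) is about 105.97, rounded to 106 -->
  <value>106</value>
</property>
```

The formula grows slowly: by the same arithmetic, a 20-node cluster would come out at about 60 handlers.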

3. dfs.datanode.balance.bandwidthPerSec

Parameter: DataNode balancing bandwidth

Description: the maximum bandwidth, in bytes per second, that each DataNode may use for balancing.
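
A sketch of the corresponding entry, assuming you want to allow about 100 MB/s per DataNode for balancing. The figure is hypothetical and should be sized against the network capacity you can actually spare:

```xml
<!-- hdfs-site.xml: illustrative value only -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <!-- 100 * 1024 * 1024 = 104857600 bytes per second -->
  <value>104857600</value>
</property>
```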

4. dfs.replication

Parameter: number of block replicas

Description: the default block replication factor. The actual number of replicas can be specified when a file is created. If no replication factor is given at creation time, the default of 3 is used.
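
For reference, stating the default explicitly in hdfs-site.xml might look like the fragment below. It only restates the default value of 3 given above and is not a tuning recommendation:

```xml
<!-- hdfs-site.xml: restates the default replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```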

5. dfs.datanode.max.transfer.threads

Parameter: maximum number of DataNode transfer threads

Description: the maximum number of threads used to transfer data into and out of a DataNode. If this value is inconsistent across the cluster, data will be distributed unevenly.
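
A possible entry is sketched below. The value 8192 is an assumption used only to show the syntax; whatever value you choose, it would need to be applied to every DataNode to keep the setting consistent across the cluster:

```xml
<!-- hdfs-site.xml: illustrative value; use the same value on all DataNodes -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
```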

II. core-site.xml configuration file

1. io.file.buffer.size

Parameter: file buffer size

Description: the buffer size used for sequence files. It should be a multiple of the hardware page size (4096 bytes on Intel x86) and determines how much data is buffered during read and write operations. SequenceFile reads and writes, as well as map output, use this buffer, so a larger value reduces the number of I/O operations. A value between 64 KB and 128 KB is recommended.
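
As a sketch, the upper end of the recommended range (128 KB) could be set in core-site.xml like this. The value is a multiple of the 4096-byte page size, as the description requires:

```xml
<!-- core-site.xml: 128 KB buffer, upper end of the range suggested above -->
<property>
  <name>io.file.buffer.size</name>
  <!-- 128 * 1024 = 131072 bytes = 32 pages of 4096 bytes -->
  <value>131072</value>
</property>
```

The lower end of the range, 64 KB, would be 65536 bytes.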

III. yarn-site.xml configuration file

1. yarn.nodemanager.resource.memory-mb

Parameter: memory that this node contributes to the NodeManager resource pool

Description: the total amount of physical memory on the node that the NodeManager can allocate, 8192 MB by default. Set it according to the memory the node can actually spare, and reserve resources for the operating system and other services.
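
For illustration, assume a hypothetical node with 64 GB of RAM on which 8 GB is held back for the operating system and other services. Both figures are assumptions for this example only. That leaves 56 GB for YARN:

```xml
<!-- yarn-site.xml: hypothetical 64 GB node with 8 GB reserved -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- (64 - 8) GB * 1024 = 57344 MB -->
  <value>57344</value>
</property>
```

How much to reserve depends on what else runs on the node, so treat the 8 GB figure as a placeholder.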

2. yarn.nodemanager.resource.cpu-vcores

Parameter: number of CPU cores that this node contributes to the resource pool (default: 8)

Description: the number of virtual CPUs available to YARN on this node. The default is 8. It is recommended to set this value to the number of physical CPU cores. If the node has fewer than 8 cores, reduce the value accordingly, because YARN does not, by default, detect the node's physical core count on its own.
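
Following the recommendation above, a node assumed to have 16 physical cores (a hypothetical figure) might be configured as:

```xml
<!-- yarn-site.xml: hypothetical node with 16 physical CPU cores -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
```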

That concludes this overview of Hadoop parameter tuning. I hope it has answered your questions. Theory is best learned together with practice, so try these settings out yourself. For more on related topics, keep following the site.
