2025-02-24 Update From: SLTechnology News & Howtos
This article explains how to tune performance at the Hadoop level. The content is easy to understand and clearly organized; I hope it helps resolve your doubts as you study "how to tune Hadoop-level performance" below.
Hadoop-level performance tuning
1. Tune daemon memory
A) Adjust NameNode and DataNode memory in the hadoop-env.sh file:
NameNode:
export HADOOP_NAMENODE_OPTS="-Xmx512m -Xms512m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
DataNode:
export HADOOP_DATANODE_OPTS="-Xmx256m -Xms256m -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
The -Xmx and -Xms parameters are generally set to the same value so the JVM does not have to re-allocate heap memory after every garbage collection.
B) Adjust ResourceManager and NodeManager memory in the yarn-env.sh file:
ResourceManager:
export YARN_RESOURCEMANAGER_HEAPSIZE=1000   # default; YARN_RESOURCEMANAGER_OPTS can override this value
NodeManager:
export YARN_NODEMANAGER_HEAPSIZE=1000   # default; YARN_NODEMANAGER_OPTS can override this value
Rule-of-thumb resident memory settings:
NameNode: 16 G
DataNode: 2-4 G
ResourceManager: 4 G
NodeManager: 2 G
ZooKeeper: 4 G
Hive Server: 2 G
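The rule-of-thumb values above can be applied in the same *-env.sh files shown earlier. A minimal sketch, assuming a cluster where the 16 G / 4 G figures above are appropriate (the exact heap sizes are the example values from this article, not universal defaults):

```shell
# Sketch: apply the rule-of-thumb heap sizes from the table above.
# hadoop-env.sh — NameNode/DataNode heaps via JVM options
export HADOOP_NAMENODE_OPTS="-Xmx16g -Xms16g $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx4g -Xms4g $HADOOP_DATANODE_OPTS"
# yarn-env.sh — ResourceManager/NodeManager heaps (HEAPSIZE is in MB)
export YARN_RESOURCEMANAGER_HEAPSIZE=4096
export YARN_NODEMANAGER_HEAPSIZE=2048
```

As with the earlier snippets, -Xmx and -Xms are kept equal to avoid heap resizing after garbage collection.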
2. Configure multiple MapReduce intermediate directories to spread the I/O load
http://hadoop.apache.org/docs/r2.6.0/
In yarn-default.xml (spreads I/O pressure):
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
In mapred-default.xml:
mapreduce.cluster.local.dir
In hdfs-default.xml (improves reliability):
dfs.namenode.name.dir
dfs.namenode.edits.dir
dfs.datanode.data.dir
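As a sketch, the multi-directory settings could look like the fragment below, written here to a local file for illustration. The /data1, /data2, /data3 paths are hypothetical mount points, one per physical disk:

```shell
# Sketch: a yarn-site.xml fragment spreading NodeManager I/O across disks.
# /data1../data3 are hypothetical mount points, one per physical disk.
cat > yarn-site-fragment.xml <<'EOF'
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data1/yarn/logs,/data2/yarn/logs,/data3/yarn/logs</value>
</property>
EOF
grep -c '<name>' yarn-site-fragment.xml   # 2 properties written
```

The comma-separated list is how Hadoop expects multiple directories in these properties; writes are then rotated across the listed disks.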
3. Compress intermediate MapReduce output
A) Configure in the mapred-site.xml file:
mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.SnappyCodec
Or specify the parameters when the job is submitted:
hadoop jar /home/hadoop/tv/tv.jar MediaIndex -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec /tvdata /media
B) Choose a reasonable compression algorithm based on the bottleneck (CPU vs. disk). CPU: if CPU is the bottleneck, switch to a faster compression algorithm. Disk: if disk is the bottleneck, switch to an algorithm with a higher compression ratio. In general, Snappy (or LZO) is used as a balance between speed and compression ratio.
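A sketch of both approaches together: the persistent mapred-site.xml fragment, and the per-job -D form from the article's example. The jar path, class name, and HDFS paths are the ones given above; the hadoop command is only echoed here because running it requires a live cluster:

```shell
# Sketch: map-output compression as a mapred-site.xml fragment.
cat > mapred-site-fragment.xml <<'EOF'
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
EOF
# The same settings passed per job with -D flags (command echoed, not run;
# jar path and class name are from the article's example):
CMD='hadoop jar /home/hadoop/tv/tv.jar MediaIndex -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec /tvdata /media'
echo "$CMD"
```

The -D form overrides the file-based configuration for a single job, which is convenient when only some jobs benefit from compression.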
4. Avoid keeping large numbers of small files in the HDFS file system.
5. Depending on the situation, use a Combiner on the map node to reduce the output.
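One common remedy for the small-files problem is to pack them into a Hadoop Archive (HAR) with the `hadoop archive` tool. A sketch of the command, with hypothetical HDFS paths; it is only echoed here since it needs a running cluster:

```shell
# Sketch: pack a directory of small files into a Hadoop Archive (HAR).
# Paths are hypothetical; command is echoed rather than run (needs a live cluster).
ARCHIVE_CMD='hadoop archive -archiveName small-files.har -p /user/hadoop/input /user/hadoop/archived'
echo "$ARCHIVE_CMD"
```

A HAR exposes the packed files through a single HDFS file plus index, so the NameNode tracks far fewer objects.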
6. Reuse Writable objects
For example, declare an object once (Text word = new Text();) and reuse it inside the map() and reduce() methods instead of creating a new one for every record.
7. Adjust task parallelism according to the specific conditions of the cluster nodes
Set the maximum number of map and reduce tasks (mapred-default.xml):
mapreduce.tasktracker.map.tasks.maximum
mapreduce.tasktracker.reduce.tasks.maximum
Set the memory size of a single map or reduce task (mapred-default.xml):
mapreduce.map.memory.mb (default 1 G)
mapreduce.reduce.memory.mb (default 1 G)
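As a sketch, the per-task memory properties could be set explicitly in a mapred-site.xml fragment; 1024 MB is the default noted above, written here to a local file for illustration:

```shell
# Sketch: per-task memory as a mapred-site.xml fragment (1024 MB = the noted default).
cat > task-memory-fragment.xml <<'EOF'
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
EOF
grep -c '<name>' task-memory-fragment.xml   # 2 properties written
```

Raise these values per job (again via -D flags) only for jobs whose tasks actually need more memory, since the values feed directly into the node memory budget computed below.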
8. Use effective monitoring (deploy nmon or Ganglia to collect metrics, analyze the metrics to find bottlenecks, then take targeted measures).
Hardware-level performance tuning:
Separate the racks and distribute the nodes evenly across them.
Operating-system-level performance tuning:
Network: bind multiple network cards for load balancing or active/standby failover.
Disk: mount multiple disks to different directories; disks that store computation data should not use RAID.
Cluster planning:
Cluster node memory allocation:
For example, on a data node with a task parallelism of 8: DataNode (2-4 G) + NodeManager (2 G) + ZooKeeper (4 G) + 1 G (default size of a single task) × 8 = 16-18 G
Cluster size: suppose 1 T of data arrives per day and must be kept for one month (enterprises commonly keep data for only 7 or 15 days), with 2 T of disk per node. Storage required: 1 T × 3 (replicas) × 30 (days) = 90 T. With each 2 T node used at 60-70% capacity, roughly 90 T / (2 T × 70%) ≈ 65 nodes are needed. If the data is important, keep the full month.
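The two sizing calculations above can be checked with a few lines of shell arithmetic; the 70% figure is one point in the utilization range, and the node count is rounded up since partial nodes do not exist:

```shell
# Node memory: DataNode (2-4 G) + NodeManager (2 G) + ZooKeeper (4 G) + 8 tasks x 1 G
mem_low=$((2 + 2 + 4 + 8))     # 16 G
mem_high=$((4 + 2 + 4 + 8))    # 18 G
echo "memory per node: ${mem_low}-${mem_high} G"

# Cluster size: 1 T/day x 3 replicas x 30 days, 2 T disk per node at 70% utilization
total_tb=$((1 * 3 * 30))                 # 90 T
nodes=$(( (total_tb * 10 + 13) / 14 ))   # ceil(90 / (2 * 0.7)) = 65 nodes
echo "total storage: ${total_tb} T, nodes needed: ~${nodes}"
```

The integer trick multiplies both sides by 10 (90 / 1.4 becomes 900 / 14) and adds divisor-1 before dividing, which rounds the node count up.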
That is the full content of "how to tune performance at the Hadoop level". Thank you for reading! I hope the material shared here helps you; if you want to learn more, you are welcome to follow the industry information channel.