Optimizing CDH cluster performance: steps you can take before installing the cluster (best practices for MapReduce configuration) 002

2025-01-21 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report

Optimizing CDH cluster performance, part 002: steps that can be carried out before cluster installation

// These optimizations come from the official CDH documentation.

Apply them together with part 03, "Linux optimization before setting up a CDH production environment" (which covers Linux memory parameter tuning):

https://blog.51cto.com/12445535/2365948

Overview:

This article provides solutions to some performance problems and introduces configuration best practices.

1. Disable the tuned service // memory allocation management

// What is the tuned service?

After version 6.3, RHEL/CentOS introduced a new pair of system tuning tools, tuned and tuned-adm. tuned is a daemon that monitors and collects usage data from individual system components and uses that data to adjust system settings dynamically. tuned-adm is the command-line client used to manage and configure tuned; it also ships a set of pre-configured tuning profiles that can be applied directly.

// Because the tuning is dynamic, you can apply different profiles at different times of day. Since tuned runs as a service, it is easy to combine with crontab. tuned can respond to changes in CPU and network usage and adjust settings to improve the performance of active devices or reduce the power consumption of inactive devices.

Session output:

[root@NewCDH-0--141 ~]# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[root@NewCDH-0--141 ~]# systemctl status tuned
● tuned.service - Dynamic System Tuning Daemon
   Loaded: loaded (/usr/lib/systemd/system/tuned.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-01-23 10:47:06 CST; 2 months 0 days ago
 Main PID: 897 (tuned)
   CGroup: /system.slice/tuned.service
           └─897 /usr/bin/python -Es /usr/sbin/tuned -l -P

Jan 23 10:47:02 NewCDH-0--141 systemd[1]: Starting Dynamic System Tuning Daemon...
Jan 23 10:47:06 NewCDH-0--141 systemd[1]: Started Dynamic System Tuning Daemon.
[root@NewCDH-0--141 ~]# tuned-adm list
Available profiles:
- balanced
- desktop
- latency-performance
- network-latency
- network-throughput
- powersave
- throughput-performance
- virtual-guest
- virtual-host
Current active profile: virtual-guest
[root@NewCDH-0--141 ~]# tuned-adm off    // turn off the tuning service
[root@NewCDH-0--141 ~]# tuned-adm list
Available profiles:
- balanced
- desktop
- latency-performance
- network-latency
- network-throughput
- powersave
- throughput-performance
- virtual-guest
- virtual-host
No current active profile.
[root@NewCDH-0--141 ~]# systemctl stop tuned
[root@NewCDH-0--141 ~]# systemctl disable tuned
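When the same steps are repeated on many nodes, it can help to check the result automatically rather than reading the output by eye. The following is only a minimal sketch: tuned_profile_state is a hypothetical helper (it is not part of tuned) that reads tuned-adm output on stdin and reports whether a profile is still active. It relies only on the two message formats that appear in the session above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tuned): classify tuned-adm output read
# from stdin. Prints "off" when no profile is active, the profile name when
# one is active, and "unknown" when neither message is found.
tuned_profile_state() {
  local line state="unknown"
  while IFS= read -r line; do
    case "$line" in
      *"No current active profile"*) state="off" ;;
      *"Current active profile:"*)   state="${line##*: }" ;;
    esac
  done
  printf '%s\n' "$state"
}

# Demonstration on sample text copied from the session output.
printf 'Available profiles:\n- balanced\nCurrent active profile: virtual-guest\n' | tuned_profile_state
printf 'Available profiles:\n- balanced\nNo current active profile.\n' | tuned_profile_state
```

On a real host the input would come from a command such as tuned-adm list or tuned-adm active piped into the function; after step 1 the expected result is "off".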

2. Disable transparent hugepages (THP) // covered in the previous blog post, so not repeated here

For more information, see: Linux initialization script (CentOS 6 and CentOS 7 generic) https://blog.51cto.com/12445535/2362407 14

3. Swap partition optimization: reduce the tendency to use the swap partition (see the same post as above)
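The referenced post contains the actual settings. As a rough sketch for verifying them afterwards, the helpers below read the current THP mode and swappiness value. Both function names are hypothetical, and the file paths are passed as arguments; on CentOS 7 the live values are normally found in /sys/kernel/mm/transparent_hugepage/enabled and /proc/sys/vm/swappiness, which is an assumption about your system rather than something stated in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the file path is an argument so the functions can be
# tried on sample files as well as on a real host.

# Print the active THP mode, i.e. the word in [brackets] in a line such as
# "always madvise [never]".
thp_mode() {
  sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Print the integer stored in a swappiness file, without surrounding whitespace.
swappiness_value() {
  tr -d '[:space:]' < "$1"
  printf '\n'
}

# Demonstration on a temporary sample file instead of the live kernel settings.
sample=$(mktemp)
echo 'always madvise [never]' > "$sample"
thp_mode "$sample"     # prints: never
rm -f "$sample"
```

A result of "never" from thp_mode indicates that THP is disabled; what swappiness value to aim for is left to the referenced post.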

4. Improve the performance of the shuffle handler and IFile reader

The MapReduce shuffle handler and IFile reader use native Linux calls (posix_fadvise(2) and sync_file_range(2)) on Linux systems with the Hadoop native libraries installed.

Shuffle handler

You can improve the performance of the MapReduce shuffle handler by enabling shuffle readahead. This causes the TaskTracker or NodeManager to prefetch map output before sending it to the reducer over a socket.

To enable this feature for YARN, set mapreduce.shuffle.manage.os.cache to true (the default). To tune performance further, adjust the value of mapreduce.shuffle.readahead.bytes. The default value is 4 MB.

To enable this feature for MRv1, set mapred.tasktracker.shuffle.fadvise to true (the default). To tune performance further, adjust the value of mapred.tasktracker.shuffle.readahead.bytes. The default value is 4 MB.

IFile reader

Enabling IFile readahead improves the performance of merge operations. To enable this feature for MRv1 or YARN, set mapreduce.ifile.readahead to true (the default). To tune performance further, adjust the value of mapreduce.ifile.readahead.bytes. The default value is 4 MB.
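The readahead sizes are given here in MB, while the *.readahead.bytes properties are set in bytes. The sketch below shows the conversion and prints a property element in the form used by mapred-site.xml. Both helper names are hypothetical, and the 8 MB figure is purely illustrative, not a recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building a mapred-site.xml fragment.

# Convert megabytes to bytes (1 MB taken as 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Print one <property> element for the given name and value.
property_xml() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Example: shuffle readahead raised from the 4 MB default to 8 MB
# (illustrative value only).
property_xml mapreduce.shuffle.readahead.bytes "$(mb_to_bytes 8)"
```

The printed element would go inside the configuration element of mapred-site.xml; the 4 MB default corresponds to 4194304 bytes.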

5. Best practices for MapReduce configuration

The configuration settings described below can reduce inherent delays in MapReduce execution. You can set these values in mapred-site.xml.

(1) Send a heartbeat as soon as a task completes

Set mapreduce.tasktracker.outofband.heartbeat to true so that the TaskTracker sends an out-of-band heartbeat when a task completes, which reduces latency. The default value is false:

<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>

(2) Reduce the interval between JobClient status reports on a single-node system

The jobclient.progress.monitor.poll.interval property defines the interval (in milliseconds) at which the JobClient reports status to the console and checks whether the job has completed. The default value is 1000 milliseconds; you may want to lower it so that tests run faster on a single-node cluster. Lowering this value on a large production cluster may cause unnecessary client-server traffic.

<property>
  <name>jobclient.progress.monitor.poll.interval</name>
  <value>10</value>
</property>

(3) Adjust the JobTracker heartbeat interval

Lowering the minimum interval of TaskTracker-to-JobTracker heartbeats can improve MapReduce performance on small clusters.

<property>
  <name>mapreduce.jobtracker.heartbeat.interval.min</name>
  <value>10</value>
</property>

(4) Start MapReduce JVMs immediately

The mapred.reduce.slowstart.completed.maps property specifies the proportion of map tasks in a job that must complete before any reduce tasks can be scheduled. The value is written as a fraction between 0 and 1. For small jobs that require fast turnaround, setting it to 0 can improve performance; a higher value (up to 0.5, that is, 50%) may be suitable for larger jobs.

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0</value>
</property>
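To get a feel for what a given slowstart setting means for a particular job, the arithmetic can be sketched as follows. slowstart_maps is a hypothetical helper, and it assumes the threshold is simply the fraction multiplied by the number of map tasks, rounded up; treat the rounding as an assumption rather than a statement about Hadoop internals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of map tasks that must finish before reduce
# tasks may be scheduled, ASSUMING threshold = fraction * total maps,
# rounded up to a whole task.
slowstart_maps() {
  awk -v maps="$1" -v frac="$2" 'BEGIN {
    x = maps * frac
    c = int(x)
    if (c < x) c++
    print c
  }'
}

slowstart_maps 200 0     # prints: 0
slowstart_maps 200 0.5   # prints: 100
```

With the value 0, reduce tasks can be scheduled as soon as the job starts; with 0.5 on a 200-map job, they wait until 100 map tasks have finished.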

Reference links

Optimizing Performance in CDH https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cdh_admin_performance.html

The tuned Linux service http://www.cnblogs.com/createyuan/p/5701650.html

Linux performance tuning, part 12: the Red Hat optimization strategy tuned https://blog.csdn.net/u013870094/article/details/51055483
