
Optimizing a Hive SQL job whose CPU usage is too high

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Recently, a SQL job had been running for more than two hours, so I set out to optimize it.

First, I checked the counters of the job generated by the Hive SQL and found:

The total CPU time spent was as high as 100.4319973 hours.

Looking at the CPU time spent by each map task, the slowest one took 2.0540889 hours.

It is recommended to set the following parameters:

1. Lower mapreduce.input.fileinputformat.split.maxsize from its current 256000000 to increase the number of map tasks. (This change had an immediate effect: setting it to 32000000 produced 500+ maps, and the job went from 2 hours to 47 minutes.)
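In a Hive session this can be set per query, before running the slow SQL:

```sql
-- Shrink the maximum input split size so more map tasks are created
SET mapreduce.input.fileinputformat.split.maxsize=32000000;
-- then run the slow query in the same session
```

Each map task then processes less input, so the per-map CPU time drops and the work spreads across more containers.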

2. Optimize the UDFs getPageID, getSiteId, and getPageValue (these methods do a lot of regular-expression text matching).

2.1 For regular-expression optimization, see:

http://www.fasterj.com/articles/regex1.shtml

http://www.fasterj.com/articles/regex2.shtml
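A large part of the advice in those articles comes down to one rule: compile the pattern once and reuse it, instead of calling String.matches or Pattern.matches per row (which recompiles the regex on every call). A minimal sketch of what a UDF helper like getPageID could do internally; the class name, method name, and pattern here are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexReuse {
    // Compiled once per class load; Pattern.matches(regex, input)
    // would recompile this regex on every invocation.
    private static final Pattern PAGE_ID = Pattern.compile("pageid=(\\d+)");

    public static String extractPageId(String url) {
        Matcher m = PAGE_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPageId("http://example.com/?pageid=42"));
    }
}
```

For a UDF that is called once per row over millions of rows, moving the Pattern.compile out of the hot path is usually the single biggest regex win.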

2.2 For UDF optimization:

1. Use class-level private members to save on object instantiation and garbage collection.

2. You also gain by matching the argument types to what normally arrives from upstream. Hive converts Text to String when needed, but if the data coming into the method is Text, you can try matching the argument type and see if it is any faster.

Example. Before optimization:

import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return getString(s);
    }

    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s);
        } catch (Exception e) {
            a = "";
        }
        return a;
    }

    public static void main(String[] args) {
        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        System.out.println(getString(t));
    }
}

After optimization:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    private Text t = new Text();  // class-level member, reused across calls

    public Text evaluate(Text s) {
        if (s == null) {
            return null;
        }
        try {
            t.set(URLDecoder.decode(s.toString(), "UTF-8"));
            return t;
        } catch (Exception e) {
            return null;
        }
    }
}

3. Inherit from GenericUDF instead of UDF (GenericUDF avoids the reflection-based argument resolution that plain UDF relies on).

3. On Hive 0.14+, you can enable hive.cache.expr.evaluation to cache the results of deterministic UDF calls.
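This is also a session-level setting:

```sql
-- Hive 0.14+: cache results of deterministic expression/UDF evaluation
SET hive.cache.expr.evaluation=true;
```

Note that the cache only helps when the same UDF is applied to repeated input values; a UDF must be deterministic (not marked otherwise) to benefit.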
