2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this article, the editor walks through the hive-site configuration file and the settings most often touched in day-to-day Hive operation and maintenance. I hope you find it a useful reference.
0. Configuration priority in Hive (from high to low):
1. The Hive set command.
2. The command-line option -hiveconf.
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml (or core-site.xml, hdfs-site.xml, mapred-site.xml)
6. hadoop-default.xml (or core-default.xml, hdfs-default.xml, mapred-default.xml)
Hive writes its logs to /tmp/$USER/hive.log. When an error occurs, the Hadoop mapred task logs can also be inspected; for the local environment, look under /tmp/nslab.
Command: hive -hiveconf hive.root.logger=DEBUG,console prints debug information to the console.
The use of set
1. Use set to view the value of a setting:
set hive.enforce.bucketing;
2. Entering set on its own lists all current settings.
3. Assign a new value with syntax like the following:
set hive.enforce.bucketing=true;
1. Dynamic partitioning:
hive.exec.dynamic.partition
Whether to enable dynamic partitioning.
Default: false
hive.exec.dynamic.partition.mode
The mode of dynamic partitioning once it is enabled; two values are possible: strict and nonstrict. strict requires at least one static partition column, while nonstrict does not.
Default: strict
hive.exec.max.dynamic.partitions
The maximum total number of dynamic partitions allowed.
Default: 1000
hive.exec.max.dynamic.partitions.pernode
The maximum number of dynamic partitions allowed on a single reduce node.
Default: 100
hive.exec.default.partition.name
The name of the default dynamic partition, used when the dynamic partition column's value is the empty string or NULL.
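As a minimal sketch of how these properties work together (the table names logs_staging and logs_partitioned and their columns are hypothetical):

```sql
-- Enable dynamic partitioning for this session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Every value of the partition column dt comes from the query itself;
-- with mode=strict, at least one partition column would need a static value.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (dt)
SELECT user_id, action, dt
FROM logs_staging;
```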
2. Print column names and enable row-to-column output (to be tested):
set hive.cli.print.header=true; -- print column names
set hive.cli.print.row.to.vertical=true; -- enable vertical output; requires column-name printing to be enabled first
set hive.cli.print.row.to.vertical.num=1; -- the number of columns displayed per row
3. View the Hive version (the HWI war file name contains the version number):
set hive.hwi.war.file;
4. View the Hive command-line character encoding:
hive.cli.encoding
The default character encoding of the Hive command line.
Default: 'UTF8'
5. Hive Fetch Task execution:
set hive.fetch.task.conversion=more;
For simple statements such as SELECT ... FROM ... LIMIT n that involve no aggregation, the data can be fetched directly by a Fetch task without launching a MapReduce job (if the data volume is too large, no result can be returned).
This is a bit like manipulating text directly with vi under Linux, and somewhat like Shark's columnar storage: data kept in the same array can be queried quickly.
hive.fetch.task.conversion
The default fetch-conversion behavior of Hive.
Default: minimal
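A small illustration of the behavior described above, assuming a hypothetical table named logs:

```sql
-- With conversion set to "more", a simple projection with LIMIT is served
-- directly by a Fetch task; no MapReduce job is launched.
set hive.fetch.task.conversion=more;
SELECT user_id, action FROM logs LIMIT 10;
```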
6. MapJoin
Older versions of Hive required a /*+ MAPJOIN(tablelist) */ hint after the SELECT keyword of the query/subquery to prompt the optimizer to convert the join into a MapJoin. Newer versions only need:
set hive.auto.convert.join=true;
Hive then chooses the small table automatically.
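Both styles can be sketched as follows; the tables fact and dim are hypothetical, with dim assumed small enough to fit in memory:

```sql
-- Old style: hint the optimizer to map-join the small table dim.
SELECT /*+ MAPJOIN(dim) */ f.id, d.name
FROM fact f JOIN dim d ON f.dim_id = d.id;

-- Newer versions: let Hive convert eligible joins automatically.
set hive.auto.convert.join=true;
SELECT f.id, d.name
FROM fact f JOIN dim d ON f.dim_id = d.id;
```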
7. Strict mode:
With hive.mapred.mode=strict, the following queries are not allowed to run:
Queries on a partitioned table that specify no partition
ORDER BY statements without a LIMIT clause
Cartesian products: a JOIN with no ON clause
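For example, under strict mode (the table logs and the column ts are hypothetical):

```sql
set hive.mapred.mode=strict;

-- Rejected in strict mode: ORDER BY without LIMIT.
-- SELECT * FROM logs ORDER BY ts;

-- Accepted: the LIMIT bounds the work of the single ordering reducer.
SELECT * FROM logs ORDER BY ts LIMIT 100;
```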
8. Execute jobs concurrently:
This parameter controls whether independent jobs within the same SQL statement may run at the same time. The default is false.
hive.exec.parallel=true (default: false)
hive.exec.parallel.thread.number=8
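A sketch of a query that benefits, assuming two hypothetical tables logs_a and logs_b: the two branches of the UNION ALL are independent jobs and can run concurrently once parallel execution is enabled.

```sql
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;

-- The two aggregations below have no dependency on each other,
-- so their jobs can be scheduled at the same time.
SELECT dt, count(*) FROM logs_a GROUP BY dt
UNION ALL
SELECT dt, count(*) FROM logs_b GROUP BY dt;
```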
9. Load balancing
hive.groupby.skewindata=true enables load balancing when the data is skewed. When set to true, the generated query plan contains two MR jobs. In the first job, the map output is distributed randomly across the reducers; each reducer performs a partial aggregation and emits its result, so rows with the same GROUP BY key may land on different reducers, which balances the load. The second MR job then distributes the pre-aggregated results by GROUP BY key (this step guarantees that identical keys reach the same reducer) and completes the final aggregation.
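For instance, a skewed aggregation (the table events is hypothetical) would be planned as two MR jobs under this setting:

```sql
-- First job: map output is spread randomly over the reducers for a
-- partial aggregation; second job: final aggregation by user_id.
set hive.groupby.skewindata=true;
SELECT user_id, count(*) FROM events GROUP BY user_id;
```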
10. hive.exec.rowoffset: whether to provide the virtual row-offset column.
11. set hive.error.on.empty.partition=true; makes Hive raise an exception when a dynamic partition turns out to be empty:
set hive.error.on.empty.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Reference address: http://my.oschina.net/repine/blog/541380
12. hive.merge.mapredfiles: merge small files
To make a job merge the files produced by its reduce stage:
set hive.merge.smallfiles.avgsize=67108864;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
Reference address: http://www.linuxidc.com/Linux/2015-06/118391.htm
1. First, set the threshold for small files in hive-site.xml:
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>536870912</value>
  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
</property>
2. Merge the small-file output of map-only jobs:
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>
3. Merge the small-file output of jobs containing a reduce stage:
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
That is the full content of "Sample Analysis of hive-site Files in Hive Operation and Maintenance". Thank you for reading! I hope it has been helpful; to learn more, feel free to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.