2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
In this article, the editor walks through the hive-site configuration file and the settings most often touched in day-to-day Hive operation and maintenance. I hope you find it a useful reference.
0. Configuration priority in Hive (from high to low):
1. The Hive set command.
2. The command-line option -hiveconf.
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml (or core-site.xml, hdfs-site.xml, mapred-site.xml)
6. hadoop-default.xml (or core-default.xml, hdfs-default.xml, mapred-default.xml)
Hive writes its logs to /tmp/$USER/hive.log. When an error occurs, the Hadoop mapred task logs can also be inspected; for the local environment, look under /tmp/nslab.
Command: hive -hiveconf hive.root.logger=DEBUG,console prints debug information to the console.
The use of set
1. Use set to view the value of a setting:
set hive.enforce.bucketing;
2. Entering set on its own lists all current settings.
3. Assign a new value with syntax like the following:
set hive.enforce.bucketing=true;
1. Dynamic partitioning:
hive.exec.dynamic.partition
Whether to enable dynamic partitioning.
Default: false
hive.exec.dynamic.partition.mode
The mode of dynamic partitioning once it is enabled; two values are possible: strict and nonstrict. strict requires at least one static partition column, while nonstrict does not.
Default: strict
hive.exec.max.dynamic.partitions
The maximum total number of dynamic partitions allowed.
Default: 1000
hive.exec.max.dynamic.partitions.pernode
The maximum number of dynamic partitions allowed on a single reduce node.
Default: 100
hive.exec.default.partition.name
The name of the default dynamic partition, used when the dynamic partition column's value is the empty string or NULL.
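As a minimal sketch of how these properties work together (the table names logs_staging and logs_partitioned and their columns are hypothetical):

```sql
-- Enable dynamic partitioning for this session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Every value of the partition column dt comes from the query itself;
-- with mode=strict, at least one partition column would need a static value.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (dt)
SELECT user_id, action, dt
FROM logs_staging;
```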
2. Print column names and enable row-to-column output (to be tested):
set hive.cli.print.header=true; -- print column names
set hive.cli.print.row.to.vertical=true; -- enable vertical output; requires column-name printing to be enabled first
set hive.cli.print.row.to.vertical.num=1; -- the number of columns displayed per row
3. View the Hive version (the HWI war file name contains the version number):
set hive.hwi.war.file;
4. View the Hive command-line character encoding:
hive.cli.encoding
The default character encoding of the Hive command line.
Default: 'UTF8'
5. Hive Fetch Task execution:
set hive.fetch.task.conversion=more;
For simple statements such as SELECT ... FROM ... LIMIT n that involve no aggregation, the data can be fetched directly by a Fetch task without launching a MapReduce job (if the data volume is too large, no result can be returned).
This is a bit like manipulating text directly with vi under Linux, and somewhat like Shark's columnar storage: data kept in the same array can be queried quickly.
hive.fetch.task.conversion
The default fetch-conversion behavior of Hive.
Default: minimal
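A small illustration of the behavior described above, assuming a hypothetical table named logs:

```sql
-- With conversion set to "more", a simple projection with LIMIT is served
-- directly by a Fetch task; no MapReduce job is launched.
set hive.fetch.task.conversion=more;
SELECT user_id, action FROM logs LIMIT 10;
```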
6. MapJoin
Older versions of Hive required a /*+ MAPJOIN(tablelist) */ hint after the SELECT keyword of the query/subquery to prompt the optimizer to convert the join into a MapJoin. Newer versions only need:
set hive.auto.convert.join=true;
Hive then chooses the small table automatically.
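Both styles can be sketched as follows; the tables fact and dim are hypothetical, with dim assumed small enough to fit in memory:

```sql
-- Old style: hint the optimizer to map-join the small table dim.
SELECT /*+ MAPJOIN(dim) */ f.id, d.name
FROM fact f JOIN dim d ON f.dim_id = d.id;

-- Newer versions: let Hive convert eligible joins automatically.
set hive.auto.convert.join=true;
SELECT f.id, d.name
FROM fact f JOIN dim d ON f.dim_id = d.id;
```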
7. Strict mode:
With hive.mapred.mode=strict, the following queries are not allowed to run:
Queries on a partitioned table that specify no partition
ORDER BY statements without a LIMIT clause
Cartesian products: a JOIN with no ON clause
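For example, under strict mode (the table logs and the column ts are hypothetical):

```sql
set hive.mapred.mode=strict;

-- Rejected in strict mode: ORDER BY without LIMIT.
-- SELECT * FROM logs ORDER BY ts;

-- Accepted: the LIMIT bounds the work of the single ordering reducer.
SELECT * FROM logs ORDER BY ts LIMIT 100;
```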
8. Execute jobs concurrently:
This parameter controls whether independent jobs within the same SQL statement may run at the same time. The default is false.
hive.exec.parallel=true (default: false)
hive.exec.parallel.thread.number=8
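A sketch of a query that benefits, assuming two hypothetical tables logs_a and logs_b: the two branches of the UNION ALL are independent jobs and can run concurrently once parallel execution is enabled.

```sql
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=8;

-- The two aggregations below have no dependency on each other,
-- so their jobs can be scheduled at the same time.
SELECT dt, count(*) FROM logs_a GROUP BY dt
UNION ALL
SELECT dt, count(*) FROM logs_b GROUP BY dt;
```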
9. Load balancing
hive.groupby.skewindata=true enables load balancing when the data is skewed. When set to true, the generated query plan contains two MR jobs. In the first job, the map output is distributed randomly across the reducers; each reducer performs a partial aggregation and emits its result, so rows with the same GROUP BY key may land on different reducers, which balances the load. The second MR job then distributes the pre-aggregated results by GROUP BY key (this step guarantees that identical keys reach the same reducer) and completes the final aggregation.
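For instance, a skewed aggregation (the table events is hypothetical) would be planned as two MR jobs under this setting:

```sql
-- First job: map output is spread randomly over the reducers for a
-- partial aggregation; second job: final aggregation by user_id.
set hive.groupby.skewindata=true;
SELECT user_id, count(*) FROM events GROUP BY user_id;
```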
10. hive.exec.rowoffset: whether to provide the virtual row-offset column.
11. set hive.error.on.empty.partition=true; makes Hive raise an exception when a dynamic partition turns out to be empty:
set hive.error.on.empty.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Reference address: http://my.oschina.net/repine/blog/541380
12. hive.merge.mapredfiles: merge small files
To make a job merge the files produced by its reduce stage:
set hive.merge.smallfiles.avgsize=67108864;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
Reference address: http://www.linuxidc.com/Linux/2015-06/118391.htm
1. First, set the threshold for small files in hive-site.xml:
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>536870912</value>
  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
</property>
2. Merge the small-file output of map-only jobs:
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>
3. Merge the small-file output of jobs containing a reduce stage:
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
That is the full content of "Sample Analysis of hive-site Files in Hive Operation and Maintenance". Thank you for reading! I hope it has been helpful; to learn more, feel free to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.