How to integrate Hive with Atlas

2025-02-23 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report

This article explains how to integrate Atlas with Hive. The content is meant to be easy to understand and clearly organized; we hope it helps resolve your doubts as you follow along.

Integrating Atlas with Hive

After installing Atlas, you need to connect it to other components before it becomes useful.

The most common integration is with Hive.

In the Atlas architecture, once the Hive hook is configured, every operation Hive performs is written to Kafka, consumed by Atlas, and displayed in Atlas as a graph.

Hive Model

What operation information gets recorded for Hive? Atlas defines a Hive model.

It contains the following:

1. Entity types:

hive_db
Type: Asset
Attributes: qualifiedName, name, description, owner, clusterName, location, parameters, ownerName

hive_table
Type: DataSet
Attributes: qualifiedName, name, description, owner, db, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary

hive_column
Type: DataSet
Attributes: qualifiedName, name, description, owner, type, comment, table

hive_storagedesc
Type: Referenceable
Attributes: qualifiedName, table, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories

hive_process
Type: Process
Attributes: qualifiedName, name, description, owner, inputs, outputs, startTime, endTime, userName, operationType, queryText, queryPlan, queryId, clusterName

hive_column_lineage
Type: Process
Attributes: qualifiedName, name, description, owner, inputs, outputs, query, depenendencyType, expression

2. Enum types:

hive_principal_type values: USER, ROLE, GROUP

3. Struct types:

hive_order attributes: col, order

hive_serde attributes: name, serializationLib, parameters

The structure of the Hive entity qualifiedNames:

hive_db.qualifiedName: <dbName>@<clusterName>

hive_table.qualifiedName: <dbName>.<tableName>@<clusterName>

hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>

hive_process.queryString: trimmed query string in lower case
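To make the naming scheme concrete, here is a small sketch (the helper names are mine for illustration, not Atlas APIs) that composes qualifiedNames the way the conventions above describe:

```python
# Sketch only: compose Atlas-style qualifiedNames for Hive entities.
# Function names are illustrative, not part of the Atlas client library.

def hive_db_qualified_name(db: str, cluster: str) -> str:
    """<dbName>@<clusterName>"""
    return f"{db}@{cluster}"

def hive_table_qualified_name(db: str, table: str, cluster: str) -> str:
    """<dbName>.<tableName>@<clusterName>"""
    return f"{db}.{table}@{cluster}"

def hive_column_qualified_name(db: str, table: str, column: str, cluster: str) -> str:
    """<dbName>.<tableName>.<columnName>@<clusterName>"""
    return f"{db}.{table}.{column}@{cluster}"

print(hive_db_qualified_name("sales", "primary"))                       # sales@primary
print(hive_table_qualified_name("sales", "orders", "primary"))          # sales.orders@primary
print(hive_column_qualified_name("sales", "orders", "id", "primary"))   # sales.orders.id@primary
```

The cluster suffix (here "primary", matching atlas.cluster.name later in this article) is what lets Atlas distinguish identically named databases on different clusters.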

Configure Hive hook

The Hive hook listens for Hive create/update/delete operations. Here are the configuration steps:

1. Modify hive-env.sh (point it at the Atlas hook jar directory):

export HIVE_AUX_JARS_PATH=/opt/apps/apache-atlas-2.1.0/hook/hive

2. Modify hive-site.xml (restart Hive after configuring):

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

Note that this configures post-execution monitoring; Hive also supports hooks that fire before and during execution.
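For illustration, Hive exposes pre-execution and failure hooks alongside hive.exec.post.hooks. A hedged hive-site.xml sketch with placeholder hook classes (Atlas itself only needs the post hook configured above):

```xml
<!-- Optional: hooks that fire before execution and on failure.
     The class names below are placeholders, not real implementations. -->
<property>
    <name>hive.exec.pre.hooks</name>
    <value>com.example.MyPreExecHook</value>
</property>
<property>
    <name>hive.exec.failure.hooks</name>
    <value>com.example.MyFailureHook</value>
</property>
```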

3. Copy the Atlas configuration file atlas-application.properties into the Hive configuration directory, then add the following settings to it:

atlas.hook.hive.synchronous=false

atlas.hook.hive.numRetries=3

atlas.hook.hive.queueSize=10000

atlas.cluster.name=primary

atlas.rest.address=http://doit33:21000
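Step 3 amounts to copying the file next to Hive's own configuration; a sketch using the install paths that appear elsewhere in this article (assumptions; adjust to your layout):

```shell
# ATLAS_HOME and HIVE_CONF_DIR are assumed from this article's paths.
ATLAS_HOME=/opt/apps/apache-atlas-2.1.0
HIVE_CONF_DIR=/opt/module/hive/conf
cp "${ATLAS_HOME}/conf/atlas-application.properties" "${HIVE_CONF_DIR}/"
```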

Import Hive metadata into Atlas

Run bin/import-hive.sh; when prompted, enter the username admin and the password admin:

bin/import-hive.sh

Using Hive configuration directory [/opt/module/hive/conf]

Log file for import is /opt/module/atlas/logs/import-hive.log

log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.

log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.

Enter username for atlas :- admin

Enter password for atlas :-

Hive Meta Data import was successful!!!

A full record of the pitfalls

1. Cannot find class org.apache.atlas.hive.hook.HiveHook

Cause: the third-party jar was not added to Hive.

A quick check is to run set; in the hive shell to see whether the jar path is present; this prints the list of configuration variables overridden by the user or by configuration files.

Taking elasticsearch-hadoop-2.1.2.jar as an example, here are several ways to add a third-party jar to Hive.

1. Add it in the hive shell:

hive> add jar /home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar;

Effectiveness: takes effect in the current Hive shell without restarting the Hive service; does not take effect for HiveServer.

2. Put the jar into the ${HIVE_HOME}/auxlib directory.

Create the auxlib folder under ${HIVE_HOME} and put the custom jar file in it. This method does not require restarting Hive, and it is more convenient.
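As a sketch, method 2 with the example jar (paths are this article's; adjust as needed):

```shell
# Create auxlib under HIVE_HOME and drop the jar in;
# Hive picks it up at startup without any config change.
HIVE_HOME=/opt/module/hive   # assumed install dir
mkdir -p "${HIVE_HOME}/auxlib"
cp /home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar "${HIVE_HOME}/auxlib/"
```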

Effectiveness: takes effect in the Hive shell without a restart; takes effect for HiveServer after the Hive service is restarted.

3. HIVE_AUX_JARS_PATH and hive.aux.jars.path

The HIVE_AUX_JARS_PATH setting in hive-env.sh and the hive.aux.jars.path setting in hive-site.xml do not take effect for a running server; they apply only to the hive shell that starts with them. Different hive shells do not affect each other, and each shell needs its own configuration. Both settings support only local files, and each can point to a single file or to a folder.
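As a sketch, method 3 in hive-site.xml form, pointing at the example jar (a local path, per the restriction above):

```xml
<property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar</value>
</property>
```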

Effectiveness: takes effect for the Hive shell only after the Hive service is restarted; takes effect for HiveServer after the Hive service is restarted.

2. Hive reports the error "Failing because I am unlikely to write too"

Cause: HIVE_AUX_JARS_PATH is not configured correctly.

There is a passage in the hive-env.sh script:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog
fi

If you set a value for HIVE_AUX_JARS_PATH, /usr/hdp/current/hive-webhcat/share/hcatalog is ignored, and Hive reads only that one HIVE_AUX_JARS_PATH. The fix is to keep the shared jars in one central location and create corresponding soft links under /usr/hdp/current/hive-webhcat/share/hcatalog:

sudo -u hive ln -s /usr/lib/share-lib/elasticsearch-hadoop-2.1.0.Beta4.jar /usr/hdp/current/hive-webhcat/share/hcatalog/elasticsearch-hadoop-2.1.0.Beta4.jar
