
How to Implement Metadata Management with Hive Hooks and MetaStore Listeners

2025-01-18 | SLTechnology News&Howtos

This article introduces how Hive Hooks and MetaStore Listeners can be used to implement metadata management. It is quite detailed and should serve as a useful reference for interested readers.

Metadata management is the core of a data warehouse. Metadata not only defines what the warehouse contains, but also points to the content and location of the data in the warehouse, describes the rules for data extraction and transformation, and stores all kinds of business information related to the warehouse's subject areas.

Metadata Management

Metadata Definition

According to the traditional definition, metadata is "data about data". Metadata connects source data, the data warehouse, and data applications, and records the whole process of data production and consumption. It mainly records the definitions of the models in the warehouse, the mapping relationships between levels, the monitoring of the warehouse's data state, and the running status of ETL tasks. In a data warehouse system, metadata helps administrators and developers quickly find the data they care about, guiding their data management and development work and improving efficiency. By purpose, metadata falls into two categories: technical metadata (Technical Metadata) and business metadata (Business Metadata). Technical metadata stores data about the technical details of the data warehouse system and is used for developing and managing the warehouse.

Metadata Classification

Technical Metadata

Storage metadata of the distributed computing system

For example Hive tables, columns, and partitions: the table name, partition information, owner, file size, and table type are recorded, along with each column's field name, field type, field comment, whether it is a partition field, and so on.
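As a rough sketch, this kind of technical metadata can be modeled as a simple value object. All names below are illustrative only and are not part of any Hive API:

```java
// Illustrative sketch only: a minimal value object for Hive table metadata.
// The field names mirror the attributes listed above; none of this is a Hive API.
public class TableMetadata {
    private final String dbName;
    private final String tableName;
    private final String owner;
    private final String tableType;   // e.g. "MANAGED_TABLE"
    private final long fileSizeBytes;

    public TableMetadata(String dbName, String tableName, String owner,
                         String tableType, long fileSizeBytes) {
        this.dbName = dbName;
        this.tableName = tableName;
        this.owner = owner;
        this.tableType = tableType;
        this.fileSizeBytes = fileSizeBytes;
    }

    // Fully qualified table name, e.g. "default.testposthook".
    public String qualifiedName() {
        return dbName + "." + tableName;
    }

    public String owner() { return owner; }
    public String tableType() { return tableType; }
    public long fileSizeBytes() { return fileSizeBytes; }
}
```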

Runtime metadata of the distributed computing system

For example Hive's job logs, including the job type, instance name, inputs and outputs, SQL, run parameters, execution time, and so on.

Task scheduling metadata

The dependency types and dependency relationships of tasks, as well as the run logs of the different kinds of scheduled tasks.

Business metadata

Business metadata describes the data in the data warehouse from a business point of view, which provides a semantic layer between the user and the actual system, so that business personnel who do not understand computer technology can also "read" the data in the data warehouse. Common business metadata are: standardized definitions of dimensions and attributes, business processes, indicators, etc., for better management and use of data; data application metadata, such as configuration and operation of data reports, data products, etc.

Metadata application

The real value of data lies in driving decisions and guiding operations. Through a data-driven approach we can identify trends, take effective action, find problems, and drive innovation or solutions; this is data-driven operation. The same applies to metadata: it can guide data practitioners in their daily work and enable metadata-driven "operation". For example, data users can quickly find the data they need through metadata; ETL engineers can use metadata to guide daily tasks such as model design, task optimization, and task decommissioning; and operations engineers can use metadata to guide cluster-wide storage, compute, and system optimization.

Hive Hooks and Metastore Listeners

Hive Hooks

For data governance and metadata management there are many open-source frameworks, such as Apache Atlas, which can meet the needs of metadata management in complex scenarios. In fact, Apache Atlas uses Hive's Hooks to manage Hive metadata, which requires the following configuration:

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

The Hook listens for Hive events such as table creation and table modification, pushes the collected data to Kafka in a specific format, and a consumer finally reads and stores the metadata.

Hive Hooks Classification

So what exactly are Hooks?

Hooks are an event and message mechanism that binds custom logic into Hive's internal execution flow without recompiling Hive; they provide a way to extend Hive with external components. Depending on its type, a Hook runs at a different stage of query processing. The main types of Hooks are:

hive.exec.pre.hooks

As the name suggests, it is called before the execution engine runs the query, after Hive has optimized the query plan. To use this Hook, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.exec.pre.hooks</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.exec.post.hooks

Called after the execution plan finishes, before the results are returned to the user. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.exec.post.hooks</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.exec.failure.hooks

Called after the execution plan fails. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.exec.failure.hooks</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.metastore.init.hooks

Called when HMSHandler is initialized. To use it, implement the interface org.apache.hadoop.hive.metastore.MetaStoreInitListener. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.metastore.init.hooks</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.exec.driver.run.hooks

Runs at the beginning and end of Driver.run. To use it, implement the interface org.apache.hadoop.hive.ql.HiveDriverRunHook. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.exec.driver.run.hooks</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.semantic.analyzer.hook

Called when Hive performs semantic analysis on the query statement. To use it, extend the abstract class org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook. The configuration in hive-site.xml is as follows:

<property>
  <name>hive.semantic.analyzer.hook</name>
  <value>fully qualified name of the implementation class</value>
</property>

Advantages and disadvantages of Hive Hooks

Advantages: a Hook can easily be embedded at various query phases to run custom code, and it can be used to update metadata.

Disadvantages: the metadata obtained in a Hook usually needs further parsing, otherwise it is hard to understand the query process.

For Hive Hooks, this article presents a use case for hive.exec.post.hooks, which runs after the query has executed and before the result is returned.

Metastore Listeners

The so-called Metastore Listeners monitor events on the Hive metastore; users can plug in custom code to listen for metadata changes.

Looking at the source code of the HiveMetaStore class, the init() method of HMSHandler creates three kinds of listeners: MetaStorePreEventListener, MetaStoreEventListener, and MetaStoreEndFunctionListener. These listeners are used to observe each step's events.

public class HiveMetaStore extends ThriftHiveMetastore {
    // ... code omitted
    public static class HMSHandler extends FacebookBase implements IHMSHandler {
        // ... code omitted
        public void init() throws MetaException {
            // ... code omitted
            // obtain MetaStorePreEventListener
            preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
            // obtain MetaStoreEventListener
            listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
            listeners.add(new SessionPropertiesListener(hiveConf));
            // obtain MetaStoreEndFunctionListener
            endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
                    MetaStoreEndFunctionListener.class,
                    hiveConf,
                    hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
            // ... code omitted
        }
    }
}

Metastore Listeners Classification

hive.metastore.pre.event.listeners

This abstract class needs to be extended to provide implementations of the actions that should be performed before a particular event occurs on the metastore. These methods are called before the event occurs.

To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStorePreEventListener. The configuration in hive-site.xml is:

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.metastore.event.listeners

This abstract class needs to be extended to provide implementations of the actions that should be performed when a particular event occurs on the metastore. These methods are called whenever an event occurs.

To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEventListener. The configuration in hive-site.xml is:

<property>
  <name>hive.metastore.event.listeners</name>
  <value>fully qualified name of the implementation class</value>
</property>

hive.metastore.end.function.listeners

These methods are called whenever a metastore function ends.

To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener. The configuration in hive-site.xml is:

<property>
  <name>hive.metastore.end.function.listeners</name>
  <value>fully qualified name of the implementation class</value>
</property>

Advantages and disadvantages of Metastore Listeners

Advantages: the metadata is already parsed and easy to understand, and listening does not affect the query process.

Disadvantages: read-only and inflexible; a listener can only access the objects belonging to the current event.

For Metastore Listeners, this article gives a use case of MetaStoreEventListener, implementing two methods: onCreateTable and onAlterTable.

Hive Hooks Basic Usage

The specific implementation code is as follows:

public class CustomPostHook implements ExecuteWithHookContext {

    private static final Logger LOGGER = LoggerFactory.getLogger(CustomPostHook.class);

    // Stores the monitored Hive SQL operation types.
    // HiveOperation is an enum class that encapsulates Hive's SQL operation types.
    private static final HashSet<String> OPERATION_NAMES = new HashSet<>();

    static {
        // create a table
        OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
        // modify database properties
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
        // modify the database owner
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
        // modify table properties, add columns
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
        // modify the table storage location
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
        // modify table properties
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
        // rename a table
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
        // rename a column
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
        // replace columns (drop the current columns first, then add new ones)
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
        // create a database
        OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
        // drop a database
        OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
        // drop a table
        OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        assert (hookContext.getHookType() == HookType.POST_EXEC_HOOK);
        // the query plan
        QueryPlan plan = hookContext.getQueryPlan();
        // operation name
        String operationName = plan.getOperationName();
        logWithHeader("SQL statement executed: " + plan.getQueryString());
        logWithHeader("Operation name: " + operationName);
        if (OPERATION_NAMES.contains(operationName) && !plan.isExplain()) {
            logWithHeader("Monitored SQL operation");
            Set<ReadEntity> inputs = hookContext.getInputs();
            Set<WriteEntity> outputs = hookContext.getOutputs();
            for (Entity entity : inputs) {
                logWithHeader("Hook metadata input value: " + toJson(entity));
            }
            for (Entity entity : outputs) {
                logWithHeader("Hook metadata output value: " + toJson(entity));
            }
        } else {
            logWithHeader("out of scope, ignore the hook!");
        }
    }

    private static String toJson(Entity entity) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // entity types mainly include:
        // DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
        switch (entity.getType()) {
            case DATABASE:
                Database db = entity.getDatabase();
                return mapper.writeValueAsString(db);
            case TABLE:
                return mapper.writeValueAsString(entity.getTable().getTTable());
        }
        return null;
    }

    /**
     * Log format
     *
     * @param obj
     */
    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomPostHook][Thread: " + Thread.currentThread().getName() + "] | " + obj);
    }
}

Usage Walkthrough

First, compile the code above into a jar and place it in the $HIVE_HOME/lib directory, or add the jar from the Hive client:

0: jdbc:hive2://localhost:10000> add jar /opt/softwares/com.jmx.hive-1.0-SNAPSHOT.jar;

Then configure hive-site.xml. For convenience, we set the property directly from the client:

0: jdbc:hive2://localhost:10000> set hive.exec.post.hooks=com.jmx.hooks.CustomPostHook;
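Setting the property in the client applies only to the current session. To register the hook permanently, the same property (with this example's class name) can instead go into hive-site.xml:

```xml
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.jmx.hooks.CustomPostHook</value>
</property>
```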

Viewing tables

The code above monitors certain operations; when one of them is detected, the custom code (here, log output) is triggered. Enter the following command in Hive's beeline client:

0: jdbc:hive2://localhost:10000> show tables;

You can see this in the $HIVE_HOME/logs/hive.log file:

[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | SQL statement executed: show tables

[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | Operation name: SHOWTABLES

[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | out of the monitoring scope, ignore the hook!

The show tables operation above is not within the monitored scope, so no corresponding metadata log is produced.

Creating a table

When we create a table in Hive's beeline client:

CREATE TABLE testposthook(
  id int COMMENT "id",
  name string COMMENT "name"
) COMMENT "build table _ test Hive Hooks"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';

Observe hive.log:

There are two Hook metadata output values above: the first is the metadata of the database, and the second is the metadata of the table.

Database metadata:

{
  "name": "default",
  "description": "Default Hive database",
  "locationUri": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
  "parameters": {},
  "privileges": null,
  "ownerName": "public",
  "ownerType": "ROLE",
  "setParameters": true,
  "parametersSize": 0,
  "setOwnerName": true,
  "setOwnerType": true,
  "setPrivileges": false,
  "setName": true,
  "setDescription": true,
  "setLocationUri": true
}

Table metadata:

{
  "tableName": "testposthook",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597985444,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [],
    "location": null,
    "inputFormat": "org.apache.hadoop.mapred.SequenceFileInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe",
      "parameters": { "serialization.format": "1" },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 1,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0,
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0
    },
    "storedAsSubDirectories": false,
    "colsSize": 0,
    "setParameters": true,
    "parametersSize": 0,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "setSkewedInfo": true,
    "colsIterator": [],
    "setCompressed": false,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": false,
    "setCols": true,
    "setLocation": false,
    "setInputFormat": true
  },
  "partitionKeys": [],
  "parameters": {},
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": null,
  "temporary": false,
  "rewriteEnabled": false,
  "partitionKeysSize": 0,
  "setDbName": true,
  "setSd": true,
  "setParameters": true,
  "setCreateTime": true,
  "setLastAccessTime": false,
  "parametersSize": 0,
  "setTableName": true,
  "setPrivileges": false,
  "setOwner": true,
  "setPartitionKeys": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setRetention": false,
  "partitionKeysIterator": [],
  "setTemporary": false,
  "setRewriteEnabled": false
}

Notice that in the table metadata above, the "cols" array is empty; that is, the id and name fields defined when the table was created are not included. To get this information, execute the following command:

ALTER TABLE testposthook ADD COLUMNS (age int COMMENT 'age');

Observe the log information again:

In the log above, the Hook has exactly one input and one output; both represent table metadata.

Input:

{
  "tableName": "testposthook",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597985445,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [
      { "name": "id", "type": "int", "comment": "id", "setName": true, "setType": true, "setComment": true },
      { "name": "name", "type": "string", "comment": "name", "setName": true, "setType": true, "setComment": true }
    ],
    "location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters": {
        "serialization.format":
        "field.delim":
      },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 2,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0,
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0
    },
    "storedAsSubDirectories": false,
    "colsSize": 2,
    "setParameters": true,
    "parametersSize": 0,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "setSkewedInfo": true,
    "colsIterator": [
      { "name": "id", "type": "int", "comment": "id", "setName": true, "setType": true, "setComment": true },
      { "name": "name", "type": "string", "comment": "name", "setName": true, "setType": true, "setComment": true }
    ],
    "setCompressed": true,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": true,
    "setCols": true,
    "setLocation": true,
    "setInputFormat": true
  },
  "partitionKeys": [],
  "parameters": {
    "transient_lastDdlTime": "1597985445",
    "comment": "build table _ test Hive Hooks",
    "totalSize": "0",
    "numFiles": "0"
  },
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": null,
  "temporary": false,
  "rewriteEnabled": false,
  "partitionKeysSize": 0,
  "setDbName": true,
  "setSd": true,
  "setParameters": true,
  "setCreateTime": true,
  "setLastAccessTime": true,
  "parametersSize": 4,
  "setTableName": true,
  "setPrivileges": false,
  "setOwner": true,
  "setPartitionKeys": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setRetention": true,
  "partitionKeysIterator": [],
  "setTemporary": false,
  "setRewriteEnabled": true
}

You can see the field metadata in the "cols" array in the JSON above. Now let's look at the output JSON:

Output:

{
  "tableName": "testposthook",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597985445,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [
      { "name": "id", "type": "int", "comment": "id", "setName": true, "setType": true, "setComment": true },
      { "name": "name", "type": "string", "comment": "name", "setName": true, "setType": true, "setComment": true }
    ],
    "location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters": {
        "serialization.format":
        "field.delim":
      },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 2,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0,
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0
    },
    "storedAsSubDirectories": false,
    "colsSize": 2,
    "setParameters": true,
    "parametersSize": 0,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "setSkewedInfo": true,
    "colsIterator": [
      { "name": "id", "type": "int", "comment": "id", "setName": true, "setType": true, "setComment": true },
      { "name": "name", "type": "string", "comment": "name", "setName": true, "setType": true, "setComment": true }
    ],
    "setCompressed": true,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": true,
    "setCols": true,
    "setLocation": true,
    "setInputFormat": true
  },
  "partitionKeys": [],
  "parameters": {
    "transient_lastDdlTime": "1597985445",
    "comment": "build table _ test Hive Hooks",
    "totalSize": "0",
    "numFiles": "0"
  },
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": null,
  "temporary": false,
  "rewriteEnabled": false,
  "partitionKeysSize": 0,
  "setDbName": true,
  "setSd": true,
  "setParameters": true,
  "setCreateTime": true,
  "setLastAccessTime": true,
  "parametersSize": 4,
  "setTableName": true,
  "setPrivileges": false,
  "setOwner": true,
  "setPartitionKeys": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setRetention": true,
  "partitionKeysIterator": [],
  "setTemporary": false,
  "setRewriteEnabled": true
}

The output object does not contain the new column age; it represents the table metadata before the modification.

Metastore Listeners Basic Usage

The specific implementation code is as follows:

public class CustomListener extends MetaStoreEventListener {

    private static final Logger LOGGER = LoggerFactory.getLogger(CustomListener.class);
    private static final ObjectMapper objMapper = new ObjectMapper();

    public CustomListener(Configuration config) {
        super(config);
        logWithHeader("created");
    }

    // listen for table creation
    @Override
    public void onCreateTable(CreateTableEvent event) {
        logWithHeader(event.getTable());
    }

    // listen for table modification
    @Override
    public void onAlterTable(AlterTableEvent event) {
        logWithHeader(event.getOldTable());
        logWithHeader(event.getNewTable());
    }

    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomListener][Thread: " + Thread.currentThread().getName() + "] | " + objToStr(obj));
    }

    private String objToStr(Object obj) {
        try {
            return objMapper.writeValueAsString(obj);
        } catch (IOException e) {
            LOGGER.error("Error on conversion", e);
        }
        return null;
    }
}

Usage Walkthrough

Usage differs slightly from Hooks: a Hive Hook interacts with HiveServer2, while a Listener interacts with the Metastore, that is, the Listener runs inside the Metastore process. The specific usage is as follows:

First put the jar package in the $HIVE_HOME/lib directory, and then configure the hive-site.xml file as follows:

<property>
  <name>hive.metastore.event.listeners</name>
  <value>com.jmx.hooks.CustomListener</value>
</property>

After the configuration is complete, you need to restart the metadata service:

bin/hive --service metastore &

Creating a table

CREATE TABLE testlistener(
  id int COMMENT "id",
  name string COMMENT "name"
) COMMENT "build table _ test Hive Listener"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';

Observe hive.log:

{
  "tableName": "testlistener",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597989316,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [
      { "name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true },
      { "name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true }
    ],
    "location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters": {
        "serialization.format":
        "field.delim":
      },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 2,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0,
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0
    },
    "storedAsSubDirectories": false,
    "setCols": true,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "colsSize": 2,
    "colsIterator": [
      { "name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true },
      { "name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true }
    ],
    "setCompressed": true,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": true,
    "setParameters": true,
    "setLocation": true,
    "setInputFormat": true,
    "parametersSize": 0,
    "setSkewedInfo": true
  },
  "partitionKeys": [],
  "parameters": {
    "transient_lastDdlTime": "1597989316",
    "comment": "build table _ test Hive Listener",
    "totalSize": "0",
    "numFiles": "0"
  },
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": {
    "userPrivileges": {
      "anonymous": [
        { "privilege": "INSERT", "createTime": -1, "grantor": "anonymous", "grantorType": "USER", "grantOption": true, "setGrantOption": true, "setCreateTime": true, "setGrantor": true, "setGrantorType": true, "setPrivilege": true },
        { "privilege": "SELECT", "createTime": -1, "grantor": "anonymous", "grantorType": "USER", "grantOption": true, "setGrantOption": true, "setCreateTime": true, "setGrantor": true, "setGrantorType": true, "setPrivilege": true },
        { "privilege": "UPDATE", "createTime": -1, "grantor": "anonymous", "grantorType": "USER", "grantOption": true, "setGrantOption": true, "setCreateTime": true, "setGrantor": true, "setGrantorType": true, "setPrivilege": true },
        { "privilege": "DELETE", "createTime": -1, "grantor": "anonymous", "grantorType": "USER", "grantOption": true, "setGrantOption": true, "setCreateTime": true, "setGrantor": true, "setGrantorType": true, "setPrivilege": true }
      ]
    },
    "groupPrivileges": null,
    "rolePrivileges": null,
    "setUserPrivileges": true,
    "setGroupPrivileges": false,
    "setRolePrivileges": false,
    "userPrivilegesSize": 1,
    "groupPrivilegesSize": 0,
    "rolePrivilegesSize": 0
  },
  "temporary": false,
  "rewriteEnabled": false,
  "setParameters": true,
  "setPartitionKeys": true,
  "partitionKeysSize": 0,
  "setSd": true,
  "setLastAccessTime": true,
  "setRetention": true,
  "partitionKeysIterator": [],
  "parametersSize": 4,
  "setTemporary": true,
  "setRewriteEnabled": false,
  "setTableName": true,
  "setDbName": true,
  "setOwner": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setPrivileges": true,
  "setCreateTime": true
}

When we modify the table again:

ALTER TABLE testlistener ADD COLUMNS (age int COMMENT 'age');

Observe the log again:

There are two records above: the first is the old table's information, and the second is the modified table's information.

Old table:

{
  "tableName": "testlistener",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597989316,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [
      { "name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true },
      { "name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true }
    ],
    "location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters": {
        "serialization.format":
        "field.delim":
      },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 2,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0,
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0
    },
    "storedAsSubDirectories": false,
    "setCols": true,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "colsSize": 2,
    "colsIterator": [
      { "name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true },
      { "name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true }
    ],
    "setCompressed": true,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": true,
    "setParameters": true,
    "setLocation": true,
    "setInputFormat": true,
    "parametersSize": 0,
    "setSkewedInfo": true
  },
  "partitionKeys": [],
  "parameters": {
    "totalSize": "0",
    "numFiles": "0",
    "transient_lastDdlTime": "1597989316",
    "comment": "build table _ test Hive Listener"
  },
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": null,
  "temporary": false,
  "rewriteEnabled": false,
  "setParameters": true,
  "setPartitionKeys": true,
  "partitionKeysSize": 0,
  "setSd": true,
  "setLastAccessTime": true,
  "setRetention": true,
  "partitionKeysIterator": [],
  "parametersSize": 4,
  "setTemporary": false,
  "setRewriteEnabled": true,
  "setTableName": true,
  "setDbName": true,
  "setOwner": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setPrivileges": false,
  "setCreateTime": true
}

New table {
  "tableName": "testlistener",
  "dbName": "default",
  "owner": "anonymous",
  "createTime": 1597989316,
  "lastAccessTime": 0,
  "retention": 0,
  "sd": {
    "cols": [
      {"name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true},
      {"name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true},
      {"name": "age", "type": "int", "comment": "Age", "setComment": true, "setType": true, "setName": true}
    ],
    "location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed": false,
    "numBuckets": -1,
    "serdeInfo": {
      "name": null,
      "serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters": {
        "serialization.format":
        "field.delim":
      },
      "setSerializationLib": true,
      "setParameters": true,
      "parametersSize": 2,
      "setName": false
    },
    "bucketCols": [],
    "sortCols": [],
    "parameters": {},
    "skewedInfo": {
      "skewedColNames": [],
      "skewedColValues": [],
      "skewedColValueLocationMaps": {},
      "setSkewedColNames": true,
      "setSkewedColValues": true,
      "setSkewedColValueLocationMaps": true,
      "skewedColNamesSize": 0,
      "skewedColNamesIterator": [],
      "skewedColValuesSize": 0,
      "skewedColValuesIterator": [],
      "skewedColValueLocationMapsSize": 0
    },
    "storedAsSubDirectories": false,
    "setCols": true,
    "setOutputFormat": true,
    "setSerdeInfo": true,
    "setBucketCols": true,
    "setSortCols": true,
    "colsSize": 3,
    "colsIterator": [
      {"name": "id", "type": "int", "comment": "id", "setComment": true, "setType": true, "setName": true},
      {"name": "name", "type": "string", "comment": "name", "setComment": true, "setType": true, "setName": true},
      {"name": "age", "type": "int", "comment": "Age", "setComment": true, "setType": true, "setName": true}
    ],
    "setCompressed": true,
    "setNumBuckets": true,
    "bucketColsSize": 0,
    "bucketColsIterator": [],
    "sortColsSize": 0,
    "sortColsIterator": [],
    "setStoredAsSubDirectories": true,
    "setParameters": true,
    "setLocation": true,
    "setInputFormat": true,
    "parametersSize": 0,
    "setSkewedInfo": true
  },
  "partitionKeys": [],
  "parameters": {
    "totalSize": "0",
    "last_modified_time": "1597989660",
    "numFiles": "0",
    "transient_lastDdlTime": "1597989660",
    "comment": "build table _ test Hive Listener",
    "last_modified_by": "anonymous"
  },
  "viewOriginalText": null,
  "viewExpandedText": null,
  "tableType": "MANAGED_TABLE",
  "privileges": null,
  "temporary": false,
  "rewriteEnabled": false,
  "setParameters": true,
  "setPartitionKeys": true,
  "partitionKeysSize": 0,
  "setSd": true,
  "setLastAccessTime": true,
  "setRetention": true,
  "partitionKeysIterator": [],
  "parametersSize": 6,
  "setTemporary": false,
  "setRewriteEnabled": true,
  "setTableName": true,
  "setDbName": true,
  "setOwner": true,
  "setViewOriginalText": false,
  "setViewExpandedText": false,
  "setTableType": true,
  "setPrivileges": false,
  "setCreateTime": true
}

You can see that the metadata of the modified table contains the newly added column age: colsSize has changed from 2 to 3, and the table parameters now also carry last_modified_time and last_modified_by.
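The old/new record pair lends itself to simple programmatic diffing, which is the usual next step when feeding such listener output into a metadata system. Below is a minimal sketch in Python; old_table and new_table are abbreviated stand-ins for the two records logged above (only the fields needed for the diff are kept), and diff_columns / diff_parameters are hypothetical helper names, not part of any Hive API:

```python
import json

# Abbreviated stand-ins for the "Old table" / "New table" records above.
old_table = json.loads("""
{"tableName": "testlistener",
 "sd": {"cols": [{"name": "id", "type": "int"},
                 {"name": "name", "type": "string"}]},
 "parameters": {"transient_lastDdlTime": "1597989316"}}
""")

new_table = json.loads("""
{"tableName": "testlistener",
 "sd": {"cols": [{"name": "id", "type": "int"},
                 {"name": "name", "type": "string"},
                 {"name": "age", "type": "int"}]},
 "parameters": {"transient_lastDdlTime": "1597989660",
                "last_modified_by": "anonymous"}}
""")

def diff_columns(old, new):
    """Return names of columns present in the new table but not the old one."""
    old_names = {c["name"] for c in old["sd"]["cols"]}
    return [c["name"] for c in new["sd"]["cols"] if c["name"] not in old_names]

def diff_parameters(old, new):
    """Return table-parameter keys that were added or changed by the ALTER."""
    return sorted(k for k, v in new["parameters"].items()
                  if old["parameters"].get(k) != v)

print(diff_columns(old_table, new_table))     # ['age']
print(diff_parameters(old_table, new_table))  # ['last_modified_by', 'transient_lastDdlTime']
```

The same comparison applied to the full records would surface every schema or property change that an ALTER TABLE produced, which is exactly the signal a metadata-management service wants to capture.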

That is all the content of the article "How to implement metadata Management with Hive Hook and MetaStore Listener". Thank you for reading! We hope the content shared here is helpful to you.
