This article introduces how Hive Hooks and MetaStore Listeners can be used to implement metadata management. It goes into some detail and should be a useful reference for interested readers.
Metadata management is the core of the data warehouse. It not only defines what the data warehouse has, but also points out the content and location of the data in the data warehouse, depicts the rules of data extraction and transformation, and stores all kinds of business information related to the theme of the data warehouse.
Metadata management
Metadata definition
According to the traditional definition, Metadata is data about data. Metadata connects the source data, data warehouse and data application, and records the whole process of data generation and consumption. Metadata mainly records the definition of the model in the data warehouse, the mapping relationship between different levels, the monitoring of the data state of the data warehouse and the running status of ETL tasks. In the data warehouse system, metadata can help data warehouse administrators and developers to easily find the data they care about, which can be used to guide their data management and development work and improve work efficiency. Metadata is divided into two categories according to its purpose: technical metadata (Technical Metadata) and business metadata (Business Metadata). Technical metadata stores data about the technical details of the data warehouse system, and is used to develop and manage the data used by the data warehouse.
Metadata classification
Technical metadata
Distributed computing system storage metadata
For example, Hive tables, columns, and partitions: the table name, partition information, owner, file size, table type, and, at column level, the field name, field type, field comment, whether the field is a partition field, and so on.
Distributed computing system runtime metadata
For example, Hive job logs, including job type, instance name, inputs and outputs, SQL, runtime parameters, execution time, and so on.
Task scheduling metadata
The dependency types and dependencies of tasks, as well as the running logs of different types of scheduled tasks.
Business metadata
Business metadata describes the data in the data warehouse from a business point of view, which provides a semantic layer between the user and the actual system, so that business personnel who do not understand computer technology can also "read" the data in the data warehouse. Common business metadata are: standardized definitions of dimensions and attributes, business processes, indicators, etc., for better management and use of data; data application metadata, such as configuration and operation of data reports, data products, etc.
Metadata application
The real value of data is that it drives decision-making and guides operations through data. Through a data-driven approach, we can identify trends and take effective action to help us identify problems and drive innovation or solutions. This is the digital operation. Similarly, for metadata, it can be used to guide data-related personnel to carry out daily work and achieve data-based "operation". For example, for data users, they can quickly find the data they need through metadata; for ETL engineers, they can use metadata to guide them to carry out various daily ETL tasks such as model design, task optimization and task offline; and for operation and maintenance engineers, they can use metadata to guide them to carry out operation and maintenance work such as storage, calculation and system optimization of the entire cluster.
Hive Hooks and Metastore Listeners
Hive Hooks
For data governance and metadata management frameworks, there are many open-source systems in the industry, such as Apache Atlas, which can meet the needs of metadata management in complex scenarios. In fact, Apache Atlas uses Hive's hooks to manage Hive metadata, which requires the following configuration:
hive.exec.post.hooks
org.apache.atlas.hive.hook.HiveHook
Through this hook, Atlas listens for Hive events such as table creation and table modification, pushes the collected metadata to Kafka in a specific format, and finally the metadata is consumed and stored.
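As a sketch of what that looks like, the property pair above maps onto the standard hive-site.xml format roughly as follows (Atlas additionally requires its own hook jars on the Hive classpath, which is omitted here):

```xml
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```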
Hive Hooks classification
So, what on earth is Hooks?
Hooks are an event-and-message mechanism that binds custom logic to Hive's internal execution flow without recompiling Hive; they provide a way to extend Hive with external components. Depending on its type, a hook runs at a different stage. Hooks mainly fall into the following categories:
hive.exec.pre.hooks
As the name suggests, this hook is called before the execution engine runs the query, after Hive has optimized the query plan. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:
hive.exec.pre.hooks
Fully qualified name of the implementation class
hive.exec.post.hooks
Called after query execution completes, before the results are returned to the user. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:
hive.exec.post.hooks
Fully qualified name of the implementation class
hive.exec.failure.hooks
Called when execution of the plan fails. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext. The configuration in hive-site.xml is as follows:
hive.exec.failure.hooks
Fully qualified name of the implementation class
hive.metastore.init.hooks
Called during HMSHandler initialization. To use it, implement org.apache.hadoop.hive.metastore.MetaStoreInitListener. The configuration in hive-site.xml is as follows:
hive.metastore.init.hooks
Fully qualified name of the implementation class
hive.exec.driver.run.hooks
Called at the beginning and end of Driver.run(). To use it, implement org.apache.hadoop.hive.ql.HiveDriverRunHook. The configuration in hive-site.xml is as follows:
hive.exec.driver.run.hooks
Fully qualified name of the implementation class
hive.semantic.analyzer.hook
Called when Hive performs semantic analysis on the query statement. To use it, extend the abstract class org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook. The configuration in hive-site.xml is as follows:
hive.semantic.analyzer.hook
Fully qualified name of the implementation class
Advantages and disadvantages of Hive Hooks
Advantages: hooks can easily be embedded at various stages of a query and can run custom code, for example to update metadata.
Disadvantages: the metadata obtained in a hook usually needs further parsing, otherwise it is difficult to understand the query process.
For Hive Hooks, this article presents a use case for hive.exec.post.hooks, which runs after the query executes and before the results are returned.
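As an alternative to setting the property from the client session (as this article does later), the post-hook can be registered statically in hive-site.xml. A minimal sketch, using the com.jmx.hooks.CustomPostHook class from this article's use case:

```xml
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.jmx.hooks.CustomPostHook</value>
</property>
```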
Metastore Listeners
So-called Metastore Listeners monitor events on the Hive Metastore; users can plug in custom code to listen for metadata changes.
Looking at the source code of the HiveMetaStore class, we find that the init() method of HMSHandler creates three kinds of listeners: MetaStorePreEventListener, MetaStoreEventListener, and MetaStoreEndFunctionListener. These listeners are used to observe the events of each step.
public class HiveMetaStore extends ThriftHiveMetastore {
    // ... code omitted
    public static class HMSHandler extends FacebookBase implements IHMSHandler {
        // ... code omitted
        public void init() throws MetaException {
            // ... code omitted
            // obtain MetaStorePreEventListener
            preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
            // obtain MetaStoreEventListener
            listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
            listeners.add(new SessionPropertiesListener(hiveConf));
            // obtain MetaStoreEndFunctionListener
            endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
                MetaStoreEndFunctionListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
            // ... code omitted
        }
    }
}
Metastore Listeners classification
hive.metastore.pre.event.listeners
This abstract class needs to be extended to implement actions that must run before a particular event occurs on the metastore; its methods are called before the event occurs.
To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStorePreEventListener. The configuration in hive-site.xml is:
hive.metastore.pre.event.listeners
Fully qualified name of the implementation class
hive.metastore.event.listeners
This abstract class needs to be extended to implement actions that must run when a particular event occurs on the metastore; its methods are called whenever an event occurs.
To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEventListener. The configuration in hive-site.xml is:
hive.metastore.event.listeners
Fully qualified name of the implementation class
hive.metastore.end.function.listeners
Its methods are called whenever a metastore function ends.
To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener. The configuration in hive-site.xml is:
hive.metastore.end.function.listeners
Fully qualified name of the implementation class
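Putting the three listener types together, a hive-site.xml sketch might look like the following; the com.example.* class names are hypothetical placeholders for your own implementations:

```xml
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>com.example.MyPreEventListener</value>
</property>
<property>
  <name>hive.metastore.event.listeners</name>
  <value>com.example.MyEventListener</value>
</property>
<property>
  <name>hive.metastore.end.function.listeners</name>
  <value>com.example.MyEndFunctionListener</value>
</property>
```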
Advantages and disadvantages of Metastore Listeners
Advantages: the metadata is already parsed and easy to understand, and listeners do not affect the query process.
Disadvantages: they are read-only and inflexible; a listener can only access the objects belonging to the current event.
For Metastore Listeners, this article gives a use case of MetaStoreEventListener, implementing two methods: onCreateTable and onAlterTable.
Hive Hooks basic usage
The specific implementation code is as follows:
public class CustomPostHook implements ExecuteWithHookContext {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomPostHook.class);
    // stores the Hive SQL operation types to monitor
    private static final HashSet<String> OPERATION_NAMES = new HashSet<>();
    // HiveOperation is an enum that encapsulates Hive's SQL operation types
    static {
        // create table
        OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
        // alter database properties
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
        // alter database owner
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
        // alter table: add columns
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
        // alter table: storage location
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
        // alter table properties
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
        // rename table
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
        // rename column
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
        // replace columns (drop the current columns, then add new ones)
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
        // create database
        OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
        // drop database
        OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
        // drop table
        OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        assert (hookContext.getHookType() == HookType.POST_EXEC_HOOK);
        // query plan
        QueryPlan plan = hookContext.getQueryPlan();
        // operation name
        String operationName = plan.getOperationName();
        logWithHeader("SQL statement executed: " + plan.getQueryString());
        logWithHeader("Operation name: " + operationName);
        if (OPERATION_NAMES.contains(operationName) && !plan.isExplain()) {
            logWithHeader("Monitored SQL operation");
            Set<ReadEntity> inputs = hookContext.getInputs();
            Set<WriteEntity> outputs = hookContext.getOutputs();
            for (Entity entity : inputs) {
                logWithHeader("Hook metadata input value: " + toJson(entity));
            }
            for (Entity entity : outputs) {
                logWithHeader("Hook metadata output value: " + toJson(entity));
            }
        } else {
            logWithHeader("Out of monitoring scope, ignoring the hook!");
        }
    }

    private static String toJson(Entity entity) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // entity types mainly include:
        // DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
        switch (entity.getType()) {
            case DATABASE:
                Database db = entity.getDatabase();
                return mapper.writeValueAsString(db);
            case TABLE:
                return mapper.writeValueAsString(entity.getTable().getTTable());
        }
        return null;
    }

    /**
     * Log with a uniform header format
     *
     * @param obj
     */
    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomPostHook][Thread: " + Thread.currentThread().getName() + "] | " + obj);
    }
}
Usage walkthrough
First, compile the above code into a jar and put it in the $HIVE_HOME/lib directory, or add the jar from the Hive client:
0: jdbc:hive2://localhost:10000> add jar /opt/softwares/com.jmx.hive-1.0-SNAPSHOT.jar;
Then configure the hive-site.xml file; for convenience, we set the property directly from the client:
0: jdbc:hive2://localhost:10000> set hive.exec.post.hooks=com.jmx.hooks.CustomPostHook;
View table operation
In the code above we monitor a set of operations; when one of them is detected, custom code (here, log output) is triggered. When we enter the following command in Hive's beeline client:
0: jdbc:hive2://localhost:10000> show tables;
You can see this in the $HIVE_HOME/logs/hive.log file:
[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | SQL statement executed: show tables
[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | Operation name: SHOWTABLES
[CustomPostHook] [Thread: cab9a763-c63e-4f25-9f9a-affacb3cecdb main] | out of the monitoring scope, ignore the hook!
The above view table operation is not within the scope of monitoring, so there is no corresponding metadata log.
Table building operation
When we create a table in Hive's beeline client as follows:
CREATE TABLE testposthook(
  id int COMMENT "id",
  name string COMMENT "name"
) COMMENT "build table _ test Hive Hooks"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
Observe the hive.log log:
There are two hook metadata output values above: the first is the metadata of the database, and the second is the metadata of the table.
Database metadata {
"name": "default"
"description": "Default Hive database"
"locationUri": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"parameters": {
}
"privileges": null
"ownerName": "public"
"ownerType": "ROLE"
"setParameters": true
"parametersSize": 0
"setOwnerName": true
"setOwnerType": true
"setPrivileges": false
"setName": true
"setDescription": true
"setLocationUri": true
}
Table metadata {
"tableName": "testposthook"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597985444
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
]
"location": null
"inputFormat": "org.apache.hadoop.mapred.SequenceFileInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe"
"parameters": {
"serialization.format": "1"
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 1
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
}
"storedAsSubDirectories": false
"colsSize": 0
"setParameters": true
"parametersSize": 0
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"setSkewedInfo": true
"colsIterator": [
]
"setCompressed": false
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": false
"setCols": true
"setLocation": false
"setInputFormat": true
}
"partitionKeys": [
]
"parameters": {
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": null
"temporary": false
"rewriteEnabled": false
"partitionKeysSize": 0
"setDbName": true
"setSd": true
"setParameters": true
"setCreateTime": true
"setLastAccessTime": false
"parametersSize": 0
"setTableName": true
"setPrivileges": false
"setOwner": true
"setPartitionKeys": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setRetention": false
"partitionKeysIterator": [
]
"setTemporary": false
"setRewriteEnabled": false
}
We find that in the table metadata above, the **cols** array is empty, i.e. there is no information about the fields id and name at table-creation time. To get this information, execute the following command:
ALTER TABLE testposthook
ADD COLUMNS (age int COMMENT 'age')
Observe the log information again:
In the log above, the hook has one input and one output value; both represent the metadata of the table.
Input {
"tableName": "testposthook"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597985445
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
{
"name": "id"
"type": "int"
"comment": "id"
"setName": true
"setType": true
"setComment": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setName": true
"setType": true
"setComment": true
}
]
"location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
"parameters": {
"serialization.format":
"field.delim":
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 2
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
}
"storedAsSubDirectories": false
"colsSize": 2
"setParameters": true
"parametersSize": 0
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"setSkewedInfo": true
"colsIterator": [
{
"name": "id"
"type": "int"
"comment": "id"
"setName": true
"setType": true
"setComment": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setName": true
"setType": true
"setComment": true
}
]
"setCompressed": true
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": true
"setCols": true
"setLocation": true
"setInputFormat": true
}
"partitionKeys": [
]
"parameters": {
"transient_lastDdlTime": "1597985445"
"comment": "build table _ test Hive Hooks"
"totalSize": "0"
"numFiles": "0"
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": null
"temporary": false
"rewriteEnabled": false
"partitionKeysSize": 0
"setDbName": true
"setSd": true
"setParameters": true
"setCreateTime": true
"setLastAccessTime": true
"parametersSize": 4
"setTableName": true
"setPrivileges": false
"setOwner": true
"setPartitionKeys": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setRetention": true
"partitionKeysIterator": [
]
"setTemporary": false
"setRewriteEnabled": true
}
From the JSON above, you can now see the field metadata in the **cols** array. Next, look at the output JSON:
Output {
"tableName": "testposthook"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597985445
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
{
"name": "id"
"type": "int"
"comment": "id"
"setName": true
"setType": true
"setComment": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setName": true
"setType": true
"setComment": true
}
]
"location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
"parameters": {
"serialization.format":
"field.delim":
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 2
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
}
"storedAsSubDirectories": false
"colsSize": 2
"setParameters": true
"parametersSize": 0
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"setSkewedInfo": true
"colsIterator": [
{
"name": "id"
"type": "int"
"comment": "id"
"setName": true
"setType": true
"setComment": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setName": true
"setType": true
"setComment": true
}
]
"setCompressed": true
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": true
"setCols": true
"setLocation": true
"setInputFormat": true
}
"partitionKeys": [
]
"parameters": {
"transient_lastDdlTime": "1597985445"
"comment": "build table _ test Hive Hooks"
"totalSize": "0"
"numFiles": "0"
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": null
"temporary": false
"rewriteEnabled": false
"partitionKeysSize": 0
"setDbName": true
"setSd": true
"setParameters": true
"setCreateTime": true
"setLastAccessTime": true
"parametersSize": 4
"setTableName": true
"setPrivileges": false
"setOwner": true
"setPartitionKeys": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setRetention": true
"partitionKeysIterator": [
]
"setTemporary": false
"setRewriteEnabled": true
}
The output object does not contain the new column age: it represents the table's metadata before the modification.
Metastore Listeners basic usage
The specific implementation code is as follows:
public class CustomListener extends MetaStoreEventListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomListener.class);
    private static final ObjectMapper objMapper = new ObjectMapper();

    public CustomListener(Configuration config) {
        super(config);
        logWithHeader("created");
    }

    // listen for table creation
    @Override
    public void onCreateTable(CreateTableEvent event) {
        logWithHeader(event.getTable());
    }

    // listen for table modification
    @Override
    public void onAlterTable(AlterTableEvent event) {
        logWithHeader(event.getOldTable());
        logWithHeader(event.getNewTable());
    }

    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomListener][Thread: " + Thread.currentThread().getName() + "] | " + objToStr(obj));
    }

    private String objToStr(Object obj) {
        try {
            return objMapper.writeValueAsString(obj);
        } catch (IOException e) {
            LOGGER.error("Error on conversion", e);
        }
        return null;
    }
}
Usage walkthrough
Its usage differs a little from hooks: a Hive hook interacts with HiveServer2, while a listener interacts with the Metastore, that is, the listener runs inside the Metastore process. The specific usage is as follows:
First put the jar package in the $HIVE_HOME/lib directory, and then configure the hive-site.xml file as follows:
hive.metastore.event.listeners
com.jmx.hooks.CustomListener
After the configuration is complete, you need to restart the metadata service:
bin/hive --service metastore &
Table creation operation
CREATE TABLE testlistener(
  id int COMMENT "id",
  name string COMMENT "name"
) COMMENT "build table _ test Hive Listener"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
Observe the hive.log log:
{
"tableName": "testlistener"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597989316
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
]
"location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
"parameters": {
"serialization.format":
"field.delim":
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 2
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
}
"storedAsSubDirectories": false
"setCols": true
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"colsSize": 2
"colsIterator": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
]
"setCompressed": true
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": true
"setParameters": true
"setLocation": true
"setInputFormat": true
"parametersSize": 0
"setSkewedInfo": true
}
"partitionKeys": [
]
"parameters": {
"transient_lastDdlTime": "1597989316"
"comment": "build table _ test Hive Listener"
"totalSize": "0"
"numFiles": "0"
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": {
"userPrivileges": {
"anonymous": [
{
"privilege": "INSERT"
"createTime":-1
"grantor": "anonymous"
"grantorType": "USER"
"grantOption": true
"setGrantOption": true
"setCreateTime": true
"setGrantor": true
"setGrantorType": true
"setPrivilege": true
}
{
"privilege": "SELECT"
"createTime":-1
"grantor": "anonymous"
"grantorType": "USER"
"grantOption": true
"setGrantOption": true
"setCreateTime": true
"setGrantor": true
"setGrantorType": true
"setPrivilege": true
}
{
"privilege": "UPDATE"
"createTime":-1
"grantor": "anonymous"
"grantorType": "USER"
"grantOption": true
"setGrantOption": true
"setCreateTime": true
"setGrantor": true
"setGrantorType": true
"setPrivilege": true
}
{
"privilege": "DELETE"
"createTime":-1
"grantor": "anonymous"
"grantorType": "USER"
"grantOption": true
"setGrantOption": true
"setCreateTime": true
"setGrantor": true
"setGrantorType": true
"setPrivilege": true
}
]
}
"groupPrivileges": null
"rolePrivileges": null
"setUserPrivileges": true
"setGroupPrivileges": false
"setRolePrivileges": false
"userPrivilegesSize": 1
"groupPrivilegesSize": 0
"rolePrivilegesSize": 0
}
"temporary": false
"rewriteEnabled": false
"setParameters": true
"setPartitionKeys": true
"partitionKeysSize": 0
"setSd": true
"setLastAccessTime": true
"setRetention": true
"partitionKeysIterator": [
]
"parametersSize": 4
"setTemporary": true
"setRewriteEnabled": false
"setTableName": true
"setDbName": true
"setOwner": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setPrivileges": true
"setCreateTime": true
}
Now perform a table modification operation:
ALTER TABLE testlistener
ADD COLUMNS (age int COMMENT 'age')
Observe the log again:
You can see two records above: the first is the old table's information, and the second is the modified table's information.
Old table {
"tableName": "testlistener"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597989316
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
]
"location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
"parameters": {
"serialization.format":
"field.delim":
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 2
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
}
"storedAsSubDirectories": false
"setCols": true
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"colsSize": 2
"colsIterator": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
]
"setCompressed": true
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": true
"setParameters": true
"setLocation": true
"setInputFormat": true
"parametersSize": 0
"setSkewedInfo": true
}
"partitionKeys": [
]
"parameters": {
"totalSize": "0"
"numFiles": "0"
"transient_lastDdlTime": "1597989316"
"comment": "build table _ test Hive Listener"
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": null
"temporary": false
"rewriteEnabled": false
"setParameters": true
"setPartitionKeys": true
"partitionKeysSize": 0
"setSd": true
"setLastAccessTime": true
"setRetention": true
"partitionKeysIterator": [
]
"parametersSize": 4
"setTemporary": false
"setRewriteEnabled": true
"setTableName": true
"setDbName": true
"setOwner": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setPrivileges": false
"setCreateTime": true
}
New table {
"tableName": "testlistener"
"dbName": "default"
"owner": "anonymous"
"createTime": 1597989316
"lastAccessTime": 0
"retention": 0
"sd": {
"cols": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
{
"name": "age"
"type": "int"
"comment": "Age"
"setComment": true
"setType": true
"setName": true
}
]
"location": "hdfs://kms-1.apache.com:8020/user/hive/warehouse"
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat"
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
"compressed": false
"numBuckets":-1
"serdeInfo": {
"name": null
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
"parameters": {
"serialization.format":
"field.delim":
}
"setSerializationLib": true
"setParameters": true
"parametersSize": 2
"setName": false
}
"bucketCols": [
]
"sortCols": [
]
"parameters": {
}
"skewedInfo": {
"skewedColNames": [
]
"skewedColValues": [
]
"skewedColValueLocationMaps": {
}
"setSkewedColNames": true
"setSkewedColValues": true
"setSkewedColValueLocationMaps": true
"skewedColNamesSize": 0
"skewedColNamesIterator": [
]
"skewedColValuesSize": 0
"skewedColValuesIterator": [
]
"skewedColValueLocationMapsSize": 0
}
"storedAsSubDirectories": false
"setCols": true
"setOutputFormat": true
"setSerdeInfo": true
"setBucketCols": true
"setSortCols": true
"colsSize": 3
"colsIterator": [
{
"name": "id"
"type": "int"
"comment": "id"
"setComment": true
"setType": true
"setName": true
}
{
"name": "name"
"type": "string"
"comment": "name"
"setComment": true
"setType": true
"setName": true
}
{
"name": "age"
"type": "int"
"comment": "Age"
"setComment": true
"setType": true
"setName": true
}
]
"setCompressed": true
"setNumBuckets": true
"bucketColsSize": 0
"bucketColsIterator": [
]
"sortColsSize": 0
"sortColsIterator": [
]
"setStoredAsSubDirectories": true
"setParameters": true
"setLocation": true
"setInputFormat": true
"parametersSize": 0
"setSkewedInfo": true
}
"partitionKeys": [
]
"parameters": {
"totalSize": "0"
"last_modified_time": "1597989660"
"numFiles": "0"
"transient_lastDdlTime": "1597989660"
"comment": "build table _ test Hive Listener"
"last_modified_by": "anonymous"
}
"viewOriginalText": null
"viewExpandedText": null
"tableType": "MANAGED_TABLE"
"privileges": null
"temporary": false
"rewriteEnabled": false
"setParameters": true
"setPartitionKeys": true
"partitionKeysSize": 0
"setSd": true
"setLastAccessTime": true
"setRetention": true
"partitionKeysIterator": [
]
"parametersSize": 6
"setTemporary": false
"setRewriteEnabled": true
"setTableName": true
"setDbName": true
"setOwner": true
"setViewOriginalText": false
"setViewExpandedText": false
"setTableType": true
"setPrivileges": false
"setCreateTime": true
}
You can see that the metadata information of the modified table contains the newly added column age.