2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains what Hive Metastore is, how authentication and authorization relate to it, and how the community has extended it.
Hive Metastore is the service Hive uses to manage metadata for databases and tables. With it, upper-layer services no longer have to work with raw files directly and can instead build a computing framework on structured database and table information. Today, besides Hive, many compute engines such as Drill, Presto, and Spark can use Hive Metastore as their metadata center for querying data in the Hadoop ecosystem.
By default, Hive Metastore performs no user authentication: anyone who knows the IP address and port of the metastore service can connect and read metadata over the Thrift protocol. Metastore also supports Kerberos-based authentication, but that only protects access to the metastore itself. Once authentication succeeds, every caller of the same API gets the same result, regardless of who is calling.
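The difference between authentication and authorization described above can be sketched with a toy model. This is only an illustration: `ToyMetastore`, its fields, and the user names are hypothetical and are not part of the real Thrift API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for a metastore; not the real Thrift API."""
    databases: list[str]
    require_auth: bool = False
    known_users: set[str] = field(default_factory=set)

    def get_all_databases(self, user: Optional[str] = None) -> list[str]:
        # Authentication: decides only whether the call is allowed at all.
        if self.require_auth and user not in self.known_users:
            raise PermissionError("authentication failed")
        # No authorization: the result does not depend on who is calling.
        return list(self.databases)


# Default behaviour: no authentication, anyone who can connect can read.
open_store = ToyMetastore(["sales", "hr"])
anonymous_view = open_store.get_all_databases()

# With authentication enabled, every authenticated user still sees everything.
secured_store = ToyMetastore(["sales", "hr"], require_auth=True,
                             known_users={"alice", "bob"})
alice_view = secured_store.get_all_databases("alice")
bob_view = secured_store.get_all_databases("bob")
```

In this sketch `alice_view` and `bob_view` are identical, which is the point: passing authentication does not make the answer user-specific.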
Hive offers several authorization mechanisms. One is storage-based authorization, which relies on file permissions at the storage layer: if you have permission on the corresponding directory or file, you can query the corresponding table. This mechanism is coarse, because permissions apply to directories and files and cannot express finer-grained controls such as column-level permissions. It also cannot be applied to blob stores such as AWS S3 or Alibaba Cloud OSS, because they have no real HDFS-level "users".
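A minimal sketch of why storage-based authorization is coarse, assuming POSIX-style owner/group/other mode bits on a table directory. `TableDir`, `can_read`, the path, and the user and group names are all hypothetical.

```python
import stat
from dataclasses import dataclass


@dataclass
class TableDir:
    """Hypothetical table directory with POSIX-style owner, group and mode."""
    path: str
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o750


def can_read(table: TableDir, user: str, groups: set[str]) -> bool:
    # Pick the permission class the POSIX way: owner, then group, then other.
    if user == table.owner:
        return bool(table.mode & stat.S_IRUSR)
    if table.group in groups:
        return bool(table.mode & stat.S_IRGRP)
    return bool(table.mode & stat.S_IROTH)


orders = TableDir("/warehouse/sales.db/orders", "etl", "analysts", 0o750)

owner_ok = can_read(orders, "etl", set())
analyst_ok = can_read(orders, "alice", {"analysts"})
outsider_ok = can_read(orders, "bob", {"interns"})
```

The check takes a directory and a user but has no notion of columns, so the answer is all-or-nothing for the whole table.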
Another mechanism is SQL standard-based authorization (GRANT/REVOKE) in HiveServer2. It is finer-grained, but it lives in HiveServer2 and has nothing to do with Metastore. The community has also developed two plug-in authorization solutions: Apache Ranger and Apache Sentry. To use Ranger, for example, you have to embed it in each upper-layer engine, such as Presto or Spark. When a query is submitted, the Ranger plug-in in the engine intercepts it and decides whether to block or allow the request based on the current user's permissions. In other words, these authorization mechanisms are all implemented outside Metastore. Metastore itself acts more like a plain service in front of the underlying database and does not provide multi-tenancy or permission management.
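The interception pattern described above might look roughly like the following. This is a toy sketch under stated assumptions: `ColumnPolicy`, `ToyAuthorizer`, and `run_query` are hypothetical names and do not reflect the actual Ranger or Sentry plug-in interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    """Hypothetical policy: which columns of a table a user may select."""
    user: str
    table: str
    columns: frozenset[str]


class ToyAuthorizer:
    """Stand-in for an engine-side plug-in; not the Ranger or Sentry API."""

    def __init__(self, policies: list[ColumnPolicy]) -> None:
        self._allowed = {(p.user, p.table): p.columns for p in policies}

    def check_select(self, user: str, table: str, columns: list[str]) -> bool:
        # A user with no policy for the table is allowed no columns at all.
        allowed = self._allowed.get((user, table), frozenset())
        return set(columns) <= allowed


def run_query(authorizer: ToyAuthorizer, user: str, table: str,
              columns: list[str]) -> str:
    # The engine consults the plug-in first; the metastore is never asked
    # whether this user is allowed to see these columns.
    if not authorizer.check_select(user, table, columns):
        raise PermissionError(f"{user} may not read {columns} of {table}")
    return f"SELECT {', '.join(columns)} FROM {table}"


authorizer = ToyAuthorizer([
    ColumnPolicy("alice", "sales.orders", frozenset({"id", "amount"})),
])
allowed_sql = run_query(authorizer, "alice", "sales.orders", ["id", "amount"])
```

Because the check sits inside the engine, every engine that reads the same metastore needs its own copy of the plug-in, which is the limitation the paragraph above points out.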
The community has also built some interesting extensions around Metastore. For example, Hotels.com developed a framework called Waggle Dance. Its motivation is that a company often runs more than one big data cluster and therefore has multiple Metastore services. These Metastores are not connected to one another, which creates isolated islands of data and metadata. Waggle Dance acts as a routing service for Metastore and fully implements the Metastore Thrift API. Users talk to Waggle Dance whenever they need Metastore, and Waggle Dance sits in front of multiple Hive Metastores, combining them into a single whole and removing the data and metadata islands. Waggle Dance has its own problems, however. First, different Hive Metastores may contain schemas with the same name, so schema name prefixes have to be agreed on for each metastore in advance to keep schema names unique overall. Second, the Hive Metastore Thrift API keeps evolving and is not exactly the same across versions, so Waggle Dance may need to support multiple API versions. If a breaking change appears in the Hive Metastore API, the whole Waggle Dance approach may become hard to maintain.
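The prefix idea described above can be sketched as a small router. This only illustrates the naming scheme and is not Waggle Dance's implementation; `ToyFederatingRouter`, the prefixes, and the database names are hypothetical.

```python
class ToyFederatingRouter:
    """Hypothetical router showing the prefix idea; not Waggle Dance's code."""

    def __init__(self, metastores: dict[str, dict[str, list[str]]]) -> None:
        # prefix -> {database name -> table names} for one backing metastore
        self._metastores = metastores

    def get_all_databases(self) -> list[str]:
        # Expose every database under its prefixed, globally unique name.
        return sorted(
            f"{prefix}{db}"
            for prefix, dbs in self._metastores.items()
            for db in dbs
        )

    def get_tables(self, prefixed_db: str) -> list[str]:
        # Route by prefix, then strip it before asking the backing metastore.
        for prefix, dbs in self._metastores.items():
            if prefixed_db.startswith(prefix):
                local_name = prefixed_db[len(prefix):]
                if local_name in dbs:
                    return dbs[local_name]
        raise KeyError(prefixed_db)


# Both backing metastores have a database called "sales".
router = ToyFederatingRouter({
    "emea_": {"sales": ["orders"]},
    "apac_": {"sales": ["orders", "refunds"]},
})
all_databases = router.get_all_databases()
apac_tables = router.get_tables("apac_sales")
```

Without the agreed prefixes the two `sales` databases would collide, which is why the prefixes have to be settled before the metastores are federated.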
AWS Glue is another extension of the Hive Metastore idea. Unlike an ordinary Hive Metastore, Glue is a multi-tenant metadata service: different users calling the same metadata interface, `getAllDatabases()`, get different results. Glue also builds authorization into the service itself, though not through SQL-standard GRANT/REVOKE; it relies on AWS's unified IAM permission policies instead. For example, a policy that lists the following resources allows a user to access the finegrainaccess database and all tables under it whose names begin with dev_:

"arn:aws:glue:us-east-1:123456789012:catalog",
"arn:aws:glue:us-east-1:123456789012:database/finegrainaccess",
"arn:aws:glue:us-east-1:123456789012:tables/finegrainaccess/dev_*"

As a result, even different sub-accounts under the same tenant that call the same metadata method, `getAllDatabases()`, do not get the same data back.
If Waggle Dance is "innovation within the system" for Hive Metastore, then AWS Glue is "tear-down-and-rebuild innovation". Waggle Dance implements the Hive Metastore API, so it is largely transparent to the engines above it. Glue only guarantees the same functionality as Hive Metastore: it exposes a REST API rather than a Thrift API, so the engines above it have to be adapted to integrate with Glue.
© 2024 shulou.com SLNews company. All rights reserved.