2025-01-30 Update From: SLTechnology News&Howtos
Background
What's OLAP?
If you are a data analyst, or an R&D engineer who regularly works with SQL, the term OLAP is probably no stranger to you. You may have heard of both OLAP and OLTP, but the protagonist of today's article is Xiaomi OLAP, a distributed data analysis service provided by the Xiaomi Cloud platform.
Xiaomi OLAP is a distributed data analysis database service that integrates storage and computing. It handles real-time writes and updates of "hot" data through Kudu, and periodically migrates "cold" data to HDFS according to a user-defined time window, storing it in Parquet format; this yields a framework that separates hot and cold data. On top of this, the SparkSQL engine provides the ability to analyze real-time data and historical data at the same time.
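As a rough sketch of the hot/cold split described above, the following Python snippet partitions records by a window cutoff. All names, the record shape, and the 7-day window are illustrative assumptions, not details from Xiaomi's implementation:

```python
from datetime import datetime, timedelta

# Hypothetical migration window: records older than `window_days`
# count as "cold" and would be migrated from Kudu to HDFS/Parquet.
WINDOW_DAYS = 7

def partition_hot_cold(records, now, window_days=WINDOW_DAYS):
    """Split records into hot (stay in Kudu) and cold (migrate to Parquet)."""
    cutoff = now - timedelta(days=window_days)
    hot = [r for r in records if r["ts"] >= cutoff]
    cold = [r for r in records if r["ts"] < cutoff]
    return hot, cold

records = [
    {"id": 1, "ts": datetime(2019, 8, 28)},  # recent -> hot
    {"id": 2, "ts": datetime(2019, 8, 1)},   # old -> cold
]
hot, cold = partition_hot_cold(records, now=datetime(2019, 8, 29))
```

In the real system the cold side would be rewritten as Parquet files on HDFS while the hot side remains queryable in Kudu; SparkSQL then unions the two for analysis.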
Old Architecture & Drawbacks
Figure 1. OLAP 1.0 metadata and rights management
Let's start with the OLAP 1.0 metadata and rights management diagram. As the figure shows, in the old architecture the permissions and metadata of Kudu tables and the metadata of Hive tables were separate in both implementation and underlying storage. The former interacted with the OLAP service components (Metastore Manager and the SparkSQL engine) through a self-implemented Metadata Cache and Privilege Cache, with the data stored in Kudu; the latter used the independent services Hive Metastore (HMS) and Sentry to manage metadata and permissions respectively, with the underlying data stored in a MySQL database. With the old architecture in mind, the problems it caused become easier to understand:
1. From the user's point of view:
(1) When using the OLAP service, accessing a Kudu table required a special configuration of the SparkSQL queue to enable support for Kudu data sources.
(2) Although the early architecture merged metadata at the code level, it did not fundamentally resolve the separation of permissions. For example, if a user was granted access to database A through Hive and to database B through the OLAP system, the tables of database A were not visible in the OLAP system. There were also cases where a user held Kudu permissions but not the corresponding Hive permissions. Such situations hindered access to data and confused users; on top of that, users had to switch queues, reconfigure, and restart the service, which was not user-friendly.
2. From the development point of view:
The Metadata Cache and Privilege Cache were redundant implementations: both components interacted with the underlying metadata, their maintenance and development costs were relatively high, and there was no unified entry point or specification. Meanwhile, the underlying separation of metadata and permissions stood in the way of the subsequent development of a unified SQL Proxy.
As you can see, integrating the underlying metadata and permissions was necessary from both the user's and the developer's point of view.
Having introduced the old pains, let's see how to say goodbye to them one by one. As figure 1 suggests, the most straightforward way to resolve this separation is to reuse the existing HMS and Sentry components: migrate the original metadata and permission data into the MySQL database, and change how the upper-layer components, including the SparkSQL layer and the OLAP server-side components (OLAP Server, Metastore Manager, and Dynamic Manager), interact with metadata and permissions.

Figure 2. OLAP 2.0 metadata and rights management changes

Below, the related work is introduced in two parts.
Metadata Federation
On the metadata side, we introduced a Kudu StorageHandler that implements the Hive MetaHook interface and extends the DefaultStorageHandler class; it interacts with HMS and carries out the Kudu-related metadata operations. On top of the original version, we added operations on partitions and tables, as well as the rollback operations necessary to keep the metadata consistent.

In the SparkSQL layer, calls to Kudu metadata now go directly through the Kudu StorageHandler, and the functions of the original Kudu-specific modules were folded into the Hive module, covering queries, table creation, table deletion, table alteration, table display, and other operations. We remain broadly compatible with the old DML syntax and use tblproperties to carry various pieces of Kudu-related information, such as the table name, range partition information, and hash partition information; our own custom and data-flow information, such as whether a table is an OLAP table and the OLAP window value, is also stored in the table properties for upper-layer programs.

On the OLAP server side, we refactored everything related to metadata. The original Metadata Cache was removed, and metadata operations are now performed by calling the API provided by the HMS Client. We also migrated system-related data from Kudu to the MySQL database, so the server no longer depends directly on the Kudu Client.

After this integration, all metadata operations are unified at the Hive MetastoreClient layer, implemented through the Kudu StorageHandler, with the data stored in MySQL, consistent with how Hive metadata is handled. For developers, the architecture is clearer and easier to modify and maintain. For users, apart from syntax differences between Kudu tables and Hive tables, basic operations are no different from Hive: after creating a table, users largely do not need to care about the underlying storage medium, and the experience is more consistent.
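To illustrate how a flat tblproperties map can carry a table's Kudu layout for upper-layer programs, here is a minimal Python sketch. The property keys and values are invented for illustration; they are not the actual property names used by the Kudu StorageHandler or by Xiaomi's system:

```python
# Hypothetical tblproperties that a CREATE TABLE statement might carry.
# Key names are illustrative assumptions, not real Kudu property names.
tblproperties = {
    "kudu.table_name": "impala::olap.events",
    "kudu.hash_partition_columns": "user_id",
    "kudu.hash_partition_buckets": "16",
    "kudu.range_partition_column": "event_date",
    "olap.is_olap_table": "true",
    "olap.window_days": "7",
}

def parse_kudu_layout(props):
    """Turn the flat string map into a structured partition spec."""
    return {
        "table": props["kudu.table_name"],
        "hash": {
            "columns": props["kudu.hash_partition_columns"].split(","),
            "buckets": int(props["kudu.hash_partition_buckets"]),
        },
        "range_column": props.get("kudu.range_partition_column"),
        "is_olap": props.get("olap.is_olap_table") == "true",
        "window_days": int(props.get("olap.window_days", 0)),
    }

layout = parse_kudu_layout(tblproperties)
```

The point of the design is that everything lives in ordinary table properties, so any component that can read HMS metadata, such as the window-migration job, can recover the Kudu layout without a separate metadata store.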
Privilege Federation
The prerequisite for permission integration is that the Kudu-related metadata has already been merged into HMS, so that permissions can be managed through Sentry. On this basis, two paths had to be covered: authentication and authorization. In the SparkSQL layer, since the Hive module already integrates Sentry for permission authentication, beeline operations are authenticated through Sentry once the metadata has been migrated; the current SparkSQL syntax, however, does not yet support the authorization part. On the OLAP server, we refactored the permission-related operations. The original Privilege Cache was removed, and all permission-related operations, including authentication, authorization, permission revocation, and permission display, are implemented by calling the Sentry Client API. For permission display, the model limitations of Sentry mean its API could not meet our needs, so we carried out customized development, for example adding an API that retrieves permissions based on a user's roles. After this integration, the permissions of Kudu and Hive are fully connected, and permission-related services can be provided through Sentry.
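A minimal Python sketch of the role-based lookup described above, assuming a simplified Sentry-like model (users map to roles, roles map to privileges on database/table scopes); the data model and function names are illustrative only, not Sentry's actual API:

```python
# Illustrative data: users -> roles, roles -> (db, table, action) grants.
# A table of "*" stands for a database-wide grant.
USER_ROLES = {"alice": {"analyst"}}
ROLE_PRIVS = {
    "analyst": {("db_a", "*", "SELECT"), ("db_b", "events", "INSERT")},
}

def has_privilege(user, db, table, action):
    """Authenticate an action by expanding the user's roles."""
    for role in USER_ROLES.get(user, ()):
        for p_db, p_table, p_action in ROLE_PRIVS.get(role, ()):
            if p_db == db and p_action == action and p_table in ("*", table):
                return True
    return False

def privileges_for_user(user):
    """Role-based listing, the kind of API the article says was custom-built."""
    privs = set()
    for role in USER_ROLES.get(user, ()):
        privs |= ROLE_PRIVS.get(role, set())
    return privs
```

The role indirection is what makes the "show my permissions" feature awkward in stock Sentry: privileges hang off roles, not users, so a per-user view has to expand roles first, which is what the custom API above simulates.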
Summary
After the integration of metadata and permissions, the scope of both has been expanded for the OLAP service, which means the scope of queries has been expanded as well. The new architecture is shown in the figure below: metadata-related services are ultimately provided by Hive Metastore, permission-related services are ultimately provided by Sentry, and each layer only needs to call them through the client interfaces.

Figure 3. OLAP 2.0 framework
Prospect
Based on the integrated architecture, we can offer more capabilities in the future, such as HMS-based metadata services and Sentry-based permission services. We also plan to support more data sources, such as MySQL, and to integrate more SQL engines, such as Hive and Kylin, to build a unified SQL engine service.
Author: Xiaomiyun Technology
Link: https://juejin.im/post/5d679125f265da03b46c01b3
Source: Nuggets
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.