2025-03-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article describes the architecture of Apache Hive 3 in detail and is intended as a reference for readers who work with Hive.
Apache Tez
Apache Tez is the Hive execution engine for the Hive on Tez service, which includes HiveServer (HS2) in Cloudera Manager. MapReduce is not supported. In a Cloudera cluster, an exception occurs if a legacy script or application specifies MapReduce for execution. Most user-defined functions (UDFs) require no change to run on Tez instead of MapReduce.
Because Tez expresses a query as a directed acyclic graph (DAG) and provides data transfer primitives, executing Hive queries on Tez instead of MapReduce can improve query performance. In the Cloudera Data Platform (CDP), Tez is usually used only by Hive, and Hive on Tez launches and manages the Tez ApplicationMaster (AM) automatically when it starts. A SQL query that you submit to Hive is executed as follows:
Hive compiles the query.
Tez executes the query.
Resources are allocated to applications across the cluster.
Hive updates the data in the data source and returns the query results.
Hive on Tez runs tasks in ephemeral containers and uses the standard YARN shuffle service.
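To illustrate why a DAG helps, the following Python sketch groups the vertices of a query plan into waves that can run as soon as their inputs are ready. The plan and its vertex names are hypothetical, and the code uses only the Python standard library; it does not call Tez or reflect how Tez schedules work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each vertex maps to the set of vertices it depends on.
# The vertex names are illustrative only and are not Tez's own naming.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "write_result": {"aggregate"},
}


def execution_waves(dag):
    """Group vertices into waves; every vertex in a wave has all its inputs done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # vertices whose dependencies are complete
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


waves = execution_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this sketch the two scans land in the same wave because neither depends on the other, which is the kind of parallelism a DAG representation exposes.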
Data storage and access control
One of the major architectural changes in the Hive 3 design gives Hive much more control over metadata memory resources and the file system or object store. The following architectural changes from Hive 2 to Hive 3 provide improved security:
Tightly controlled file system and computer memory resources replace flexible boundaries: definitive boundaries increase predictability, and greater file system control improves security.
Optimized workloads in shared files and YARN containers
By default, CDP Private Cloud Base stores Hive data on HDFS, and CDP Public Cloud stores Hive data on S3. In the public cloud, Hive uses HDFS only to store temporary files. Hive 3 is optimized for object stores such as S3 in the following ways:
Hive uses ACID to determine which files to read, rather than relying on the storage system.
Hive 3 moves files less often than Hive 2.
Hive caches metadata and data aggressively to reduce file system operations.
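The first point above can be illustrated with a simplified Python sketch: a reader uses transaction metadata to decide which directories are visible, instead of trusting whatever the storage system lists. The sketch assumes the base_N and delta_X_Y directory naming used by Hive ACID tables and a single high-watermark write ID. It ignores open and aborted transactions, compaction details, and everything else a real reader must handle, so it is an illustration rather than Hive's actual logic.

```python
import re

# Directory name patterns assumed for this sketch.
BASE = re.compile(r"^base_(\d+)$")
DELTA = re.compile(r"^(?:delete_delta|delta)_(\d+)_(\d+)(?:_\d+)?$")


def select_dirs(dir_names, high_watermark):
    """Pick the newest visible base plus later deltas, up to the high watermark."""
    bases, deltas = [], []
    for name in dir_names:
        if (m := BASE.match(name)):
            bases.append((int(m.group(1)), name))
        elif (m := DELTA.match(name)):
            deltas.append((int(m.group(1)), int(m.group(2)), name))

    # Newest base that is not beyond the watermark (write ID 0 if there is none).
    visible = [b for b in bases if b[0] <= high_watermark]
    base_id, base_name = max(visible, default=(0, None))

    chosen = [base_name] if base_name else []
    for low, high, name in sorted(deltas):
        # Keep deltas written after the base and not beyond the watermark.
        if low > base_id and high <= high_watermark:
            chosen.append(name)
    return chosen


# Hypothetical directory listing for one table.
listing = [
    "base_0000005",
    "delta_0000003_0000003",
    "delta_0000006_0000006",
    "delete_delta_0000007_0000007",
    "delta_0000009_0000009",
]
print(select_dirs(listing, high_watermark=7))
```

Here the delta with write ID 3 is skipped because the base already covers it, and the delta with write ID 9 is skipped because it lies beyond the watermark.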
The main authorization model of Hive is Ranger. Hive enforces the access control specified in Ranger. Compared with other security schemes, this model provides stronger security and more flexibility in managing policies.
This model permits only Hive to access the Hive warehouse. If you do not enable the Ranger security service or other security, Hive in CDP Private Cloud Base uses storage-based authorization (SBA) based on user impersonation by default.
HDFS permission changes
In CDP Private Cloud Base, SBA relies heavily on HDFS access control lists (ACLs). ACLs are an extension to the permissions system in HDFS. CDP Private Cloud Base turns on ACLs in HDFS by default, which provides the following advantages:
Increased flexibility when granting specific permissions to multiple user groups and users
Convenient application of permissions to a directory tree rather than to individual files
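As a sketch of both advantages, the Python helper below assembles an hdfs dfs -setfacl command that grants permissions to several groups and users across a whole directory tree. The path, group name, and user name are hypothetical, and the helper only builds the argument list; it does not run the command.

```python
def setfacl_command(path, entries, recursive=True):
    """Build the argument list for `hdfs dfs -setfacl` that modifies ACL entries."""
    argv = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        argv.append("-R")                    # apply to the whole directory tree
    argv += ["-m", ",".join(entries), path]  # -m modifies (adds or updates) entries
    return argv


# Hypothetical warehouse path and principals.
cmd = setfacl_command(
    "/warehouse/example/sales",
    ["group:analysts:r-x", "user:etl:rwx"],
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run on a host where the hdfs client is installed and configured.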
Transaction processing
You can deploy new Hive application types by taking advantage of the following transaction processing characteristics:
Mature versions of ACID transaction processing:
ACID tables are the default table type.
Enabling ACID by default does not cause performance or operational overload.
Simplified application development, operations with strong transactional guarantees, and simple semantics for SQL commands
You do not need to bucket ACID tables.
Materialized view rewrites
Automatic query cache
Advanced optimizations
Hive client changes
CDP Private Cloud Base supports the thin client Beeline for working at the command line. You can run Hive administrative commands from the command line. Beeline uses a JDBC connection to Hive on Tez to execute commands. Parsing, compilation, and execution take place in Hive on Tez. Beeline supports many of the command-line options that Hive CLI supported. However, Beeline does not support hive -e set key=value for configuring the Hive Metastore.
You enter supported Hive CLI commands by invoking Beeline with the hive keyword, a command option, and a command, for example, hive -e set. Using Beeline instead of the thick client Hive CLI, which is no longer supported, has several advantages, including low overhead. Beeline does not use the entire Hive code base. Because only a small number of daemons is required to run queries, monitoring and debugging are simpler.
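For scripted use, a Beeline invocation can be assembled as in the Python sketch below. It relies on the Beeline options -u (JDBC URL), -n (user name), and -e (statement to run). The host name, port, database, and user are hypothetical, and the helper only builds the argument list without running it.

```python
def beeline_command(jdbc_url, user, statement):
    """Build the argument list for a one-shot Beeline statement over JDBC."""
    return ["beeline", "-u", jdbc_url, "-n", user, "-e", statement]


# Hypothetical HiveServer2 endpoint and user name.
cmd = beeline_command(
    "jdbc:hive2://hs2.example.com:10000/default",
    "analyst",
    "SHOW TABLES",
)
print(cmd)
```

Because Beeline is a thin client, the statement is only sent over the JDBC connection; parsing, compilation, and execution happen on the server side.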
Hive on Tez enforces whitelist and blacklist settings, which you can change using the SET command. Using blacklists, you can limit memory configuration changes to prevent instability. You can configure multiple Hive on Tez instances with different whitelists and blacklists to establish different levels of stability.
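The effect of such lists can be sketched in Python as a check that runs before a SET command is accepted. This is an illustration of the idea, not Hive's implementation: the allow pattern and deny list below are hypothetical values, and the function does not read any Hive configuration.

```python
import re


def can_set(key, allow_pattern, deny_list):
    """Return True if a session may change `key`: not denied and matching the allowlist."""
    if key in deny_list:
        return False  # the deny list always wins
    return re.fullmatch(allow_pattern, key) is not None


# Hypothetical settings for one Hive on Tez instance.
allow_pattern = r"hive\.exec\..*|hive\.tez\..*"
deny_list = {"hive.tez.container.size"}  # block a memory setting to avoid instability

for key in ["hive.exec.parallel", "hive.tez.container.size", "mapreduce.job.queuename"]:
    verdict = "allowed" if can_set(key, allow_pattern, deny_list) else "rejected"
    print(f"SET {key}: {verdict}")
```

Two instances with different patterns and lists would accept different SET commands, which is how different levels of stability can be established.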
Apache Hive Metastore sharing
Hive, Impala, and other components can share a remote Hive metastore (HMS). In CDP Public Cloud, HMS uses a pre-installed MySQL database. In the public cloud, HMS requires little or no configuration.
Spark integration
Spark and Hive tables interoperate using the Hive Warehouse Connector.
You can use the Hive Warehouse Connector (HWC) to access ACID tables and external tables from Spark. You do not need HWC to read from or write to Hive external tables; Spark users read from or write to Hive directly. You can read Hive external tables in ORC or Parquet format. However, you can write Hive external tables in ORC format only.
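The rules in this paragraph can be summarized as a small decision function. The Python sketch below only encodes what the text states; the function name, argument values, and return labels are hypothetical, and combinations the text does not mention are reported as not covered rather than guessed.

```python
def spark_access_path(table_type, operation, file_format):
    """Return how Spark reaches a Hive table according to the rules stated in the text."""
    if table_type == "acid":
        return "hwc"  # ACID tables are accessed through the connector
    if table_type == "external":
        if operation == "read" and file_format in {"orc", "parquet"}:
            return "direct"
        if operation == "write" and file_format == "orc":
            return "direct"
        return "not_covered"  # the text does not describe this combination
    raise ValueError(f"unknown table type: {table_type!r}")


print(spark_access_path("acid", "read", "orc"))
print(spark_access_path("external", "read", "parquet"))
print(spark_access_path("external", "write", "parquet"))
```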
Query execution of batch and interactive workloads
You can connect to Hive using a JDBC command-line tool, such as Beeline, or using a JDBC/ODBC driver with a BI tool, such as Tableau. Clients communicate with an instance of the same Hive on Tez version. You configure the settings file for each instance to perform either batch or interactive processing.