Summary of the characteristics and main points of Hive

2025-07-09 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article summarizes the characteristics and main points of Hive: what it is, how it relates to MapReduce, and how it works together with HDFS and MySQL.

I. What Hive is and its characteristics

The official website introduces it as follows: "The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL." In other words, Hive is data warehouse software that uses SQL to read, write, and manage large datasets that reside in distributed storage. This shows that the development language of Hive is SQL. Yet the distributed computing frameworks we commonly use are Spark, MapReduce, Storm, and so on, so how does Hive use SQL to do distributed computing?

1.1. Hive can be thought of as a client of MapReduce

In the classic setup described here, Hive's underlying execution engine is the MapReduce computing framework. Hive simply converts readable, easy-to-write SQL statements into MR programs and runs them on the cluster. Hive can therefore be regarded as a MapReduce client, and basically any task that can be done with a hand-written MapReduce program can instead be written as a Hive task in HQL (Hive SQL). For the same reason, the design characteristics of Hadoop and HDFS limit the kinds of jobs Hive is suited to. The biggest limitation is that Hive, in this classic form, does not support row-level updates, deletes, or inserts. Users can, however, generate a new table through a query, or export the query results to a file, to "implement" row-level operations indirectly.
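To make the "generate a new table through a query" idea concrete, here is a minimal sketch in plain Python. It is only an analogy: the lists stand in for Hive tables, and the function names, the `users` table, and its columns are hypothetical. The point it shows is that the original data is never modified in place; every "update" or "delete" reads all rows and writes a new copy.

```python
from typing import Dict, List

Row = Dict[str, object]


def rewrite_with_update(rows: List[Row], key: str, match: object, changes: Row) -> List[Row]:
    """Build a NEW table in which rows whose `key` equals `match` carry `changes`.

    The input list is left untouched, like writing query results to a new table.
    """
    return [{**row, **changes} if row[key] == match else dict(row) for row in rows]


def rewrite_without(rows: List[Row], key: str, match: object) -> List[Row]:
    """Build a NEW table that omits rows whose `key` equals `match` (a "delete")."""
    return [dict(row) for row in rows if row[key] != match]


# Hypothetical table contents, for illustration only.
users = [
    {"id": 1, "city": "Beijing"},
    {"id": 2, "city": "Shanghai"},
]

users_v2 = rewrite_with_update(users, "id", 2, {"city": "Shenzhen"})  # "update"
users_v3 = rewrite_without(users_v2, "id", 1)                         # "delete"

print(users)     # original table is unchanged
print(users_v2)
print(users_v3)
```

Because each change rewrites the whole table, this approach suits occasional bulk corrections, not frequent single-row edits.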

1.2. Hive is a batch processing system

Because MapReduce is a batch system, Hive is likewise built for batch processing of large amounts of data. And because MapReduce has high latency (1. startup time is long; 2. intermediate results are written to local disk rather than kept in memory), Hive queries take a long time to execute.
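As a rough illustration of where the second source of latency comes from, the sketch below runs something like `SELECT dept, COUNT(*) FROM employees GROUP BY dept` as a map step and a reduce step in plain Python. The table, its columns, and the file layout are assumptions made up for this example, and real MapReduce also partitions and sorts the map output, which is omitted here. What the sketch does show is the handoff: each map task writes its output to a local file, and the reduce task has to read those files back before it can produce a result.

```python
import os
import tempfile
from collections import defaultdict


def map_task(lines, spill_path, delimiter=","):
    """Map: parse raw "name,dept" lines and write (dept, 1) pairs to a local file."""
    with open(spill_path, "w", encoding="utf-8") as spill:
        for line in lines:
            _name, dept = line.split(delimiter)
            spill.write(f"{dept}\t1\n")


def reduce_task(spill_paths):
    """Reduce: read every map output file back from disk and sum the counts per key."""
    counts = defaultdict(int)
    for path in spill_paths:
        with open(path, encoding="utf-8") as spill:
            for record in spill:
                key, value = record.rstrip("\n").split("\t")
                counts[key] += int(value)
    return dict(counts)


# Two input splits, as if the table's data were divided between two map tasks.
splits = [["alice,sales", "bob,ops"], ["carol,sales"]]

with tempfile.TemporaryDirectory() as workdir:
    spill_paths = []
    for i, split in enumerate(splits):
        path = os.path.join(workdir, f"map_{i}.out")
        map_task(split, path)      # intermediate result goes to local disk
        spill_paths.append(path)
    dept_counts = reduce_task(spill_paths)  # and is read back from disk here

print(dept_counts)  # {'sales': 2, 'ops': 1}
```

The sketch measures nothing; it only shows the extra write and read between the two phases that an in-memory engine would avoid.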

1.3. Hive does not support transactions

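Hive is therefore not suited to OLTP (online transaction processing); it is better suited to OLAP (online analytical processing). Newer Hive versions add limited ACID support for transactional tables, but that does not make Hive an OLTP system. Similarly, Hive does not support many SQL features, which will be discussed later.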

II. The relationship between Hive and HDFS, MySQL, and MapReduce

1. An example of the relationship between Hive, MySQL, and HDFS

The following is a complete process, from creating a table in Hive to importing data into it. Steps 1-9 illustrate the interaction between Hive, MySQL, and HDFS.

2. Summary of main points

1. Hive does not store data. Hive analyzes and computes over the data, but the data itself is stored on a distributed system such as HDFS.

2. To some extent, Hive does not compute the data either; it is just an interpreter. It takes the data-processing logic that the user expresses in SQL, translates it into a MapReduce program, and submits that MR program to YARN for scheduling and execution. So it is the MapReduce program that actually carries out the distributed computation.

3. To operate on datasets on HDFS, Hive needs to know how the data is laid out: the row and column delimiters, the storage type, whether the data is compressed, where it is stored, and so on. For convenience in later operations, it keeps this information in tables and stores those tables (the metadata) in MySQL. The metadata goes into MySQL (in practice, a remote MySQL) because Hive itself is an interpreter and does not store data; why a remote MySQL is used will be explained later.

You create the hive database in MySQL manually, and then run the following command:

schematool -dbType mysql -initSchema

The purpose of this command is to create the large set of metadata tables inside MySQL's hive database.
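The split between metadata and data described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: `sqlite3` stands in for the remote MySQL database, a temporary directory stands in for HDFS, and the `table_meta` table with its columns is invented for this example (it is not the real metastore schema that `schematool` creates). It shows that "creating a table" only records a description, and that the description is applied to the raw file at read time.

```python
import os
import sqlite3
import tempfile

# Stand-in for the metastore database (MySQL in the article); schema is hypothetical.
metastore = sqlite3.connect(":memory:")
metastore.execute(
    "CREATE TABLE table_meta ("
    "name TEXT PRIMARY KEY, location TEXT, field_delim TEXT, columns TEXT)"
)


def read_table(name):
    """Look up the table's layout in the metastore, then parse the raw file with it."""
    meta = metastore.execute(
        "SELECT location, field_delim, columns FROM table_meta WHERE name = ?",
        (name,),
    ).fetchone()
    if meta is None:
        raise KeyError(f"no such table: {name}")
    location, delim, columns = meta
    col_names = columns.split(",")
    with open(location, encoding="utf-8") as data_file:
        return [dict(zip(col_names, line.rstrip("\n").split(delim))) for line in data_file]


with tempfile.TemporaryDirectory() as warehouse:  # stand-in for HDFS
    # The data is just a delimited text file; it knows nothing about columns.
    data_path = os.path.join(warehouse, "users.txt")
    with open(data_path, "w", encoding="utf-8") as data_file:
        data_file.write("1\tBeijing\n2\tShanghai\n")

    # "Creating the table" stores only the description: location, delimiter, columns.
    metastore.execute(
        "INSERT INTO table_meta VALUES (?, ?, ?, ?)",
        ("users", data_path, "\t", "id,city"),
    )

    rows = read_table("users")

print(rows)  # [{'id': '1', 'city': 'Beijing'}, {'id': '2', 'city': 'Shanghai'}]
```

Note that the values come back as strings: in this sketch the metadata supplies names and delimiters only, and type conversion is left out to keep it short.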
