What's the difference between Hive and ordinary relational database? 07/12 Update SLTechnology News&Howtos

2025-07-12 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article introduces the differences between Hive and ordinary relational databases. Many people are unsure how the two differ in day-to-day work, so the editor consulted a range of materials and put together a simple, practical comparison. I hope it helps answer the question "what's the difference between Hive and an ordinary relational database?" Let's go through it point by point.

Query language. Because SQL is widely used in data warehousing, Hive was given an SQL-like query language, HQL (Hive Query Language), designed around Hive's own characteristics. Developers who already know SQL can therefore pick up Hive development easily.

Data storage location. Hive is built on top of Hadoop, and all Hive data is stored in HDFS. A relational database, by contrast, stores its data on raw block devices or in the local file system.

Data format. Hive does not define a fixed data format; the user specifies one. A user-defined format needs three attributes: the column delimiter (usually a space, "\t" or "\001", i.e. Ctrl-A, which is Hive's default), the row delimiter ("\n"), and the method for reading the file data (Hive originally provided three file formats by default: TextFile, SequenceFile and RCFile; later versions added others such as ORC and Parquet). Because loading requires no conversion from the user's data format to a Hive-defined format, Hive does not change the data itself during loading; it simply copies or moves the files into the corresponding HDFS directory. In a relational database, each storage engine defines its own data format, and all data must be organized according to that format as it is stored, so loading data into a database can be time-consuming.
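To make the "load without conversion" idea concrete, here is a minimal Python sketch of schema-on-read. It is a toy under stated assumptions, not Hive's implementation: a dictionary stands in for an HDFS table directory, and the helper names load_data and read_table are hypothetical. It uses "\x01" (the same character as Hive's default \001 column delimiter) and "\n" as the row delimiter. Loading stores the bytes untouched; columns are only split and typed when the table is read.

```python
# Schema-on-read sketch (hypothetical helpers, not Hive's API).

FIELD_DELIM = "\x01"  # same character as Hive's default column delimiter (\001, Ctrl-A)
ROW_DELIM = "\n"      # row delimiter

# Stand-in for HDFS table directories: table name -> list of raw files
warehouse: dict[str, list[bytes]] = {}

def load_data(table: str, raw: bytes) -> None:
    """Like LOAD DATA: store the file as-is, with no parsing or conversion."""
    warehouse.setdefault(table, []).append(raw)

def read_table(table: str, schema: list[tuple[str, type]]) -> list[dict]:
    """Apply the schema only at read time: split rows and fields, then cast.

    Simplification: real Hive yields NULL for missing or unparsable fields;
    here int() would raise an error instead.
    """
    rows = []
    for raw in warehouse.get(table, []):
        for line in raw.decode("utf-8").split(ROW_DELIM):
            if not line:
                continue  # skip the empty string after a trailing newline
            fields = line.split(FIELD_DELIM)
            rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows

raw_file = b"1\x01alice\x0130\n2\x01bob\x0125\n"
load_data("users", raw_file)
assert warehouse["users"][0] == raw_file  # the stored bytes are unchanged

users = read_table("users", [("id", int), ("name", str), ("age", int)])
print(users)
# [{'id': 1, 'name': 'alice', 'age': 30}, {'id': 2, 'name': 'bob', 'age': 25}]
```

Because the raw file is never rewritten, the same bytes could be read later with a different schema; a relational database instead pays the parsing and organizing cost once, at insert time.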

Data update. Hive is designed for data warehouse applications, whose contents are read often and written rarely. Early versions of Hive therefore did not support modifying existing data; data was essentially fixed at load time. (Hive 0.14 and later support UPDATE and DELETE on transactional tables, but these are meant for occasional batch changes rather than frequent row-level edits.) Data in a relational database usually needs frequent modification, so rows are added with INSERT INTO ... VALUES and changed with UPDATE ... SET.

Indexes. As mentioned earlier, Hive does not process or even scan the data while loading it, so no keys in the data are indexed. When Hive needs to find the values that satisfy a condition, it must brute-force scan the entire dataset, so access latency is high. Because it runs on MapReduce, however, Hive can scan the data in parallel, so even without indexes it still has an advantage when accessing large volumes of data. In a relational database, indexes are usually built on one or more columns, so the database can retrieve a small amount of data matching specific conditions efficiently and with low latency. Because of its high data-access latency, Hive is not suitable for online queries.
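The cost difference between a full scan and an index lookup can be illustrated with a small Python sketch. The table, the column name city and the helper functions are all hypothetical; a Python dict stands in for a database index (real databases typically use B-trees), and the sketch only counts how many rows each approach has to examine. Parallel execution would spread the full scan over many machines, but every row would still be read.

```python
# Hypothetical comparison: full scan (no index) vs. prebuilt index lookup.

rows = [{"id": i, "city": f"city{i % 1000}"} for i in range(100_000)]

def full_scan(rows, key, value):
    """Examine every row, as Hive does without an index."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row[key] == value:
            matches.append(row)
    return matches, examined

def build_index(rows, key):
    """Built once when data is written; this is the work Hive skips at load time."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[key], []).append(pos)
    return index

def index_lookup(rows, index, value):
    """Jump straight to the matching row positions."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)

matches, examined = full_scan(rows, "city", "city42")
print(len(matches), examined)    # 100 100000

index = build_index(rows, "city")
matches2, examined2 = index_lookup(rows, index, "city42")
print(len(matches2), examined2)  # 100 100
```

Both approaches return the same 100 rows, but the index touches only those rows, which is why a database answers small, selective queries with low latency.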

Execution. Most Hive queries are executed through the MapReduce framework provided by Hadoop (simple queries such as select * from tbl do not need MapReduce; later Hive versions also support other engines such as Apache Tez). A relational database usually has its own execution engine.
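As a rough illustration of how a query such as SELECT city, COUNT(*) FROM tbl GROUP BY city maps onto MapReduce, the following single-process Python sketch walks through the map, shuffle and reduce phases. The table, its city column and the function names are hypothetical, and real Hive compiles queries into Hadoop jobs with many details (combiners, partitioners, spilling to disk) that are omitted here.

```python
from collections import defaultdict
from itertools import chain

# Input splits, as if each were a separate HDFS block handled by one mapper
splits = [
    ["beijing", "shanghai", "beijing"],
    ["shanghai", "shenzhen"],
    ["beijing"],
]

def map_phase(split):
    """Mapper: emit (key, 1) for every record in its split."""
    return [(city, 1) for city in split]

def shuffle(mapped):
    """Shuffle: group all values by key across every mapper's output."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate each key's values (here, COUNT(*))."""
    return {key: sum(values) for key, values in groups.items()}

mapped = [map_phase(s) for s in splits]  # on a cluster, mappers run in parallel
result = reduce_phase(shuffle(mapped))
print(result)  # {'beijing': 3, 'shanghai': 2, 'shenzhen': 1}
```

On a real cluster each mapper runs on a different node, which is where Hive's parallelism comes from; launching the job and moving data between phases is also part of why MapReduce-based queries have high latency.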

Execution latency. As mentioned earlier, Hive has high query latency because, without indexes, it must scan the entire table. The MapReduce framework is another source of latency: MapReduce jobs carry significant startup and scheduling overhead, so Hive queries executed with MapReduce inherit that latency. A relational database, by contrast, has low execution latency. That advantage holds only while the data is small, however; once the data grows beyond what the database can process, Hive's parallel computation clearly shows its advantage.

Scalability. Because Hive is built on top of Hadoop, its scalability matches Hadoop's (the world's largest Hadoop cluster, at Yahoo!, had about 4,000 nodes in 2009). A relational database, bound by strict ACID semantics, can scale out only to a very limited degree. At the time, even Oracle, the most advanced parallel database, could in theory scale to only about 100 machines.

Data scale. Because Hive runs on a cluster and can use MapReduce for parallel computation, it can handle very large datasets; a relational database, by comparison, supports much smaller data volumes.

That concludes this look at the differences between Hive and ordinary relational databases; I hope it has answered your questions. Theory works best when paired with practice, so go and try it yourself! If you want to learn more, please keep following the website; the editor will keep working to bring you more practical articles!
