Why HIVE external tables are slower than internal tables

2025-07-08 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article looks at why a HIVE external table is slower than an internal table, using a small test against HBASE to show where the time goes.

Take HBASE as an example. If HIVE were only a query front end for an HBASE client, translating each SQL statement into an HBASE request and passing back whatever HBASE returns, it should not be much slower. After all, that adds just one layer of translation from SQL to HBASE. Since it is in fact slow, HIVE external tables cannot work that way, and something else must be holding back their performance. After all, HIVE runs on MAPREDUCE.

hbase(main):003:0> count 't_device_fault_statistics'
557 row(s) in 0.2890 seconds
=> 557

Here we test an HBASE table with only 557 rows to see how access through a HIVE external table differs from access through HBASE's own client. First, prepare the following:

1. Open the HBASE UI at http://hostname:60010/table.jsp?name=t_device_fault_statistics. One indicator there is requests. At first I took this to be the number of requests. After testing, I think it is the number of rows returned by query requests: if you query a key that does not exist, the number does not increase, but if a query outputs 10 rows, the number increases by 10.

2. Write a JAVA program, or use the HBASE client.

3. Create a HIVE external table on the HBASE table.
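For step 3, here is a minimal sketch of the kind of DDL involved, built as a string in Java so it can be printed or sent through any HIVE client. The storage handler class and the two property names come from HIVE's HBASE integration. The column family `cf`, the qualifier `fault_count`, and the class name `HiveHBaseDdl` are assumptions for illustration, since the table's real columns are not listed here. Only the `id` column, mapped to the ROWKEY, is taken from the query used later in the test.

```java
class HiveHBaseDdl {
    // Builds a CREATE EXTERNAL TABLE statement that maps a HIVE table onto an
    // existing HBASE table through the HBASE storage handler.
    static String build(String hiveTable, String hbaseTable,
                        String hiveColumns, String columnsMapping) {
        return "CREATE EXTERNAL TABLE " + hiveTable + " (" + hiveColumns + ")\n"
             + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
             + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = '" + columnsMapping + "')\n"
             + "TBLPROPERTIES ('hbase.table.name' = '" + hbaseTable + "')";
    }

    public static void main(String[] args) {
        // Assumption: one column family "cf" with a qualifier "fault_count".
        // ":key" maps the first HIVE column (id) to the HBASE ROWKEY.
        System.out.println(build(
            "t_device_fault_statistics",
            "t_device_fault_statistics",
            "id string, fault_count string",
            ":key,cf:fault_count"));
    }
}
```

The number of entries in the mapping must match the number of HIVE columns, in the same order.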

Testing can begin once the above is done. The idea is to compare how much requests grows after access through the HIVE external table with how much it grows after access through HBase's own API or client.

Current requests: 74555

First, access the table from the program, matching on a ROWKEY prefix, and watch how requests grows:

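import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes

val scan = new Scan()
scan.setCaching(100) // fetch up to 100 rows per scanner RPC
scan.setRowPrefixFilter(Bytes.toBytes("i517T5100")) // only rows whose ROWKEY starts with this prefix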

After the scan, requests is 74559, an increase of 4, and the scan returned 4 rows. So scanning by the ROWKEY prefix i517T5100 fetched 4 records, and requests grew by 4 as well.

The same query can be written as SQL against the HIVE external table: select count(*) from t_device_fault_statistics where id like 'i517T5100%'

The query returns 4. Now look at requests: it is 75216, and 75216 - 74559 = 657. (I tested this many times. The table actually has 557 rows, but requests grows by 657 each time. I'm not sure why it is 657 rather than 557.)

Setting aside for now why the figure is 657 rather than 557, the contrast is clear: for the same ROWKEY prefix, requests grows by only 4 through the HBASE client but by 657 through the HIVE external table. Can it be understood this way? For the SQL query, HIVE pulls all the data out of HBASE and then filters it itself, whereas the HBASE client's query is filtered on the server. That is why requests grows by roughly the number of rows in the table after the HIVE query.
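The difference between the two access paths can be sketched with a toy model in plain Java. This is not HBASE or HIVE code: `RequestsDemo`, its `requests` counter, and the ROWKEY suffixes are hypothetical stand-ins, with a sorted map playing the table and the counter playing the rows handed back by the server. It also does not reproduce the unexplained 657; it counts exactly the 557 rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy stand-in for an HBASE table: rows sorted by ROWKEY, plus a counter
// that mimics the "requests" metric (rows handed back to the caller).
class RequestsDemo {
    final TreeMap<String, String> rows = new TreeMap<>();
    long requests = 0;

    // Server-side prefix filter: only matching rows leave the "server".
    List<String> prefixScan(String prefix) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break; // keys are sorted, so stop early
            requests++;
            out.add(key);
        }
        return out;
    }

    // Full scan: every row leaves the "server"; the caller filters afterwards.
    List<String> fullScanThenFilter(String prefix) {
        List<String> out = new ArrayList<>();
        for (String key : rows.keySet()) {
            requests++;
            if (key.startsWith(prefix)) out.add(key);
        }
        return out;
    }

    // 557 rows in total, 4 of which share the prefix i517T5100.
    static RequestsDemo sample() {
        RequestsDemo t = new RequestsDemo();
        for (int i = 0; i < 4; i++) t.rows.put("i517T5100-" + i, "v");
        for (int i = 0; i < 553; i++) t.rows.put(String.format("x%04d", i), "v");
        return t;
    }

    public static void main(String[] args) {
        RequestsDemo t = sample();
        int hits = t.prefixScan("i517T5100").size();
        System.out.println("prefix scan: hits=" + hits + " requests=" + t.requests);
        t.requests = 0;
        hits = t.fullScanThenFilter("i517T5100").size();
        System.out.println("full scan + filter: hits=" + hits + " requests=" + t.requests);
    }
}
```

Both paths return the same 4 rows. Only the number of rows that had to leave the "server" differs: 4 in the first case, 557 in the second.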

Further testing shows that only when the SQL condition is an equality on the rowkey does requests grow the same way as with the HBASE client. Every other condition results in a full table scan.

From the above tests, a HIVE external table is not slow when the condition is an equality on the ROWKEY, but any other query loads all the data from HBASE and then filters it in HIVE itself. The odd part is why HIVE lets HBASE do the filtering only in the ROWKEY-equality case. Presumably an equality on the ROWKEY can be translated directly into an HBASE operation that locates a single row, while for anything else HIVE can do nothing but scan the whole table.
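The observed rule can be written down as a small model. This is only a restatement of the test results in Java, not how HIVE actually plans queries; the class name, method names, and row keys are hypothetical.

```java
import java.util.TreeMap;

class ObservedPushdown {
    // Rows that leave the "server" for the predicate id = key:
    // a single keyed lookup, so at most one row.
    static int fetchedForKeyEquals(TreeMap<String, String> table, String key) {
        return table.containsKey(key) ? 1 : 0;
    }

    // Rows that leave the "server" for any other predicate, as observed
    // in the test: the whole table, whatever the predicate selects.
    static int fetchedForOtherPredicate(TreeMap<String, String> table) {
        return table.size();
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 557; i++) table.put(String.format("row%04d", i), "v");
        System.out.println("id = 'row0042'   -> " + fetchedForKeyEquals(table, "row0042"));
        System.out.println("id = 'missing'   -> " + fetchedForKeyEquals(table, "missing"));
        System.out.println("id like 'row00%' -> " + fetchedForOtherPredicate(table));
    }
}
```

The lookup of a missing key fetching nothing matches the earlier observation that querying a key that does not exist leaves requests unchanged.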

