
What basic knowledge does hive need to master?

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article walks through the basic knowledge you need to master for Hive. The content is simple and clear, and I hope it resolves your questions.

About Hive

Hive is a data warehouse tool that originated at Facebook. Its working principle can be roughly explained as follows: the query entered by the user is interpreted, compiled, and optimized, a query plan is generated, and the plan is converted into MapReduce tasks for execution. The stages are: interpreter, compiler, optimizer, executor.

Hive metadata is generally stored in a relational database such as MySQL.

The underlying storage is the HDFS distributed file system.

Advantages:

1. Simple and easy to use: provides HQL, a SQL-like query language.

2. Scalability: designed for computation and scaling over very large data sets (MapReduce as the computing engine, HDFS as the storage system).

3. Provides unified metadata management.

4. Supports user-defined functions: users can implement their own functions as needed by extending Hive's UDF class and implementing the evaluate method.

5. Fault tolerance: a SQL query can still complete if a node has a problem.

Disadvantages:

1. The expressive power of HQL is limited.

2. Efficiency is relatively low: the automatically generated MapReduce jobs are usually not smart enough, and they are difficult to tune.

Similarities and differences between Hive and traditional databases

Hive data types

Hive user interface

1) Hive CLI (Hive Command Line): the client operates directly in command-line mode.

2) HWI (Hive Web Interface): Hive provides a more intuitive web interface.

3) HiveServer: Hive provides a Thrift service, and Thrift clients currently support C++, Java, PHP, Python, and Ruby.

Hive common file formats

TEXTFILE: the default format. Data is not compressed, so disk overhead and data parsing overhead are both high.

SEQUENCEFILE: SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible.

RCFILE: RCFile is a storage method that combines row and column storage. First, it partitions the data by rows into blocks, so that a given record always sits in a single block and reading one record never requires reading multiple blocks. Second, within each block the data is stored by column, which benefits compression and fast column access.

PARQUET: Apache Parquet is a newer columnar storage format in the Hadoop ecosystem, and it is compatible with most computing frameworks in that ecosystem.
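To make the RCFile description concrete, here is a minimal Python sketch of the "rows first, columns second" idea. It is only a toy model under stated assumptions: the function name `to_row_groups` and the sample rows are hypothetical, and the real file format also has headers, sync markers, and compression that are not modeled here.

```python
# Toy model of the RCFile layout idea, not the real on-disk format.
# Step 1: split the table horizontally into row groups (blocks).
# Step 2: inside each group, store the values column by column.

def to_row_groups(rows, group_size):
    """Return a list of row groups; each group is a list of columns."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) pivots a list of rows into a list of columns
        groups.append([list(column) for column in zip(*chunk)])
    return groups


# Hypothetical sample data: (id, name, age)
rows = [
    (1, "ann", 20),
    (2, "bob", 31),
    (3, "cat", 25),
    (4, "dan", 40),
]

groups = to_row_groups(rows, group_size=2)
print(groups)
```

Each record still lives entirely inside one group, while values of the same column sit next to each other, which is what helps compression and column-only reads.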

Hive data import and export

Import from a local file: load data local inpath '/home/hadoop/test.txt' overwrite into table test01 partition (state='good', city='xiamen')

Import data from another table: from test01 insert overwrite table test02 partition (state='good', city='xiamen') select id, name, age, course, body, address

Export data to the local file system: insert overwrite local directory '/home/hns/test' select * from test01

Export data to HDFS: insert overwrite directory '/home/hns/test' select * from test01

Export to another Hive table: insert into table test02 partition (age='25') select id, name, tel from test01
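The partition clauses in the statements above decide where the data lands: Hive keeps each partition in its own subdirectory named key=value under the table directory. The Python sketch below only illustrates that naming scheme. The helper `partition_path` is hypothetical, and the table directory shown assumes the default warehouse location, which your installation may override.

```python
# Illustration of how a partition spec maps to a directory path.
# The helper name and the warehouse directory below are assumptions.

def partition_path(table_dir, partition_spec):
    """Join the table directory with one key=value segment per partition column."""
    segments = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + segments)


path = partition_path(
    "/user/hive/warehouse/test01",  # assumed default warehouse location
    [("state", "good"), ("city", "xiamen")],
)
print(path)  # /user/hive/warehouse/test01/state=good/city=xiamen
```

This is why a query that filters on the partition columns can skip whole directories instead of scanning the full table.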

Hive basic statement

Build a table:

CREATE TABLE IF NOT EXISTS page_view (viewTime INT, userid BIGINT)
PARTITIONED BY (country STRING, state STRING) -- partition by these two fields
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' -- the column delimiter is a comma
LINES TERMINATED BY '\n' -- the line delimiter is a newline character
STORED AS TEXTFILE -- the format of the storage file

Delete the table:

DROP TABLE IF EXISTS page_view

Modify column information (add columns):

ALTER TABLE page_view ADD COLUMNS (appname STRING COMMENT 'Application name', sessionid BIGINT COMMENT 'The current sessionid')

Query:

SELECT * FROM tablename; (a full table scan. It is slow and resource-intensive, so it is not recommended in production and is shown here only as an example.)

You can add a LIMIT clause to cap the number of rows returned, or a WHERE clause to narrow the scope of the query.

WHERE ... LIKE ... performs fuzzy (pattern) matching, for example:

SELECT * FROM test.scBUS WHERE scBUS.country LIKE concat('%', 'A1%')
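As a rough illustration of what a LIKE pattern means, the Python sketch below translates a pattern into a regular expression: % stands for any sequence of characters and _ for exactly one character. The function names and sample values are hypothetical, and escape sequences inside the pattern are deliberately not handled.

```python
import re

# Sketch of LIKE semantics: '%' = any run of characters, '_' = one character.
# Escape handling is omitted; this is an illustration, not Hive's code.

def like_to_regex(pattern):
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value, pattern):
    # LIKE must match the whole value, so use fullmatch rather than search
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical column values
countries = ["A1", "USA1", "A1-north", "B2"]
matches = [c for c in countries if sql_like(c, "%A1%")]
print(matches)
```

A pattern with % on both sides behaves like a "contains" test, which is why it cannot use ordering or prefix shortcuts and has to examine every value.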

HAVING clause: filters the groups produced by a GROUP BY statement with a simple syntax, a task that would otherwise require a subquery. For example:

SELECT sno, sname, stall FROM student WHERE sage=6 AND sex='male' GROUP BY sno, sname, stall HAVING stall > 167
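The order of operations is what distinguishes HAVING from WHERE: rows are grouped first, an aggregate is computed per group, and only then are whole groups kept or dropped. A minimal Python sketch of that flow follows. The helper `group_having` and the sample table are hypothetical and are not taken from the query above.

```python
from collections import defaultdict

# Sketch of GROUP BY followed by HAVING: group, aggregate, then filter groups.

def group_having(rows, key, agg, having):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)        # GROUP BY key
    result = {}
    for group_key, members in groups.items():
        value = agg(members)                # aggregate per group
        if having(value):                   # HAVING condition on the aggregate
            result[group_key] = value
    return result


# Hypothetical data
students = [
    {"sno": 1, "class": "A", "height": 170},
    {"sno": 2, "class": "A", "height": 160},
    {"sno": 3, "class": "B", "height": 150},
    {"sno": 4, "class": "B", "height": 158},
]

# Roughly: SELECT class, avg(height) FROM students
#          GROUP BY class HAVING avg(height) > 160
tall_classes = group_having(
    students,
    key="class",
    agg=lambda members: sum(m["height"] for m in members) / len(members),
    having=lambda avg_height: avg_height > 160,
)
print(tall_classes)  # {'A': 165.0}
```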

Join statement:

Hive supports normal SQL JOIN statements, but only equi-joins (joins on an equality condition):

SELECT DISTINCT a.sname FROM student a LEFT JOIN sc b ON (a.sno = b.sno)

About inner, outer, and semi joins:

The differences between the joins can be summarized as follows. An inner join returns only the rows that have a match in both tables. A left outer join returns every row of the left table, with the matching right-table data where it exists. A right outer join returns every row of the right table in the same way, and a full outer join returns all rows of both tables. The semi join makes up for the IN/EXISTS clauses that Hive lacks compared with MySQL and similar databases: use LEFT SEMI JOIN instead. The difference from LEFT JOIN is that the data of the right table is not included in the result.
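The join types summarized above can be compared side by side with a small Python sketch. It is an illustration only: the functions use simple nested loops, and the sample `student` and `sc` rows are hypothetical.

```python
# Sketch of inner, left outer, and left semi join on a single key.

def inner_join(left, right, key):
    # only pairs whose key exists on both sides
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def left_outer_join(left, right, key):
    out = []
    for l in left:
        matched = [r for r in right if r[key] == l[key]]
        if matched:
            out.extend((l, r) for r in matched)
        else:
            out.append((l, None))  # left row is kept, right side is NULL
    return out


def left_semi_join(left, right, key):
    # like IN/EXISTS: keep a left row once if any right row matches,
    # and return no right-table columns at all
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]


# Hypothetical data
student = [
    {"sno": 1, "sname": "ann"},
    {"sno": 2, "sname": "bob"},
    {"sno": 3, "sname": "cat"},
]
sc = [
    {"sno": 1, "course": "math"},
    {"sno": 1, "course": "art"},
    {"sno": 3, "course": "math"},
]

print(len(inner_join(student, sc, "sno")))                    # 3
print(len(left_outer_join(student, sc, "sno")))               # 4
print([s["sname"] for s in left_semi_join(student, sc, "sno")])  # ['ann', 'cat']
```

Note that "ann" appears twice in the inner and left outer results because she has two matching rows, but only once in the semi join result.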

ORDER BY and SORT BY: ORDER BY performs a global sort on the query result set, so all the data is processed by a single reducer. SORT BY only sorts the data within each reducer, that is, it performs a local sort. In production it can be combined with DISTRIBUTE BY to partition the data and sort within each partition.
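The difference between the two is easy to see in a small simulation. The Python sketch below is a simplified model with assumptions: reducers are modeled as plain lists, and rows are assigned to reducers by key modulo the reducer count, which stands in for the hash partitioning that DISTRIBUTE BY triggers. All names are hypothetical.

```python
# Simulation of ORDER BY (one global sort) versus
# DISTRIBUTE BY + SORT BY (a local sort inside each reducer).

def order_by(rows, key):
    # everything goes through one "reducer", so the output is totally ordered
    return sorted(rows, key=key)


def distribute_sort_by(rows, distribute_key, sort_key, num_reducers):
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # assumption: simple modulo stands in for hash partitioning
        reducers[distribute_key(row) % num_reducers].append(row)
    # each reducer sorts only its own rows
    return [sorted(bucket, key=sort_key) for bucket in reducers]


rows = [(3, "c"), (4, "d"), (1, "a"), (2, "b"), (6, "f"), (5, "e")]

globally_sorted = order_by(rows, key=lambda r: r[0])
per_reducer = distribute_sort_by(
    rows,
    distribute_key=lambda r: r[0],
    sort_key=lambda r: r[0],
    num_reducers=2,
)

print(globally_sorted)
print(per_reducer)
```

Each reducer's output is sorted, but concatenating the reducer outputs does not give a globally sorted result. That is the trade-off: SORT BY scales across reducers, while ORDER BY guarantees total order at the cost of a single reducer.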

That covers the basic knowledge of Hive that you need to master. Thank you for reading! I hope the content shared here is helpful.
