
What can Hive do?

2025-02-25 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 report:

This article explains what Hive can do. The editor finds it very practical, so it is shared here as a reference. Follow along and take a look.

What can Hive do?

Hive is a data warehouse tool built on Hadoop. It can map structured data files to database tables, provides full SQL query capabilities, and translates SQL statements into MapReduce tasks for execution. Its main advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes Hive well suited to statistical analysis in a data warehouse.
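To make "mapping a structured data file to a table" concrete, here is a minimal Python sketch of schema-on-read: the table's column list is applied to raw text lines only at the moment they are read. It assumes Hive's default field delimiter for text-format tables, Ctrl-A (`\x01`); the function name `read_rows` and the sample rows are illustrative and not part of Hive.

```python
from typing import Dict, Iterable, List, Optional

# Default field delimiter for Hive text-format tables: Ctrl-A ('\x01').
FIELD_DELIM = "\x01"


def read_rows(lines: Iterable[str], columns: List[str]) -> List[Dict[str, Optional[str]]]:
    """Apply a table's column list to raw text lines at read time (schema-on-read)."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_DELIM)
        # Pad short rows with None (Hive reports NULL for missing trailing fields);
        # zip() then drops any extra fields beyond the declared columns.
        padded = fields + [None] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, padded)))
    return rows


raw = ["1\x01alice\x01beijing\n", "2\x01bob\n"]
for row in read_rows(raw, ["id", "name", "city"]):
    print(row)
```

Because the column list lives in the metastore rather than in the files, the same files can be read again under a different column list without being rewritten.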

Hive is a data warehouse infrastructure built on Hadoop. It provides a set of ETL (extract, transform, load) tools (https://my.oschina.net/u/2000675/blog/746016#navbar-header) and serves as a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL (Hive SQL), which lets users who are familiar with SQL query the data. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot do, extending what HQL can express.
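One way to plug custom mapper or reducer logic into an HQL query is Hive's `TRANSFORM ... USING` clause: Hive streams each row to an external script's standard input as tab-separated text and reads tab-separated rows back from its standard output. The script below is a hypothetical example (the file name `explode_tags.py`, the `user_tags` table, and its columns are assumptions) that splits a comma-separated tag column into one output row per tag.

```python
import sys
from typing import Iterable, Iterator


def explode_tags(lines: Iterable[str]) -> Iterator[str]:
    """Turn each 'user_id<TAB>tag1,tag2,...' row into one 'user_id<TAB>tag' row per tag."""
    for line in lines:
        user_id, _, tags = line.rstrip("\n").partition("\t")
        for tag in tags.split(","):
            tag = tag.strip()
            if tag:  # skip empty tags, e.g. from a trailing comma
                yield f"{user_id}\t{tag}"


if __name__ == "__main__":
    # Hive writes input rows to stdin and reads output rows from stdout.
    for out_row in explode_tags(sys.stdin):
        print(out_row)
```

Assuming `python` is available on the cluster nodes, the script could be used from HQL roughly as `ADD FILE explode_tags.py;` followed by `SELECT TRANSFORM(user_id, tags) USING 'python explode_tags.py' AS (user_id, tag) FROM user_tags;`. Keeping the row logic in a plain generator makes it easy to test outside Hive.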

Why use Hive?

The interface uses SQL-like syntax, which enables rapid development.

It avoids hand-writing MapReduce jobs, which lowers the learning cost for developers.

Its functionality is easy to extend.

Comparison between Hive and traditional database

Item                     | Hive                                       | RDBMS
-------------------------+--------------------------------------------+----------------------------
Query language           | HQL                                        | SQL
Data storage             | HDFS                                       | Raw device or local FS
Execution                | MapReduce                                  | Executor
Execution latency        | High                                       | Low
Data size and types      | All data (historical and online analysis)  | Online data
Data redundancy          | High                                       | Low (through normalization)

...

Architecture of Hive

There are three main user interfaces: the CLI, the Client, and the WUI. The CLI is the most commonly used; when the CLI starts, it also starts a copy of Hive. The Client is Hive's client program, through which users connect to Hive Server. When starting Client mode, you need to specify the node where Hive Server is running and start Hive Server on that node. The WUI accesses Hive through a browser.

Hive stores its metadata in a database such as MySQL or Derby. The metadata includes the table name, the table's columns and partitions and their properties, the table's attributes (for example, whether it is an external table), the directory where the table's data is located, and so on.
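As a rough illustration of the metadata listed above, the Python dataclass below models one table's entry. It is a hypothetical model: the class and field names are assumptions, and the real metastore keeps this information in relational tables in MySQL or Derby with its own schema. The default-location rule (managed tables live under the warehouse directory, with a `<db>.db` subdirectory for non-default databases) follows Hive's usual behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TableMetadata:
    """Illustrative model of the metadata Hive keeps for one table."""
    name: str
    columns: List[Tuple[str, str]]                      # (column name, type)
    partition_keys: List[Tuple[str, str]] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)
    external: bool = False                              # external vs. managed table
    location: Optional[str] = None                      # HDFS directory of the data

    def data_dir(self, warehouse_dir: str, database: str = "default") -> str:
        """Return the explicit location, or the default warehouse path for the table."""
        if self.location:
            return self.location
        base = warehouse_dir.rstrip("/")
        if database != "default":
            base = f"{base}/{database}.db"  # non-default databases get a <db>.db folder
        return f"{base}/{self.name}"


logs = TableMetadata("logs", [("id", "int"), ("msg", "string")], partition_keys=[("dt", "string")])
print(logs.data_dir("/user/hive/warehouse"))  # /user/hive/warehouse/logs
```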

The interpreter, compiler, and optimizer take an HQL query statement through lexical analysis, syntax analysis, compilation, optimization, and query plan generation. The generated query plan is stored in HDFS and then executed by MapReduce.
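The stages above can be illustrated with a toy Python pipeline for a tiny SQL subset (`SELECT col[, col ...] FROM table [WHERE col = 'literal']`): a tokenizer for lexical analysis, a hand-written parser for syntax analysis, and a plan generator that emits a linear chain of operator names. This is only a sketch of the idea, not Hive's compiler; the grammar, function names, and plan format are assumptions.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# One token: a word, a quoted string literal, or a punctuation symbol.
TOKEN_RE = re.compile(r"\s*(?:(\w+)|('[^']*')|([*,=]))")


class Query(NamedTuple):
    columns: List[str]
    table: str
    where: Optional[Tuple[str, str]]  # (column, literal) for "col = 'literal'"


def tokenize(sql: str) -> List[str]:
    """Lexical analysis: turn query text into a list of tokens."""
    sql, pos, tokens = sql.strip(), 0, []
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(next(g for g in m.groups() if g is not None))
        pos = m.end()
    return tokens


def parse(tokens: List[str]) -> Query:
    """Syntax analysis for: SELECT col[, col ...] FROM table [WHERE col = 'literal']."""
    pos = 0

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of query")
        tok = tokens[pos]
        if expected is not None and tok.upper() != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        pos += 1
        return tok

    take("SELECT")
    columns = [take()]
    while pos < len(tokens) and tokens[pos] == ",":
        take(",")
        columns.append(take())
    take("FROM")
    table = take()
    where = None
    if pos < len(tokens):
        take("WHERE")
        column = take()
        take("=")
        where = (column, take().strip("'"))
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return Query(columns, table, where)


def plan(query: Query) -> List[str]:
    """Plan generation: a linear chain of Hive-style operator names."""
    ops = [f"TableScan({query.table})"]
    if query.where:
        ops.append(f"Filter({query.where[0]} = '{query.where[1]}')")
    if query.columns != ["*"]:  # trivial optimization: SELECT * needs no projection
        ops.append(f"Select({', '.join(query.columns)})")
    ops.append("FileSink")
    return ops


print(plan(parse(tokenize("SELECT name, city FROM users WHERE city = 'beijing'"))))
```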

Hive's data is stored in HDFS, and most queries and computations are carried out by MapReduce (simple queries such as select * from tbl do not generate MapReduce tasks).
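To illustrate how an aggregation such as `SELECT city, COUNT(*) FROM users GROUP BY city` maps onto MapReduce, the sketch below simulates the map, shuffle, and reduce phases in plain Python. It is an in-memory illustration only: Hive's real plan for such a query is produced by its compiler and runs on the cluster, and the `users` rows and `city` column are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def map_phase(rows: Iterable[Row], key_col: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (group key, 1) for each input row."""
    for row in rows:
        yield row[key_col], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: collect all values emitted for the same key (the framework does this in MapReduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: add up the counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row], key_col: str) -> Dict[str, int]:
    """Simulate SELECT key_col, COUNT(*) ... GROUP BY key_col."""
    return reduce_phase(shuffle(map_phase(rows, key_col)))


users = [{"name": "alice", "city": "beijing"},
         {"name": "bob", "city": "shanghai"},
         {"name": "carol", "city": "beijing"}]
print(group_by_count(users, "city"))  # {'beijing': 2, 'shanghai': 1}
```

The split mirrors why such queries need a MapReduce job at all: the map side can work row by row, but counting per key requires all rows with the same key to meet at one reducer.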

Thrift (for details, refer to http://www.ibm.com/developerworks/cn/java/j-lo-apachethrift/)

Related concepts of Hive

Operator (the smallest processing unit): each operator represents an HDFS operation or a MapReduce job.

An operator is a processing step defined by Hive.
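The operator idea can be sketched in Python as a tree of nodes: each operator keeps lists of its parent and child operators, processes one row, and forwards its output to its children. The class names loosely echo Hive's Java operators (such as TableScanOperator, FilterOperator, SelectOperator, and FileSinkOperator), but this code is a hypothetical illustration, not Hive's API; `CollectOp` stands in for a file sink.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class Operator:
    """Illustrative operator node: processes rows and forwards them to child operators."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parents: List["Operator"] = []
        self.children: List["Operator"] = []

    def add_child(self, child: "Operator") -> "Operator":
        """Link child below this operator and return it, so chains read top-down."""
        self.children.append(child)
        child.parents.append(self)
        return child

    def process(self, row: Row) -> None:
        self.forward(row)  # default behaviour: pass the row through unchanged

    def forward(self, row: Row) -> None:
        for child in self.children:
            child.process(row)


class FilterOp(Operator):
    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        super().__init__("Filter")
        self.predicate = predicate

    def process(self, row: Row) -> None:
        if self.predicate(row):  # drop rows that fail the predicate
            self.forward(row)


class SelectOp(Operator):
    def __init__(self, columns: List[str]) -> None:
        super().__init__("Select")
        self.columns = columns

    def process(self, row: Row) -> None:
        self.forward({c: row[c] for c in self.columns})  # keep only chosen columns


class CollectOp(Operator):
    """Stands in for a file sink: stores output rows in memory."""

    def __init__(self) -> None:
        super().__init__("Collect")
        self.rows: List[Row] = []

    def process(self, row: Row) -> None:
        self.rows.append(row)


scan = Operator("TableScan")
sink = CollectOp()
scan.add_child(FilterOp(lambda r: r["city"] == "beijing")).add_child(SelectOp(["name"])).add_child(sink)
for row in [{"name": "alice", "city": "beijing"}, {"name": "bob", "city": "shanghai"}]:
    scan.process(row)
print(sink.rows)  # [{'name': 'alice'}]
```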

Operator definition (tree structure):

    protected List ...

...

1. Local Derby storage

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.apache.derby.jdbc.EmbeddedDriver</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>true</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
    </configuration>

Note: with Derby storage, running hive creates a Derby log file and a metastore_db directory in the current directory. The drawback of this storage method is that only one Hive client can use the database from the same directory at a time; otherwise the following error is reported:

    hive> show tables;
    FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database 'metastore_db', see the next exception for details.
    NestedThrowables:
    java.sql.SQLException: Failed to start database 'metastore_db', see the next exception for details.
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

2. Local MySQL storage requires running a MySQL server locally and configuring Hive as follows (you also need to copy the MySQL JDBC driver jar into the $HIVE_HOME/lib directory):

    <!-- /opt/hive-1.2.1/conf/hive-site.xml -->
    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive_remote/warehouse</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost/hive_remote?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
      </property>
    </configuration>

Appendix: installing MySQL.

Install MySQL, start the service, and open the MySQL shell:

    yum install mysql-server -y
    service mysqld start
    mysql

Grant MySQL permissions:

    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
    FLUSH PRIVILEGES;

Then delete the extra user rows that would interfere with these permissions, and refresh the privileges:

    USE mysql;
    DELETE FROM user WHERE Host != '%';
    FLUSH PRIVILEGES;

If the Hive CLI fails to start with the following error:

    [ERROR] Terminal initialization failed; falling back to unsupported
    java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at jline.TerminalFactory.create(TerminalFactory.java:101)

the cause is that the jline version shipped with Hadoop is inconsistent with the jline version used by Hive.

3. Remote MySQL

3.1 Remote all-in-one: this mode requires running a MySQL server on a remote machine and starting the metastore service on the Hive server. Here a MySQL test server at IP 192.168.1.214 is used; create a new hive_remote database on it with the latin1 character set.

    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.57.6:3306/hive?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>false</value>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://192.168.1.188:9083</value>
      </property>
    </configuration>

Note: here the Hive server and client run on the same machine. The server and the client can also be split apart.

3.2 Remote split: the hive-site.xml configuration file is divided into the following two parts.

Server configuration, started with: hive --service metastore

    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.57.6:3306/hive?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
      </property>
    </configuration>

Client configuration, started with: hive

    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>false</value>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://slave2:9083</value>
      </property>
    </configuration>
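The three metastore setups above differ mainly in a few hive-site.xml properties. As a hedged sketch, the Python below reads the property name/value pairs with the standard library's ElementTree and classifies a configuration using a simplified heuristic that is an assumption, not logic taken from Hive: a `hive.metastore.uris` value means a remote metastore, a `jdbc:derby:` connection URL means embedded Derby, and anything else is treated as a local JDBC database such as MySQL. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_hive_site(xml_text: str) -> Dict[str, str]:
    """Collect <property><name>/<value> pairs from a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def metastore_mode(props: Dict[str, str]) -> str:
    """Classify a configuration as 'remote', 'embedded-derby', or 'local' (heuristic)."""
    if props.get("hive.metastore.uris"):
        return "remote"  # clients reach a separate metastore service over Thrift
    url = props.get("javax.jdo.option.ConnectionURL", "")
    if url.startswith("jdbc:derby:"):
        return "embedded-derby"  # metastore_db lives in the current directory
    return "local"  # Hive connects to a database such as MySQL directly over JDBC


CLIENT_XML = """<configuration>
  <property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
  <property><name>hive.metastore.local</name><value>false</value></property>
  <property><name>hive.metastore.uris</name><value>thrift://slave2:9083</value></property>
</configuration>"""

client = read_hive_site(CLIENT_XML)
print(client["hive.metastore.uris"], metastore_mode(client))
```

Under this heuristic, the server-side file from the split setup counts as "local", since the metastore service itself connects to MySQL directly; only the client file, which sets `hive.metastore.uris`, counts as "remote".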


© 2024 shulou.com SLNews company. All rights reserved.
