I. Background of Hive's creation
Apache Hive is data warehouse software that makes it easy to read, write, and manage large datasets residing in distributed storage using SQL. Structure can be projected onto data that is already in storage. A command-line tool and a JDBC driver are provided to connect users to Hive.
·Open-sourced by Facebook, originally built to compute statistics over massive volumes of structured log data
·MapReduce programming is inconvenient
·Files on HDFS lack a schema (field names, field types, etc.)
II. What is Hive?
·A data warehouse built on top of Hadoop
·Hive defines a SQL-like query language, HQL (similar to SQL but not identical)
·Usually used for offline data processing (MapReduce)
·The underlying layer supports multiple execution engines (Hive on MapReduce, Hive on Tez, Hive on Spark)
·Supports a variety of compression formats, storage formats, and custom functions (compression: GZIP, LZO, Snappy, BZIP2, etc.; storage: TextFile, SequenceFile, RCFile, ORC, Parquet; UDF: user-defined functions); see the example after this list
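For instance, combining a columnar storage format with a compression codec is just a table declaration in HQL. A minimal sketch (the table name logs_orc and its columns are made up for illustration):
create table logs_orc(id int, msg string)
stored as orc
tblproperties ("orc.compress"="SNAPPY");  -- ORC files compressed with Snappy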
What exactly is Hive? Let's first take a look at how Hive's official Wiki introduces Hive (https://cwiki.apache.org/confluence/display/Hive/Home):
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. It provides:
1. Tools for easy extract/transform/load (ETL), which can be understood as data cleansing, analysis, and presentation
2. A mechanism for imposing structure on large amounts of formatted data
3. The ability to analyze and process data stored directly in HDFS or in other storage systems such as HBase
4. Query execution via MapReduce
5. Procedural language support (stored procedures) via HPL-SQL
6. Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider
III. Hive installation
1. Hive stand-alone installation (using Derby for metadata storage)
·Installation package preparation
Upload the Hive installation package apache-hive-1.2.1-bin.tar.gz to the /bigdata/ directory on the VM
JDK installation package jdk-8u151-x64.gz
·Cluster preparation (linux1, linux2, linux3)
·Unpack and install Hive
Unzip the uploaded Hive archive into the /app directory on the VM
tar -zxvf /bigdata/apache-hive-1.2.1-bin.tar.gz -C /app
mv /app/apache-hive-1.2.1-bin/ /app/hive-1.2.1
·Configure Hive's configuration files
View the contents of the conf directory
Copy the template file hive-env.sh.template to hive-env.sh
cp /app/hive-1.2.1/conf/hive-env.sh.template /app/hive-1.2.1/conf/hive-env.sh
vim /app/hive-1.2.1/conf/hive-env.sh
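Inside hive-env.sh the main setting is the Hadoop installation path. A sketch, assuming Hadoop is installed under /app/hadoop-2.7.3 (adjust to your cluster's actual path):
# /app/hive-1.2.1/conf/hive-env.sh (Hadoop path below is an assumption)
HADOOP_HOME=/app/hadoop-2.7.3
export HIVE_CONF_DIR=/app/hive-1.2.1/conf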
·Configure hive environment variables
vim /etc/profile
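Append export lines along these lines to /etc/profile (HIVE_HOME matches the install path used above):
# appended to /etc/profile
export HIVE_HOME=/app/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin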
source /etc/profile
which hive
·Start hadoop cluster
·Start Hive service
hive
·View databases
show databases;
·Create a database
create database myhive;
show databases;
·Create a table
create table student(id int, chinese string, math string, english string);
·Load data and query
load data local inpath '/root/student.txt' into table student;
select * from student;
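One caveat: with the default row format, Hive expects fields separated by the \001 (Ctrl-A) character, so a plain tab-separated student.txt would come back as NULL columns. A sketch, assuming a tab-separated file:
create table student(id int, chinese string, math string, english string)
row format delimited fields terminated by '\t';  -- match the file's actual delimiter
load data local inpath '/root/student.txt' into table student;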
2. Hive stand-alone installation (using MySQL for metadata storage)
Install the MySQL server and MySQL client, and start the MySQL service.
·Set up a MySQL account for Hive on linux1 and grant it sufficient privileges
create user 'hive' identified by '123456';
GRANT ALL PRIVILEGES ON *.* TO hive@'%' IDENTIFIED BY '123456' with grant option;
GRANT ALL PRIVILEGES ON *.* TO hive@'localhost' IDENTIFIED BY '123456' with grant option;
flush privileges;
Check that the new account works.
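A quick way to check is to log in with the new account from the shell:
mysql -u hive -p123456 -e "show databases;"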
·Continue configuring Hive (local metastore mode): hive-site.xml, hive-env.sh
Configure hive-env.sh
Configure hive-site.xml: copy the hive-default.xml.template file under /app/hive-1.2.1/conf to hive-site.xml
cp /app/hive-1.2.1/conf/hive-default.xml.template /app/hive-1.2.1/conf/hive-site.xml
vim /app/hive-1.2.1/conf/hive-site.xml
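The essential edits in hive-site.xml are the metastore JDBC settings. A sketch, assuming MySQL runs on linux1 and uses the hive account created above:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://linux1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>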
·Copy the MySQL JDBC driver jar (mysql-connector-java) into /app/hive-1.2.1/lib/. Without the driver jar, Hive will report an error when it tries to connect to the metastore.
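For example (the connector version here is an assumption; use the mysql-connector-java jar that matches your MySQL server):
cp /root/mysql-connector-java-5.1.39.jar /app/hive-1.2.1/lib/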
·Start the Hive service from the command line, view the databases, create a database named heihei, and then open the cluster web page
Looking at the cluster web page, you can see that the directory corresponding to the heihei database has been created on HDFS
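The same thing can be checked from the shell, assuming the default warehouse location /user/hive/warehouse (hive.metastore.warehouse.dir):
hdfs dfs -ls /user/hive/warehouse
# a heihei.db directory should be listed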
·Access hive using beeline
Exit the Hive service first. Then modify the Hadoop configuration file etc/hadoop/core-site.xml on linux1 and add the following properties, which allow the root user to proxy (impersonate) requests coming from any host and any user group; HiveServer2 needs this to submit queries on behalf of connecting users. Then restart the cluster.
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
Use the following command to start the HiveServer2 service in the background:
hive --service hiveserver2 &
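Once it is running, confirm that HiveServer2 is listening on its default Thrift port:
netstat -nltp | grep 10000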
Open another terminal window as the client and run the beeline script.
Connect to the server. This connection goes through the Thrift service; 10000 is the default port.
!connect jdbc:hive2://linux1:10000
Verify that this connection reaches the same Hive service we just used from the command line.
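A typical beeline session looks roughly like this (log in as root so the proxyuser settings above apply; beeline prompts for a username and password after the !connect command):
beeline
beeline> !connect jdbc:hive2://linux1:10000
0: jdbc:hive2://linux1:10000> show databases;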