The Method of Building a Data Warehouse and a Hive Environment
This article explains how to build a data warehouse and set up a Hive environment. The content is simple and clear, and easy to learn and understand.
Last time I introduced HDFS. The plan was to cover MapReduce next, but since hand-written MapReduce jobs are largely obsolete in practice, I went straight to Hive.
Data warehouse
A data warehouse (Data Warehouse, abbreviated DW or DWH) is, as the name implies, a large collection of stored data, created to filter and integrate a variety of business data for enterprise analytical reporting and decision support.
It gives an enterprise a degree of BI (business intelligence) capability, guiding business process improvement and the monitoring of time, cost, quality, and control.
The input side of a data warehouse is a variety of data sources; the output feeds enterprise data analysis, data mining, data reporting, and similar uses.
The difference between database and data warehouse
A database is a transaction-oriented processing system (online transaction processing, OLTP), aimed at the day-to-day online operations of a specific business, usually querying and modifying individual records. Its users care most about response time, data security, integrity, and the number of concurrent users.
A data warehouse, by contrast, analyzes historical data on particular subjects to support management decisions; this is known as online analytical processing, OLAP (On-Line Analytical Processing).
Alipay's annual bill, for example, is essentially data visualization built on top of a data warehouse.
Data warehouses emerged to further mine data resources for decision-making once large numbers of databases already existed; a data warehouse is by no means just a "large database".
Data warehouse layering
According to the flow of data in and out, the data warehouse architecture can be divided into three layers: source data (ODS), data warehouse (DW), and data application (APP).
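As a rough illustration of the three layers (a minimal sketch in HiveQL; the table and column names here are invented for the example, not from the original article):

-- ODS: raw source data loaded as-is
hive> CREATE TABLE ods_orders (order_id STRING, user_id STRING, amount DOUBLE, ts STRING);
-- DW: cleaned and integrated data built from the ODS layer
hive> CREATE TABLE dw_orders AS SELECT order_id, user_id, amount FROM ods_orders WHERE amount > 0;
-- APP: aggregated results consumed by reports and dashboards
hive> CREATE TABLE app_user_spend AS SELECT user_id, SUM(amount) AS total FROM dw_orders GROUP BY user_id;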
Hive
Hive is a data warehouse framework built on Hadoop. It was originally developed by Facebook and later handed over to the Apache Software Foundation, where it became an open-source Apache project.
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools to store, query, and analyze large-scale data sets held in distributed storage. Hive defines a simple SQL-like query language (HiveQL) and translates SQL statements into concrete computing tasks via an underlying execution engine.
Hive supports MapReduce, Tez, Spark, and other distributed computing engines.
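For instance (assuming a working Hive session; hive.execution.engine is Hive's standard property for selecting the engine, and the table name is the illustrative one from the sketch above):

-- choose the execution engine for this session: mr (MapReduce), tez, or spark
hive> SET hive.execution.engine=mr;
-- an aggregation like this is compiled into jobs on the chosen engine
hive> SELECT COUNT(*) FROM ods_orders;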
Hive environment building
There is no need to configure a cluster to build a Hive environment. A Hive installation actually consists of two parts: a server and a client. The so-called server is the part of Hive that manages the metastore. It can be installed on any node, whether on the NameNode or on any DataNode.
For a Hive client interface tool, my early choice was SQuirreL SQL Client, but recently I have fallen in love with Apache Zeppelin, a web-based notebook not unlike Jupyter Notebook.
To build the Hive environment, you also need MySQL (for the metastore). Here, node02 is chosen to host the MySQL environment.
[hadoop@node02 ~]$ cd module/
[hadoop@node02 module]$ mkdir mysql
[hadoop@node02 module]$ cd mysql/
[hadoop@node02 mysql]$ wget https://dev.mysql.com/get/mysql57-community-release-el7-9.noarch.rpm
[hadoop@node02 mysql]$ sudo rpm -ivh mysql57-community-release-el7-9.noarch.rpm
[hadoop@node02 mysql]$ sudo yum install mysql-server

# for the first login, skip permission authentication
[hadoop@node02 mysql]$ sudo vim /etc/my.cnf
# add the following line under [mysqld]
skip-grant-tables

[hadoop@node02 mysql]$ sudo systemctl start mysqld
[hadoop@node02 mysql]$ mysql -u root

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> create database hive;
Query OK, 1 row affected (0.00 sec)
mysql> exit;

# remove skip-grant-tables again, restart mysqld, then log in with the new password
[hadoop@node02 mysql]$ mysql -u root -p123456

mysql> use mysql;
# set remote connection permissions
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
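To sanity-check the grants (an optional quick check, not part of the original article), you can list the accounts MySQL now knows about:

mysql> SELECT user, host FROM mysql.user;
-- expect a 'root' row with host '%' after the GRANT above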
Now let's install Hive on the CentOS system. To match the Hadoop 3.1.4 version, we chose Hive 3.1.2. The official Hive download page: http://www.apache.org/dyn/closer.cgi/hive/
[hadoop@node02 module]$ ls
apache-hive-3.1.2-bin.tar.gz  hadoop  mysql
[hadoop@node02 module]$ tar -zxvf apache-hive-3.1.2-bin.tar.gz
[hadoop@node02 module]$ mv apache-hive-3.1.2-bin hive
[hadoop@node02 module]$ ls
apache-hive-3.1.2-bin.tar.gz  hadoop  hive  mysql

[hadoop@node02 module]$ cd hive/conf
[hadoop@node02 conf]$ mv hive-env.sh.template hive-env.sh
[hadoop@node02 conf]$ vim hive-env.sh
# add the following lines
export HADOOP_HOME=/home/hadoop/module/hadoop/hadoop-3.1.4
export HIVE_CONF_DIR=/home/hadoop/module/hive/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/module/hive/lib

[hadoop@node02 conf]$ sudo vim /etc/profile
# add the following lines
export HIVE_HOME=/home/hadoop/module/hive
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf

[hadoop@node02 conf]$ source /etc/profile
[hadoop@node02 conf]$ mv hive-default.xml.template hive-site.xml
[hadoop@node02 conf]$ vim hive-site.xml
# set the following properties
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hadoop/module/data/hive/jobs</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hadoop/module/data/hive/resources</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.147.129:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
To connect to MySQL from Java, you need to download the MySQL JDBC driver, for example from https://maven.ityuan.com/maven2/mysql/mysql-connector-java/5.1.33.
After the download is complete, put the jar in Hive's lib folder and initialize the MySQL database through Hive.
[hadoop@node02 lib]$ pwd
/home/hadoop/module/hive/lib
[hadoop@node02 lib]$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.33/mysql-connector-java-5.1.33.jar
[hadoop@node02 lib]$ schematool -dbType mysql -initSchema
When initializing the MySQL database with Hive, it is easy to hit two common bugs:
The first bug when Hive initializes the MySQL database: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
Cause of the error: either the system cannot find the jar package containing this class, or there are two versions of that jar package and the system does not know which to use. The reason Hive errors out on startup here is the latter.
Solution:
The jar package containing the class com.google.common.base.Preconditions (and its checkArgument method) is guava.jar.
The jar in Hadoop (e.g. hadoop-3.2.1, path: hadoop/share/hadoop/common/lib) is guava-27.0-jre.jar, while the jar shipped in hive-3.1.2 (path: hive/lib) is guava-19.0.jar.
Make the versions consistent: delete the lower-version guava jar in Hive's lib and copy the higher-version one from Hadoop into Hive's lib.
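A sketch of the fix (assuming the paths used earlier in this article; verify the exact guava file names with ls before deleting):

[hadoop@node02 ~]$ cd /home/hadoop/module/hive/lib
# remove Hive's older guava (check the exact version first with: ls guava*)
[hadoop@node02 lib]$ rm guava-19.0.jar
# copy Hadoop's newer guava into Hive's lib
[hadoop@node02 lib]$ cp /home/hadoop/module/hadoop/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar .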
The second bug when Hive initializes the MySQL database: Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8) at ...
Cause of the error: the hive-site.xml configuration file contains an invalid special character at line 3215 (see the position reported on the second line of the stack trace).
Solution: open hive-site.xml, jump to the corresponding line, and delete the special character.
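One way to do this (a sketch; 3215 is the line number reported by the error above):

# jump straight to the offending line in vim
[hadoop@node02 conf]$ vim +3215 hive-site.xml
# or just inspect that line first
[hadoop@node02 conf]$ sed -n '3215p' hive-site.xml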
If you get Unknown database 'hive', create the hive database directly in MySQL.
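For example (using the same credentials configured above):

[hadoop@node02 ~]$ mysql -u root -p123456 -e "CREATE DATABASE IF NOT EXISTS hive;"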
Finally, schematool reports that Hive initialized the MySQL metastore schema successfully.
If you look inside the hive database, you will see the corresponding initialization tables have been generated.
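A quick check (the table names below are standard Hive 3.x metastore tables created by the schema scripts):

mysql> use hive;
mysql> show tables;
-- expect metastore tables such as DBS, TBLS, COLUMNS_V2, PARTITIONS, ...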
Type hive to enter the Hive command line; if you reach the prompt, Hive has been built successfully.
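A final smoke test (on a fresh install, only the default database should exist):

[hadoop@node02 ~]$ hive
hive> show databases;
OK
default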
Thank you for reading. The above covered the method of building a data warehouse and a Hive environment. After studying this article, I believe you have a deeper understanding of the process; the specifics still need to be verified in practice.