

Hive for Hadoop deployment (5)

2025-01-18 Update From: SLTechnology News&Howtos



Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides an SQL-like query language, HQL (Hive SQL); the underlying data is stored on HDFS. In essence, Hive translates HQL statements into MapReduce jobs, so users who are unfamiliar with MapReduce can still process and analyze structured data on HDFS with HQL, which makes Hive well suited to offline batch computation. Hive relies on HDFS to store data and on MapReduce to execute the converted HQL, so it is a Hadoop-based data warehouse tool: a MapReduce-backed framework for analyzing and managing data stored in HDFS.
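To make this concrete, here is a minimal, hypothetical HQL example (the table and query are illustrative, not part of this deployment). A statement like the aggregation below is compiled by Hive into MapReduce jobs that read the table's files from HDFS:

-- hypothetical table over tab-separated log files stored on HDFS
CREATE TABLE IF NOT EXISTS access_log (ip STRING, url STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- this aggregation runs as a MapReduce job instead of hand-written Java
SELECT dt, COUNT(*) AS pv
FROM access_log
GROUP BY dt;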

Why use Hive?

Problems with using MapReduce directly:

1. The learning cost for developers is too high.

2. Project cycles are short, while MapReduce development is slow.

3. Implementing complex query logic directly in MapReduce is difficult.

Why use Hive:

1. Friendlier interface: the operation interface uses SQL-like syntax, which enables rapid development.

2. Lower learning cost: it avoids writing MapReduce directly, reducing the learning burden on developers.

3. Better scalability: the cluster can be expanded freely without restarting services, and user-defined functions are supported.

3. Hive's advantages:

1. Scalability: Hive can freely expand the size of the cluster, and under normal circumstances no service restart is needed. Horizontal scale-out means adding servers to the cluster to share the load; vertical scale-up means upgrading a single server (for example, a CPU going from an i7-6700K with 4 cores / 8 threads to 8 cores / 16 threads, or memory from 64 GB to 128 GB).

2. Extensibility: Hive supports user-defined functions, so users can implement functions for their own needs (see the short sketch after this list).

3. Good fault tolerance: SQL statements can still complete even when some nodes have problems.
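On point 2, a minimal sketch of how a user-defined function is wired in from the hive shell, assuming a UDF class com.example.udf.MyLower has already been written and packaged into /tmp/my-udfs.jar (both the jar path and the class name are hypothetical):

hive> ADD JAR /tmp/my-udfs.jar;
hive> CREATE TEMPORARY FUNCTION my_lower AS 'com.example.udf.MyLower';
hive> SELECT my_lower(url) FROM access_log LIMIT 10;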

Disadvantages:

1. Hive does not support record-level insert, update, or delete; instead, users generate new tables with queries or export query results to files (the hive-2.3.2 version selected here does support record-level insertion). See the sketch after this list.

2. Hive's query latency is high, because starting a MapReduce job takes a long time, so it cannot be used in interactive query systems.

3. Hive does not support transactions (since there is no record-level insert, update, or delete, it is mainly used for OLAP, online analytical processing, rather than OLTP, online transaction processing; these are two different levels of data processing).
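Regarding point 1, the usual workaround is to rewrite data at the table (or partition) level instead of updating individual rows. A hedged sketch with hypothetical table names:

-- derive a new table from a query instead of updating rows in place
CREATE TABLE access_log_clean AS
SELECT ip, url, dt FROM access_log WHERE url IS NOT NULL;

-- or recompute and overwrite an existing table in one shot
INSERT OVERWRITE TABLE access_log_clean
SELECT ip, url, dt FROM access_log WHERE url IS NOT NULL;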

4. Hive's structure

II. Hive installation

1. MySQL installation (datanode01)

Hive's metadata is stored in an RDBMS; all data other than the metadata is stored on HDFS. By default, Hive keeps its metadata in an embedded Derby database, which allows only one session connection and is only suitable for simple testing, not for a production environment. To support multi-user sessions, an independent metadata database is required. Here MySQL (MariaDB) is used as the metadata database; Hive has good built-in support for MySQL.
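As an optional check, not a step from the original article: once the metastore has been pointed at MySQL as configured below, the metastore schema tables should be visible in the hive_db database rather than in an embedded Derby store:

mysql> USE hive_db;
mysql> SHOW TABLES;    -- expect metastore tables such as DBS, TBLS, SDS and COLUMNS_V2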

yum install mariadb-server

2. MySQL startup

Start the database and enable it at boot:

systemctl start mariadb
systemctl enable mariadb

3. Hive download and installation

# download the installation package
wget www.example.com
# extract the installation package
tar xf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin /usr/local/hive
# create directories
mkdir -p /home/hive/{log,tmp,job}

4. Configure Hive environment variables

Edit the file /etc/profile.d/hive.sh to read as follows:

# HIVE ENV
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

Make the Hive environment variables take effect:

source /etc/profile.d/hive.sh

III. Hive configuration

1. Configure the metastore (datanode01)

mysql> grant all privileges on *.* to 'hive'@'%' identified by 'hive123456' with grant option;
mysql> grant all privileges on *.* to 'hive'@'datanode01' identified by 'hive123456' with grant option;
mysql> grant all privileges on *.* to 'thbl_prd_hive'@'%' identified by 'hive123456' with grant option;
mysql> grant all privileges on *.* to 'hive'@'localhost' identified by 'hive123456' with grant option;
mysql> grant all privileges on *.* to 'thbl_prd_hive'@'localhost' identified by 'hive123456' with grant option;
mysql> flush privileges;

2. Configure JDBC (datanode01)

wget http://mirrors.163.com/mysql/Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
tar xf mysql-connector-java-5.1.45.tar.gz
cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /usr/local/hive/lib/

3. Arrange the configuration files (namenode01)

cd /usr/local/hive/conf
mkdir template
mv *.template template
# arrange configuration files
cp template/hive-exec-log4j2.properties.template hive-exec-log4j2.properties
cp template/hive-log4j2.properties.template hive-log4j2.properties
cp template/hive-default.xml.template hive-default.xml
cp template/hive-env.sh.template hive-env.sh

4. Configure hive-env.sh (namenode01)

Edit the file /usr/local/hive/conf/hive-env.sh and modify it as follows:

HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib

5. Configure the data warehouse in hive-site.xml (namenode01)

Edit the file /usr/local/hive/conf/hive-site.xml and modify it as follows:

Each setting below becomes one <property> element (name, value, description) in hive-site.xml:

hive.exec.local.scratchdir = /home/hive/job (Hive's local temporary directory, used to store map/reduce execution plans at different stages)
hive.downloaded.resources.dir = /home/hive/tmp/${hive.session.id}_resources (local temporary directory for resources downloaded by Hive)
hive.querylog.location = /home/hive/log/${system:user.name} (location of Hive's structured runtime logs)
hive.hwi.war.file = lib/hive-hwi-2.1.1.war (path of the HWI war file, relative to ${HIVE_HOME})
hive.server2.logging.operation.log.location = /home/hive/log/${system:user.name}/operation_logs (operation log location when operation logging is enabled)
hive.metastore.local = false
datanucleus.schema.autoCreateAll = true (automatically create the necessary schema at startup)
hive.metastore.warehouse.dir = /hive/warehouse (Hive data warehouse path in HDFS)
hive.metastore.uris = thrift://datanode01:9083 (Thrift URI of the remote metastore, used by metastore clients to connect to the metastore server)
javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver (JDBC driver name)
javax.jdo.option.ConnectionURL = jdbc:mysql://datanode01:3306/hive_db?createDatabaseIfNotExist=true (JDBC connection URL)
javax.jdo.option.ConnectionUserName = hive (user name for connecting to the metastore database)
javax.jdo.option.ConnectionPassword = hive123456 (password for connecting to the metastore database)
hive.metastore.schema.verification = false (whether to enforce version consistency of the metastore schema)

6. Configure permissions (namenode01)

scp /usr/local/hive/conf/* datanode01:/usr/local/hive/conf/
chmod 755 /usr/local/hive/conf/*

IV. Start Hive

1. Start hiveserver2

hive --service hiveserver2 &

2. Start the metastore

hive --service metastore &

V. Hive check

1. jps

[root@namenode01 ~]# jps
14512 NameNode
14786 ResourceManager
21348 RunJar
15894 HMaster
22047 Jps

[root@datanode01 ~]# jps
3509 DataNode
3621 NodeManager
1097 QuorumPeerMain
9930 RunJar
3935 HRegionServer
10063 Jps

2. hive shell

[root@namenode01 ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
OK
Time taken: 0.833 seconds
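Two optional additions that are not in the original article. First, instead of relying only on datanucleus.schema.autoCreateAll=true, the metastore schema can be initialized explicitly with Hive's bundled schematool once hive-site.xml is in place and before the metastore is started for the first time; a minimal sketch:

# run once to create the metastore schema in the MySQL database configured in hive-site.xml
schematool -dbType mysql -initSchema

Second, a small smoke test beyond "show tables" (the table name is hypothetical): create a table, insert a row, and confirm that its data lands under the configured warehouse path /hive/warehouse on HDFS:

hive> CREATE TABLE smoke_test (id INT, name STRING);
hive> INSERT INTO smoke_test VALUES (1, 'ok');
hive> SELECT * FROM smoke_test;

# back in the shell, the table directory should now exist in the HDFS warehouse
hdfs dfs -ls /hive/warehouse/smoke_test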



