[TOC]
# Hive Notes (1)

## Hive
Hive, developed at Facebook and later contributed to Apache, is a data warehouse framework built on top of Hadoop.
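One quick way to see the HQL-to-MapReduce translation described below at work (a sketch of mine; `test` is the sample table used later in these notes):

```sql
-- EXPLAIN prints the execution plan Hive generates for a query;
-- with the mr engine, the plan shows the MapReduce stages.
explain select count(*) from test;
```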
### Characteristics of a data warehouse

A data warehouse stores data that can support decision-making at all levels of an enterprise. Put bluntly, it is a warehouse that stores data. The differences between a database and a data warehouse:

- A modern data warehouse is built on top of a database, using the database as the carrier that stores the data.
- A data warehouse emphasizes storing historical data, while a database emphasizes storing online data.
- A data warehouse emphasizes OLAP operations, while a database emphasizes OLTP operations.
    - OLAP (Online Analytical Processing): analyzing and querying data (select, load). Data warehouse operations work almost entirely on historical data.
    - OLTP (Online Transaction Processing): transactional operations on data (update, delete). Database operations work almost entirely on online transactional data.

ETL is a concept used in building a data warehouse:

- E (Extract): obtaining the data, i.e. extraction or acquisition.
- T (Transform): classifying and cleaning the data that enters the warehouse.
- L (Load): the process of loading the data into the warehouse.

BI (Business Intelligence) is built on top of the warehouse.

Hive is a SQL parsing engine: it translates HQL into MR jobs that compute over data on HDFS.

### Summary of Hive data storage

- Hive data storage is based on Hadoop HDFS.
- Hive has no special data storage format of its own.
- The storage structures mainly include: database, file, table, view, index.
- By default Hive can directly load text files (TextFile); it also supports SequenceFile, RCFile, ORCFile, and Parquet.
- When creating a table, specify the column separator and row separator of the data so that Hive can parse it.

### System architecture

- User interfaces: CLI, JDBC/ODBC, WebUI.
- Metadata store: usually a relational database such as mysql or derby.
- Driver: interpreter, compiler, optimizer, executor.
- Hadoop: HDFS for storage, MapReduce for computation.

### Hive installation

There are three prerequisites: JDK, HADOOP, and MySQL. Make sure all three are in place before installing Hive.

Step 1: install MySQL (offline). Working directory: /home/uplooking/soft (the installation package directory).

1. Query the existing mysql dependency packages in linux:

        [uplooking@uplooking01 ~]$ rpm -qa | grep mysql
        mysql-libs-5.1.71-1.el6.x86_64

2. Delete the existing mysql dependency packages in linux:

        [uplooking@uplooking01 ~]$ sudo rpm -e --nodeps `rpm -qa | grep mysql`

3. Install the server and the client:

        [uplooking@uplooking01 ~]$ sudo rpm -ivh soft/MySQL-server-5.5.45-1.linux2.6.x86_64.rpm
        [uplooking@uplooking01 ~]$ sudo rpm -ivh soft/MySQL-client-5.5.45-1.linux2.6.x86_64.rpm

4. Start the mysql server service:

        [uplooking@uplooking01 ~]$ sudo service mysql start

    (Note: after an offline installation the mysql service is named mysql; after an online installation it is named mysqld.)

5. Add it to the boot items:

        [uplooking@uplooking01 ~]$ sudo chkconfig mysql on

6. Set the user name and password:

        [uplooking@uplooking01 ~]$ sudo /usr/bin/mysql_secure_installation

7. Authorize machines for remote access:

        [uplooking@uplooking01 ~]$ mysql -huplooking01 -uroot -puplooking
        ERROR 1130 (HY000): Host 'uplooking01' is not allowed to connect to this MySQL server

    Log in to the mysql server (mysql -uroot -puplooking) and execute the following statements:

        mysql> grant all privileges on *.* to 'root'@'%' identified by 'uplooking';
        mysql> flush privileges;
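As a quick sanity check (added here, not part of the original notes), standard MySQL lets you confirm the grant took effect before moving on:

```sql
-- Run inside the mysql> shell on the server; SHOW GRANTS is standard MySQL.
-- The GRANT issued above should now be listed for 'root'@'%'.
SHOW GRANTS FOR 'root'@'%';
```

After this, the remote-style login that failed earlier (mysql -huplooking01 -uroot -puplooking) should succeed.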
Step 2: install Hive.

1. Extract the Hive archive, go to $HIVE_HOME/conf/, and prepare the config files:

        cp hive-env.sh.template hive-env.sh
        cp hive-default.xml.template hive-site.xml

2. Modify hive-env.sh, adding the following three lines:

        export JAVA_HOME=/opt/jdk
        export HADOOP_HOME=/home/uplooking/app/hadoop
        export HIVE_HOME=/home/uplooking/app/hive

3. Modify $HIVE_HOME/conf/hive-site.xml, setting these properties:

        javax.jdo.option.ConnectionURL = jdbc:mysql://uplooking01:3306/hive?createDatabaseIfNotExist=true
        javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
        javax.jdo.option.ConnectionUserName = root
        javax.jdo.option.ConnectionPassword = uplooking
        hive.querylog.location = /home/uplooking/app/hive/tmp
        hive.exec.local.scratchdir = /home/uplooking/app/hive/tmp
        hive.downloaded.resources.dir = /home/uplooking/app/hive/tmp

4. Copy the mysql driver jar into $HIVE_HOME/lib:

        [uplooking@uplooking01 hive]$ cp ~/soft/mysql-connector-java-5.1.39.jar lib/

5. Initialize the hive metadata warehouse, executing from $HIVE_HOME/bin:

        bin]$ ./schematool -initSchema -dbType mysql -userName root -passWord uplooking

6. Start hive:

        ./hive

### Running modes and access

Hive has a local running mode and a cluster running mode. Cluster mode is the default; local mode, which is often used to debug sql, must be turned on:

    set hive.exec.mode.local.auto=true;

The related properties:

- hive.exec.mode.local.auto (default false): let Hive determine automatically whether to run in local mode.
- hive.exec.mode.local.auto.inputbytes.max (default 134217728, i.e. 128m): when hive.exec.mode.local.auto is true, the input bytes should be less than this for local mode. Local mode is often used in testing.
- hive.exec.mode.local.auto.input.files.max (default 4): when hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.

There are three kinds of access methods: CLI, WebGUI, and JDBC.

- CLI is the most commonly used, that is, the command-line mode.
- WebGUI requires building a war from the hive source code and deploying it before hive can be accessed through a web interface.
- JDBC is sql programming.

If hive uses the MR computing engine, it runs very slowly and cannot be used for interactive queries. Hive currently supports three computing engines: mr, spark, and tez; the default, set by hive.execution.engine, is mr, and spark has been supported since hive-2.0.
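To make these switches concrete, here is a minimal CLI session sketch of my own; the property names and defaults are the ones listed above, and `test` is the sample table used later in these notes:

```sql
-- A hive> session sketch (illustration only).
set hive.execution.engine=mr;                            -- mr is the default; spark and tez are alternatives
set hive.exec.mode.local.auto=true;                      -- let Hive pick local mode for small inputs
set hive.exec.mode.local.auto.inputbytes.max=134217728;  -- input must stay under 128m
set hive.exec.mode.local.auto.input.files.max=4;         -- and use fewer than 4 tasks
select count(*) from test;                               -- a small query like this can now run locally
```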
### Basic operations

Database operations:

- Create a database: create database mydb1;
- Use a database: use mydb1;
- Display the database currently in use: set hive.cli.print.current.db=true;
- Display the database list: show databases;
- Delete a database: drop database mydb1;

Table operations: the data types of table columns are almost the same as in mysql, except for string and the compound data types (array, map, struct).
### DDL
DDL (Data Definition Language) is the database schema definition language, used to describe the real-world entities to be stored in the database. Put bluntly, it covers the statements that operate on tables in the database.
Creating a table: create table <table name> (<field name> <field type> comment '<column comment>', ...) comment '<table comment>';

e.g.

    create table t1 (id int, name string comment "this is t1's column name") comment "TABLE t1";

Add a column:

    alter table t1 add columns (age int);

Delete a column? Dropping a single column, as in alter table t1 drop column age;, is not supported. The workaround is replace, which indirectly completes the deletion of a column:

    alter table t1 replace columns (online string);

This replaces all existing columns with the listed columns.
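To spell the workaround out (a sketch of mine, reusing the t1 from the example above): to drop just one column, re-list every column you want to keep.

```sql
-- t1 has (id int, name string, age int) after the ADD COLUMNS above.
-- REPLACE COLUMNS swaps in exactly the listed columns, so listing all
-- columns except 'age' effectively deletes it. Only the table metadata
-- changes; the underlying file data on HDFS is not rewritten.
alter table t1 replace columns (id int, name string);
```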
At work, scripts are often used to write and execute HQL:
- Hive client: hive> source /opt/data/hive/hive-t1.hql;
- Linux terminal, way 1: /opt/hive/bin/hive -f hive-t1-1.hql
- Linux terminal, way 2: /opt/hive/bin/hive -e "set hive.exec.mode.local.auto=true; select * from test;"

For the terminal execution modes above, we can also add the relevant parameter options:

    /opt/hive/bin/hive -f hive-t1-1.hql --hiveconf hive.exec.mode.local.auto=true
    /opt/hive/bin/hive -e "select * from test" --hiveconf hive.exec.mode.local.auto=true

In this way it is very convenient to compose the execution statements of hive into shell scripts and schedule the computation of hive data from the relevant shell scripts in linux. Note that -e and -f cannot be used together, e.g.

    /opt/hive/bin/hive -e "use mydb1; select * from test where line like '%you%';" --hiveconf hive.cli.print.header=true
    /opt/hive/bin/hive -f hive-t1-1.hql --hiveconf hive.cli.print.header=true
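The notes reference hive-t1-1.hql without showing its contents. A script run through hive -f is just a sequence of HQL statements; a minimal hypothetical version could look like this:

```sql
-- hive-t1-1.hql (hypothetical contents; the original notes do not show this file)
set hive.exec.mode.local.auto=true;   -- debug-friendly local mode, as discussed above
use mydb1;
select * from test where line like '%you%';
```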
### Data operations

Importing data into a hive table:

- hive cli: hive> load data local inpath '/opt/data/hive/hive-t1-1.txt' into table t1;
- Linux terminal: $ hdfs dfs -put /opt/data/hive/hive-t1-1.txt /user/hive/warehouse/t1/

If we import data and find that it is not parsed successfully, that is because the custom data carries its own delimiters within and between rows and we have not told the table how to parse them; in other words, the parsing format of the custom data is inconsistent with the default parsing of the hive table.
Hive table default parsing (row and column separators):

- Default row separator: \n
- Default column separator: \001. How is it entered on the keyboard? ctrl+v ctrl+a.

Read mode vs. write mode of a database table:

- Schema read mode: when data is loaded into the table, its validity is not verified; it is only verified when the table is operated on, and illegal data is displayed as NULL. For example, if a column's data type is date and the loaded data for that column is not a date, then that column's data is illegal and will be displayed as NULL after being imported into hive. Hive works this way.
- Schema write mode: when data is loaded into the table, its validity is verified, and only legal data is loaded into the database. This suits transactional databases; common databases such as mysql and oracle all adopt this mode.

Customizing the delimiters:

    create table t2 (
        id int,
        name string,
        birthday date,
        online boolean
    )
    row format delimited           -- enables custom delimiters
    fields terminated by '\t'      -- defines the column delimiter
    lines terminated by '\n';      -- defines the row delimiter

The definition of the row delimiter can of course be omitted; it is consistent with linux by default. At the same time, note that these two clauses cannot be reversed.

### Compound data types of a Hive table

array --> like array in java:

    create table t3_arr (
        id int,
        name string,
        hobby array<string>
    )
    row format delimited
    fields terminated by '\t';

The default separator of array items is \002 (typed in the shell as ctrl+v ctrl+b). When the default does not meet the requirements, you need to customize the delimiter:

    create table t3_arr_1 (
        id int,
        name string,
        hobby array<string>
    )
    row format delimited
    fields terminated by '\t'
    collection items terminated by ',';

When importing, the data should then have the following format (fields separated by tabs):

    1   xiangpiaoye   IT,Computer

An array is referenced as name[index], with the index starting from 0.

map --> like map in java. For example, everyone has scores for several subjects:

    create table t4_map (
        id int,
        name string,
        score map<string,int> comment "this is score"
    )
    row format delimited
    fields terminated by '\t'
    collection items terminated by ','
    map keys terminated by '=';

According to the above definition, the imported data should be in the following format:

    1   xiangpiaoye   Chinese=102,Math=121,English=124

The default delimiter between a map's key and value is \003, typed in the shell as ctrl+v ctrl+c. The call format for a specific map value is column_name["key"], such as score["chinese"].

struct --> like an object in java:

    create table t5_struct (
        id int,
        name string,
        address struct<province:string,city:string,zip:int>
    )
    row format delimited
    fields terminated by '\t'
    collection items terminated by ',';

Sample data (id, name, address):

    1   Xiao Chen   bj,chaoyang,100002
    2   Lao Wang    hb,shijiazhuang,052260
    3   Xiao He     hn,huaiyang,466000
    4   Xiao Ma     hlj,harbin,10000

The call format is column_name.attribute, such as address.province.
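Putting the three access patterns together, here is a short sketch of mine that loads and queries the tables above; the local file paths are hypothetical:

```sql
-- Load the sample files shown above (hypothetical paths):
load data local inpath '/opt/data/hive/t3_arr_1.txt' into table t3_arr_1;
load data local inpath '/opt/data/hive/t4_map.txt' into table t4_map;
load data local inpath '/opt/data/hive/t5_struct.txt' into table t5_struct;

select name, hobby[0] from t3_arr_1;            -- array: index from 0, yields 'IT'
select name, score["Math"] from t4_map;         -- map: by key; a missing key yields NULL
select name, address.province from t5_struct;   -- struct: dot notation
```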
### A compound data type case

There is an employee table:
    id          int
    name        string
    subordinate array<string>
    salary      float
    tax         map<string,float>
    home_info   struct<province:string,city:string,zip:int>
Create a table:
    create table t7_emp (
        id int,
        name string,
        subordinate array<string>,
        salary float,
        tax map<string,float>,
        home_info struct<province:string,city:string,zip:int>
    );
Query the after-tax salary of the employees whose hometown (province) is Hebei:
    select id, name,
           salary * (1 - tax["gs"] - tax["gjj"] - tax["sb"]) sh_salary
    from t7_emp
    where home_info.province = "Hebei";

(The map keys "gs", "gjj", and "sb" presumably stand for personal income tax, housing fund, and social insurance.)
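For completeness, a hedged sketch of what the storage clauses and a matching data row could look like, assuming the same delimiter conventions as the earlier examples (tab for fields, ',' for collection items, '=' for map keys); none of this appears in the original notes:

```sql
-- Hypothetical full DDL, mirroring the t4_map/t5_struct delimiters above.
create table t7_emp_demo (
    id          int,
    name        string,
    subordinate array<string>,
    salary      float,
    tax         map<string,float>,
    home_info   struct<province:string,city:string,zip:int>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by '=';

-- A matching (tab-separated) data line could then be:
--   1    Xiao Chen    Lao Wang,Xiao He    15000.0    gs=0.1,gjj=0.12,sb=0.08    Hebei,shijiazhuang,052260
-- After loading such a file, the after-tax query above applies unchanged.
```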