This article explains how to set up a Sqoop 1.4.7 environment and how to use it to import MySQL data into HDFS and Hive, and to export HDFS data back to MySQL.
Sqoop documentation: http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_prerequisites
Note: field delimiters must be specified when creating Hive tables and importing data; otherwise errors occur later when the data is exported.
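For example, a minimal sketch of a Hive table created with an explicit field delimiter (the table name and columns are placeholders; the same fix appears in the error notes at the end of this article):
create table t1 (a int, b int) row format delimited fields terminated by '\t';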
1. Download and install
[root@node1 ~]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/
[root@node1 ~]# cd /opt/
[root@node1 opt]# mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
[root@node1 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop-1.4.7
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
export HCAT_HOME=/opt/sqoop-1.4.7/testdata/hcatalog
export ACCUMULO_HOME=/opt/sqoop-1.4.7/src/java/org/apache/sqoop/accumulo
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$SQOOP_HOME/bin
[root@node1 opt]# source /etc/profile
[root@node1 opt]# sqoop help              # general help
[root@node1 opt]# sqoop import --help     # parameter help for import
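To verify the installation (optional; the exact output depends on the build), Sqoop's version command can be run:
[root@node1 opt]# sqoop version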
2. Modify the yarn configuration file
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
</property>
[root@node1 ~]# scp /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml node2:/opt/hadoop-2.8.5/etc/hadoop/    # copy the configuration file to each node
yarn-site.xml                                 100% 1414   804.3KB/s   00:00
[root@node1 ~]# scp /opt/hive-2.3.4/conf/hive-site.xml /opt/sqoop-1.4.7/conf/    # hive's configuration file must also be placed under sqoop, because sqoop calls hive
[root@node1 ~]# stop-all.sh
[root@node1 ~]# start-all.sh
3. Import MySQL data into HDFS
Parameter explanation:
--append                 append the imported data to an existing dataset in HDFS
--as-textfile            store the imported data as plain text files
--columns                specify which columns to import
--delete-target-dir      delete the target directory if it already exists, then import
--fetch-size             number of rows to read from the database at a time
-m                       number of map tasks (parallelism)
-e                       import the results of a query statement (select); see the free-form query sketch at the end of this section
--table                  name of the table to import
--target-dir <dir>       target HDFS directory
--warehouse-dir <dir>    parent HDFS directory for the import (a subdirectory named after the table is created under it)
--where <clause>         WHERE clause used to filter the imported rows
-z                       compress the imported data
--direct                 use the database's native fast path (mysqldump for MySQL) instead of plain JDBC
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS --target-dir /user/sqoop --direct -m 1 --fields-terminated-by '\t'
[root@node1 ~]# hdfs dfs -ls /user/sqoop    # view the import directory
Found 2 items
-rw-r--r--   3 root supergroup          0 2019-03-19 12:43 /user/sqoop/_SUCCESS
-rw-r--r--   3 root supergroup        176 2019-03-19 12:43 /user/sqoop/part-m-00000
[root@node1 ~]# hdfs dfs -cat /user/sqoop/part-m-00000    # view the imported data
6,1552878877,1,root,USER,root,USER,INSERT,6
7,1552878877,1,root,USER,root,USER,SELECT,6
8,1552878877,1,root,USER,root,USER,UPDATE,6
9,1552878877,1,root,USER,root,USER,DELETE,6
[root@node1 ~]#
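For reference, a minimal sketch of a free-form query import using the -e/--query option listed above. Sqoop requires the literal $CONDITIONS token in the WHERE clause and an explicit --target-dir for query-based imports; the query, column list, and output directory here are illustrative and not taken from the original run:
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system -e 'select TBL_GRANT_ID, PRINCIPAL_NAME, TBL_PRIV from TBL_PRIVS where $CONDITIONS' --target-dir /user/sqoop_query -m 1 --fields-terminated-by '\t'
With a single map task (-m 1) no --split-by column is needed; for parallel query imports one must be supplied.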
4. Import MySQL data into Hive
Parameter explanation:
--hive-home <dir>            specify the Hive installation directory
--hive-import                import the data into Hive
--hive-database              import into the specified Hive database
--hive-overwrite             overwrite existing data in the Hive table
--create-hive-table          create the table in Hive (the job fails if the table already exists)
--hive-table <table-name>    specify the Hive table name
--hive-partition-value <v>   value of the Hive partition to import into; see the partitioned-import sketch at the end of this section
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS --target-dir /user/tmp --hive-import --hive-table tt -m 1 --create-hive-table --delete-target-dir --direct --fields-terminated-by '\t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.x releases.
hive> show tables;
OK
tt
Time taken: 11.464 seconds, Fetched: 1 row(s)
hive> select * from tt;
OK
6 1552878877 1 root USER root USER INSERT 6
7 1552878877 1 root USER root USER SELECT 6
8 1552878877 1 root USER root USER UPDATE 6
9 1552878877 1 root USER root USER DELETE 6
Time taken: 3.978 seconds, Fetched: 4 row(s)
hive>
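As a sketch of the --hive-partition-value option listed above (Sqoop pairs it with --hive-partition-key; the partition key dt, its value, and the table name tt_part are assumptions for illustration, not from the original run):
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS --hive-import --hive-table tt_part -m 1 --create-hive-table --hive-partition-key dt --hive-partition-value '2019-03-19' --fields-terminated-by '\t'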
5. Import MySQL data into a specified Hive database
[root@node1 ~]# sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TABLE_PARAMS --hive-import --hive-table tt1 -m 1 --create-hive-table --hive-database tong --direct --fields-terminated-by '\t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. Spark, tez) or using Hive 1.x releases.
hive> use tong;
OK
Time taken: 14.34 seconds
hive> show tables;
OK
tt1
Time taken: 0.374 seconds, Fetched: 1 row(s)
hive> select * from tt1;
OK
6 numFiles 1
6 numRows 0
6 rawDataSize 0
6 totalSize 8
6 transient_lastDdlTime 1552878901
11 comment Imported by sqoop on 2019-03-19 15:36:21
11 numFiles 1
11 numRows 0
11 rawDataSize 0
11 totalSize 176
11 transient_lastDdlTime 1552981011
16 comment Imported by sqoop on 2019-03-19 16:04:22
16 numFiles 1
16 numRows 0
16 rawDataSize 0
16 totalSize 239
16 transient_lastDdlTime 1552982688
Time taken: 3.004 seconds, Fetched: 17 row(s)
hive>
6. Export HDFS data to MySQL
[root@node1 ~]# hdfs dfs -cat /user/tmp/part-m-00000
1 2
3 4
5 6
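sqoop export requires the target table to already exist in MySQL; a minimal sketch of the DDL, with column names a and b matching the query result shown further below (the column types are assumptions):
MySQL [tong]> create table T1 (a int, b int);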
[root@node1 ~]# sqoop export --connect jdbc:mysql://172.16.9.100/tong --username tong --password system --export-dir /user/tmp/part-m-00000 --table T1 --direct --fields-terminated-by '\t'
[root@node1 ~]# mysql -u root -psystem
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 1006876
Server version: 5.6.35 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> use tong
MySQL [tong]> select * from T1;
+------+------+
| a    | b    |
+------+------+
|    3 |    4 |
|    5 |    6 |
|    1 |    2 |
+------+------+
3 rows in set (0.00 sec)
MySQL [tong]>
Error message: (the job hangs at "Running job" and does not proceed)
19-03-19 11:20:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552965562217_0001
19-03-19 11:20:10 INFO impl.YarnClientImpl: Submitted application application_1552965562217_0001
19-03-19 11:20:10 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552965562217_0001/
19-03-19 11:20:10 INFO mapreduce.Job: Running job: job_1552965562217_0001
Solution:
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml    # limit memory and CPU resources, synchronize the configuration file to the other nodes, and restart the hadoop services
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
</property>
[root@node1 ~]#
Error message: (when importing MySQL data into Hive)
19-03-19 14:34:25 INFO hive.HiveImport: Loading uploaded data into Hive
19-03-19 14:34:25 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19-03-19 14:34:25 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
Solution:
[root@node1 ~]# vim /etc/profile    # add the Hive lib classpath variable
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
[root@node1 ~]# source /etc/profile
Error message: (due to jackson package conflict between sqoop and hive)
19-03-19 15:32:11 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19-03-19 15:32:11 INFO ql.Driver: Executing command (queryId=root_20190319153153_63feddd9-a2c8-4217-97d4-23dd9840a54b): CREATE TABLE `tt` (`TBL_GRANT_ID` BIGINT, `CREATE_TIME` INT, `GRANT_OPTION` INT, `GRANTOR` STRING, `GRANTOR_TYPE` STRING, `PRINCIPAL_NAME` STRING, `PRINCIPAL_TYPE` STRING, `TBL_PRIV` STRING, `TBL_ID` BIGINT) COMMENT 'Imported by sqoop on 2019-03-19 15:31:49' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19-03-19 15:32:11 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
19-03-19 15:32:12 ERROR exec.DDLTask: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
	at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.<clinit>(StatsSetupConst.java:165)
	at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:297)
	at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsState(StatsSetupConst.java:230)
	at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsStateForCreateTable(StatsSetupConst.java:292)
Solution:
[root@node1 ~]# mv /opt/sqoop-1.4.7/lib/jackson-* /home/
[root@node1 ~]# cp -a /opt/hive-2.3.4/lib/jackson-* /opt/sqoop-1.4.7/lib/
Error message:
19-03-19 18:38:40 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
19-03-19 18:38:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
19-03-19 18:38:40 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=1
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
Solution:
create table t1 (a int, b int) row format delimited fields terminated by '\t';    -- delimiters must be specified when creating the table
sqoop import --connect jdbc:mysql://172.16.9.100/hive --username hive --password system --table TBL_PRIVS --target-dir /user/sqoop --direct -m 1 --fields-terminated-by '\t'
This concludes the walkthrough of building a Sqoop 1.4.7 environment and importing and exporting MySQL data to and from Hive; trying the commands yourself is the best way to consolidate them.