This article shows how to install and use Sqoop under Linux. The content is easy to follow and should help resolve your doubts, so let's study and learn "how to install and use sqoop under Linux" together.
Sqoop is an open source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, and it can also export data from HDFS back into a relational database.
2. Sqoop-1.4.7 installation and configuration process
(1) Environment prerequisites for Sqoop: Hadoop, a relational database (MySQL/Oracle), HBase, Hive, ZooKeeper.
(2) Extract the sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz installation package to the target directory:
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C <target directory>
(3) For later convenience, rename the Sqoop folder:
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
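As a concrete sketch of steps (2) and (3), assuming the tarball sits in /opt/software and the install target is /opt/module (both paths are hypothetical examples, not from the original):

# extract the Sqoop tarball into the install directory (paths are assumptions)
tar -zxvf /opt/software/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/module
# rename the extracted folder for convenience
mv /opt/module/sqoop-1.4.7.bin__hadoop-2.6.0/ /opt/module/sqoop-1.4.7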
(4) Modify the configuration file. Enter the sqoop-1.4.7/conf path and rename the configuration template:
mv sqoop-env-template.sh sqoop-env.sh
Modify the settings in sqoop-env.sh (if the environment variables are already configured, you can run
echo $XXXXX_HOME
to query each installation location):
vi sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=<Hadoop installation path>
# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=<Hadoop installation path>
# Set the path to where bin/hbase is available
#export HBASE_HOME=<HBase installation path>
# Set the path to where bin/hive is available
#export HIVE_HOME=<Hive installation path>
# Set the path for where zookeper config dir is
#export ZOOCFGDIR=<ZooKeeper configuration folder path>
(5) Associate Hive:
cp /XXX/hive/conf/hive-site.xml /XXX/sqoop-1.4.7/conf/
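For reference, a filled-in sqoop-env.sh might look like the following. The /opt/module/... locations and version numbers are assumed examples, not from the original; substitute your own installation directories:

# assumed example paths for each component
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HBASE_HOME=/opt/module/hbase-1.3.1
export HIVE_HOME=/opt/module/hive-1.2.1
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10/conf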
(6) Configure environment variables. Modify the configuration file:
vi /etc/profile
Add the following:
export SQOOP_HOME=<sqoop installation path>
export PATH=$PATH:$SQOOP_HOME/bin
Make the environment variables take effect:
source /etc/profile
(7) Check the version number:
sqoop version
(8) Add drivers: copy the MySQL JDBC driver into sqoop/lib, and copy the Oracle JDBC driver into sqoop/lib as well if needed.
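For example (a minimal sketch; the connector jar name, version, and paths are assumptions, not from the original):

# copy the MySQL JDBC driver into Sqoop's lib directory (jar name and paths assumed)
cp /opt/software/mysql-connector-java-5.1.27.jar /opt/module/sqoop-1.4.7/lib/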
3. Sqoop operation
(1) Common parameters:
To view the parameters, see the Sqoop official website -> Documentation -> Sqoop User Guide.
import: imports data into the cluster
export: exports data from the cluster
create-hive-table: creates a Hive table
import-all-tables: imports all tables of the relational database into the cluster
list-databases: lists all databases
list-tables: lists all tables of a database (see the example after this list)
merge: merges data
codegen: reads a table to generate a JavaBean and packages it into a Jar
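For instance, a quick check of which tables exist in a MySQL database (a sketch; the host bigdata01 comes from the later examples, while the database name testdb and the password are assumed placeholders):

sqoop list-tables \
  --connect jdbc:mysql://bigdata01:3306/testdb \
  --username root \
  --password XXXXXX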
(2) import: Sqoop's import operation.
Function: MySQL/Oracle -> HDFS/Hive
Modify MySQL access. View the current permissions, then change them so that all hosts can connect:
use mysql;
select User, Host, Password from user;   -- view permissions
update user set host='%' where host='localhost';
delete from user where Host='127.0.0.1';
delete from user where Host='bigdata01';
delete from user where Host='::1';
flush privileges;
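To confirm the change took effect, you might try a remote login from another node (a sketch; bigdata01 is the MySQL host used throughout this article, the client node is assumed):

# a remote login should now succeed because host='%' was granted
mysql -h bigdata01 -uroot -p
# inside the client you can re-check the grants:
# mysql> select User, Host from mysql.user;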
Operation command:
Preparation for the import command: start the Hive service and create the corresponding table in Hive to receive the import.
Possible error 1:
FAILED: SemanticException [Error 10072]: Database does not exist: XXXXXXXX
Error reason: Sqoop is not associated with Hive.
Solution: cp /XXX/hive/conf/hive-site.xml /XXX/sqoop-1.4.7/conf/
Possible error 2:
ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://bigdata01:9000/XXXXXXXXXX already exists
Error reason: a path with the same name already exists in HDFS.
Solution: specify a new path or delete the original file in HDFS.
Possible error 3:
ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Error reason: the Hive environment variable configuration is missing.
Solution: add the Hive dependency to the Hadoop environment. Modify the configuration file:
vi /etc/profile
Add the following:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
Make the environment variables take effect:
source /etc/profile
Import a local MySQL table into Hive:
sqoop import
  --connect jdbc:mysql://bigdata01:3306/<database name>
  --username root
  --password <password>
  --table <table name>
  --num-mappers 1
  --hive-import
  --fields-terminated-by "\t"
  --hive-overwrite
  --hive-table <hive database name>.<table name>
You can then see in Hive that the data has been written into the specified table.
Import a local MySQL table into HDFS:
sqoop import
  --connect jdbc:mysql://bigdata01:3306/<database name>   # connect to MySQL
  --username root                # user name
  --password XXXXXX              # password
  --table <table name>           # table to upload to HDFS
  --target-dir /YYYYYYY          # HDFS destination folder
  --num-mappers 1                # number of map tasks
  --fields-terminated-by "\t"    # field delimiter
View the HDFS upload result from the Linux side:
hdfs dfs -cat /XXXXXXX/part-m-00000
Use a query to filter the data ($CONDITIONS lets Sqoop split the query across the mappers; do not combine --query with --table):
sqoop import
  --connect jdbc:mysql://bigdata01:3306/<database name>
  --username root
  --password XXXXXX
  --target-dir /YYYYYYY
  --num-mappers 1
  --fields-terminated-by "\t"
  --query 'select * from <table name> where <condition> and $CONDITIONS'
Filter fields directly with --columns:
sqoop import
  --connect jdbc:mysql://bigdata01:3306/<database name>
  --username root
  --password XXXXXX
  --table <table name>
  --target-dir /YYYYYYY
  --num-mappers 1
  --columns <field names>
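Putting it together, here is a minimal worked sketch of importing one MySQL table into HDFS. The database name company, table staff, and the target directory are hypothetical examples, not values from the original; --delete-target-dir is one way to avoid the "Output directory already exists" error above:

sqoop import \
  --connect jdbc:mysql://bigdata01:3306/company \
  --username root \
  --password XXXXXX \
  --table staff \
  --target-dir /user/company/staff \
  --delete-target-dir \
  --num-mappers 1 \
  --fields-terminated-by "\t"

# check the result on HDFS
hdfs dfs -cat /user/company/staff/part-m-00000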
(3) export: Sqoop's export operation.
Function: HDFS/Hive -> MySQL/Oracle
Operation command:
Export command (export a Hive table to local MySQL):
sqoop export
  --connect jdbc:mysql://bigdata01:3306/<database name>   # connect to MySQL
  --username root                               # user name
  --password XXXXXX                             # password
  --table <table name>                          # destination MySQL table
  --export-dir /user/hive/warehouse/YYYYYYY     # Hive data folder in HDFS
  --num-mappers 1                               # number of map tasks
  --input-fields-terminated-by "\t"             # field delimiter
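A concrete sketch with assumed names (the database company and table staff are hypothetical, and the destination table must already exist in MySQL):

sqoop export \
  --connect jdbc:mysql://bigdata01:3306/company \
  --username root \
  --password XXXXXX \
  --table staff \
  --export-dir /user/hive/warehouse/staff \
  --num-mappers 1 \
  --input-fields-terminated-by "\t"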
(4) List all databases:
Operation command:
sqoop list-databases --connect jdbc:mysql://bigdata01:3306/ --username root --password <password>
(5) Read a database table to generate a JavaBean:
Operation command:
sqoop codegen
  --connect jdbc:mysql://bigdata01:3306/<database name>
  --username root
  --password <password>
  --table <table name>
  --bindir <Linux local path>     # directory where the Jar package is generated
  --class-name <class name>       # name of the generated Java class
  --fields-terminated-by "\t"
(6) Merge data from different directories in HDFS:
Operation command:
sqoop merge
  --new-data <HDFS path of the new table>
  --onto <HDFS path of the old table>
  --target-dir /YYYYYYY                        # merged HDFS path
  --jar-file <Linux local Jar package path>
  --class-name XXXXX                           # class inside the Jar package
  --merge-key id                               # merge key
Note: the merge operation replaces the old table with the new table. If an id conflicts, the new table's record replaces the old table's record; if there is no conflict, the new table's records are added to the old table's data.
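As an end-to-end sketch of (5) and (6) together: codegen produces the class and Jar that merge then uses. The database company, table staff, class name Staff, and all paths below are assumed examples, not values from the original:

# generate the JavaBean and Jar describing the table layout (names and paths assumed)
sqoop codegen \
  --connect jdbc:mysql://bigdata01:3306/company \
  --username root \
  --password XXXXXX \
  --table staff \
  --bindir /opt/module/sqoop-1.4.7/staff_jar \
  --class-name Staff \
  --fields-terminated-by "\t"

# merge the new snapshot into the old one, keyed on id
sqoop merge \
  --new-data /user/company/staff_new \
  --onto /user/company/staff_old \
  --target-dir /user/company/staff_merged \
  --jar-file /opt/module/sqoop-1.4.7/staff_jar/Staff.jar \
  --class-name Staff \
  --merge-key id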
That is all the content of "how to install and use sqoop under Linux". Thank you for reading, and I hope sharing it has been helpful.