This article explains how to install and configure Sqoop on a Linux system. It is written for beginners and walks through the installation, the configuration files, and Sqoop's basic import and export operations.
1. Introduction to Sqoop
As its name (SQL-to-Hadoop) suggests, Sqoop is a tool for transferring data between relational databases and Hadoop in both directions. You can import data from a relational database (such as MySQL or Oracle) into Hadoop (HDFS, Hive, or HBase), or export data from Hadoop (HDFS, Hive, or HBase) back into a relational database (such as MySQL or Oracle).
2. Sqoop architecture
After receiving a shell command or Java API call from the client, Sqoop's task translator converts the command into a corresponding MapReduce job, which then copies the data between the relational database and Hadoop.
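As a concrete illustration of this translation, the single shell command below is turned by Sqoop into a map-only MapReduce job, with --num-mappers controlling how many parallel map tasks read from the table (this assumes the table has a primary key Sqoop can use to split the rows). The host name matches the one used later in this article, but the database name testdb, table name staff, and password are placeholders, not values from this environment:
sqoop import \
  --connect jdbc:mysql://bigdata01:3306/testdb \
  --username root \
  --password 123456 \
  --table staff \
  --target-dir /sqoop/staff \
  --num-mappers 4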
Sqoop-1.4.7 installation and configuration process
(1) Environment prerequisites:
Hadoop
Relational database (MySQL/Oracle)
HBase
Hive
ZooKeeper
(2) Extract the sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz installation package to the target directory:
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C <target directory>
(3) For convenience in the following steps, rename the Sqoop folder:
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
(4) Modify the configuration file: enter the sqoop-1.4.7/conf directory and rename the template configuration file:
mv sqoop-env-template.sh sqoop-env.sh
Edit the sqoop-env.sh settings (if the environment variables are already configured, you can find each installation path with echo $XXXX_HOME):
vi sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=<Hadoop installation path>
# Set path to where hadoop-*-core.jar is available
# export HADOOP_MAPRED_HOME=<Hadoop installation path>
# Set the path to where bin/hbase is available
# export HBASE_HOME=<HBase installation path>
# Set the path to where bin/hive is available
# export HIVE_HOME=<Hive installation path>
# Set the path for where zookeeper config dir is
# export ZOOCFGDIR=<ZooKeeper configuration folder path>
Uncomment and fill in the entries for the components you actually use. To link Hive to Sqoop, also copy hive-site.xml into Sqoop's conf directory:
cp /XXX/hive/conf/hive-site.xml /XXX/sqoop-1.4.7/conf/
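For reference, a filled-in sqoop-env.sh might look like the following; the /opt/... installation paths are assumptions for illustration and should be replaced with the actual locations on your machine:
export HADOOP_COMMON_HOME=/opt/hadoop-2.7.7
export HADOOP_MAPRED_HOME=/opt/hadoop-2.7.7
export HBASE_HOME=/opt/hbase-1.3.1
export HIVE_HOME=/opt/hive-1.2.2
export ZOOCFGDIR=/opt/zookeeper-3.4.10/conf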
(5) Configure environment variables. Edit the configuration file:
vi /etc/profile
Add the following:
export SQOOP_HOME=<sqoop installation path>
export PATH=$PATH:$SQOOP_HOME/bin
Then reload the environment variables:
source /etc/profile
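For example, assuming Sqoop was extracted and renamed to /opt/sqoop-1.4.7 (an illustrative path, not one given in this article), the added lines would be:
export SQOOP_HOME=/opt/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
After running source /etc/profile, the command "which sqoop" should print /opt/sqoop-1.4.7/bin/sqoop.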
(6) Verify the installation by checking the version number:
sqoop version
(7) Add JDBC drivers: copy the MySQL JDBC driver JAR into sqoop/lib.
If Oracle is used, also copy the Oracle JDBC driver JAR into sqoop/lib.
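A minimal sketch of this step, assuming the driver JARs have already been downloaded to the current directory (the file names and versions are illustrative, and /XXX stands for the parent directory of the Sqoop installation as elsewhere in this article):
cp mysql-connector-java-5.1.47.jar /XXX/sqoop-1.4.7/lib/
cp ojdbc6.jar /XXX/sqoop-1.4.7/lib/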
3. Sqoop operation
(1) Common commands (for the full parameter reference, see the Sqoop official website -> Documentation -> Sqoop User Guide):
import: imports data from a relational database into the cluster
export: exports data from the cluster to a relational database
create-hive-table: creates a Hive table
import-all-tables: imports all tables of a relational database into the cluster
list-databases: lists all databases
list-tables: lists all tables in a database
merge: merges data
codegen: generates a JavaBean from a table's data and packages it into a JAR
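All of these are invoked as subcommands of the sqoop binary. For example, listing the tables of one database (using the same bigdata01 host as the rest of this article; the database name and password are placeholders) would look like:
sqoop list-tables \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password <password>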
(2) import -- Sqoop's import operation. Function: MySQL/Oracle -> HDFS/Hive
Modify MySQL access so that Sqoop can connect from any host.
View the current permissions:
use mysql;
select User, Host, Password from user;
Modify the permissions so the account is accessible from all hosts:
update user set host='%' where host='localhost';
delete from user where Host='127.0.0.1';
delete from user where Host='bigdata01';
delete from user where Host='::1';
flush privileges;
Operation commands:
Upload a local MySQL table to HDFS:
Import command:
sqoop import \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password XXXXXX \
  --table <table name> \
  --target-dir /YYYYYYY \
  --num-mappers 1 \
  --fields-terminated-by "\t"
Here --connect points at the MySQL database, --table is the table uploaded to HDFS, --target-dir is the destination folder on HDFS, --num-mappers specifies how many map tasks run, and --fields-terminated-by specifies the field delimiter.
Check the upload result on HDFS from the Linux shell:
hdfs dfs -cat /XXXXXXX/part-m-00000
Use a query to filter the data (--query replaces --table; the where clause must end with "and $CONDITIONS", which Sqoop uses to split the rows among the mappers):
sqoop import \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password XXXXXX \
  --target-dir /YYYYYYY \
  --num-mappers 1 \
  --fields-terminated-by "\t" \
  --query 'select * from <table name> where <condition> and $CONDITIONS'
Filter fields directly:
sqoop import \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password XXXXXX \
  --table <table name> \
  --target-dir /YYYYYYY \
  --num-mappers 1 \
  --columns <field names>
Upload a local MySQL table to Hive:
Preparatory work (a minimal sketch of these two steps follows at the end of this subsection):
Enable the Hive service.
Create the corresponding target table in Hive.
Import command:
sqoop import \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password <password> \
  --table <table name> \
  --num-mappers 1 \
  --hive-import \
  --fields-terminated-by "\t" \
  --hive-overwrite \
  --hive-table <hive database name>.<table name>
You can then see the imported data in the specified table in Hive.
Possible error 1:
FAILED: SemanticException [Error 10072]: Database does not exist: XXXXXXXX
Reason: Sqoop is not linked to Hive.
Solution:
cp /XXX/hive/conf/hive-site.xml /XXX/sqoop-1.4.7/conf/
Possible error 2:
ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://bigdata01:9000/XXXXXXXXXX already exists
Reason: a path with the same name already exists on HDFS.
Solution:
Specify a new path, or delete the existing file on HDFS.
Possible error 3:
ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Reason: the Hive environment variables are missing, so the Hive dependency cannot be found.
Solution: add the Hive dependency to the Hadoop environment. Edit the configuration file:
vi /etc/profile
Add the following:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
Then reload the environment variables:
source /etc/profile
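As referenced above, here is a minimal sketch of the preparatory work for the Hive import: starting the Hive metastore service and creating the target table. The database name testdb, table name staff, and column definitions are assumptions for illustration only, and the delimiter must match the one used in the import command:
# start the Hive metastore service in the background (assumes hive is on PATH)
nohup hive --service metastore > metastore.log 2>&1 &
# create the target database and table with a tab field delimiter
hive -e "CREATE DATABASE IF NOT EXISTS testdb;
         CREATE TABLE IF NOT EXISTS testdb.staff (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"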
(3) export -- Sqoop's export operation. Function: HDFS/Hive -> MySQL/Oracle
Operation commands:
Export a Hive table to the local MySQL database.
Export command:
sqoop export \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password XXXXXX \
  --table <table name> \
  --export-dir /user/hive/warehouse/YYYYYYY \
  --num-mappers 1 \
  --input-fields-terminated-by "\t"
Here --connect points at the destination MySQL database, --table is the destination MySQL table, --export-dir is the Hive table's folder on HDFS, --num-mappers specifies how many map tasks run, and --input-fields-terminated-by specifies the field delimiter of the input files.
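One point worth noting (not spelled out above): sqoop export does not create the destination table, so it must already exist in MySQL with a compatible schema. A hedged sketch, with testdb.staff as an assumed example:
mysql -u root -p -e "CREATE TABLE IF NOT EXISTS testdb.staff (id INT, name VARCHAR(255));"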
(4) List all databases. Operation command:
sqoop list-databases --connect jdbc:mysql://bigdata01:3306/ --username root --password <password>
(5) Obtain a database table's data and generate a JavaBean. Operation command:
sqoop codegen \
  --connect jdbc:mysql://bigdata01:3306/<database name> \
  --username root \
  --password <password> \
  --table <table name> \
  --bindir <Linux local path> \
  --class-name <class name> \
  --fields-terminated-by "\t"
Here --bindir specifies the local path where the generated JAR package is placed and --class-name specifies the name of the generated Java class.
(6) Merge data under different directories on HDFS. Operation command:
sqoop merge \
  --new-data <HDFS path of the new table> \
  --onto <HDFS path of the old table> \
  --target-dir /YYYYYYY \
  --jar-file <local JAR package path> \
  --class-name XXXXX \
  --merge-key id
Here --target-dir is the merged output path on HDFS, --jar-file is the locally stored JAR package (as produced by codegen), --class-name is the class inside that JAR, and --merge-key is the column the merge is based on.
Note: the merge operation replaces the old table with the new one: when an id conflicts, the row from the new table replaces the row from the old table; rows that do not conflict are kept from both tables.
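A small worked example of this rule (the ids and names are made up for illustration): if the old table contains (1, Tom) and (2, Jerry), and the new table contains (1, Tim) and (3, Bob), then after merging on id the result is (1, Tim), (2, Jerry), (3, Bob): id 1 is taken from the new table, id 2 survives from the old table, and id 3 is added from the new table.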
Thank you for reading. You should now have a clear picture of how to install Sqoop on a Linux system; the quickest way to consolidate it is to try the steps in practice.