
Installation and deployment of the Sqoop tool


Sqoop Introduction

Sqoop is a tool for transferring data between Hadoop and relational databases. It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into Hadoop's HDFS, and also export data from HDFS back into a relational database.

It also provides connectors for some NoSQL databases. Like other ETL tools, Sqoop uses a metadata model to determine data types and ensure type-safe data handling when data is transferred from a data source to Hadoop. Sqoop is designed for bulk transfers of large data sets: it can split a data set into chunks and create Hadoop tasks to process each chunk.
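As an illustration of that chunking (a sketch, not from the original article: --split-by id and four mappers are assumed here, while the connection details and the emp table come from the examples later in this article):

# split the emp table on its id column and import it with four parallel map tasks
./bin/sqoop import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username root \
--password 123456 \
--table emp \
--split-by id \
-m 4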

1. Download Sqoop

https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/

2. Upload Sqoop to the server and extract it to the corresponding directory
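A minimal sketch of this step, assuming the 1.4.7 binary tarball from the mirror above and the /home/nflow/servers directory used later in this article:

# extract the tarball into the servers directory and shorten the directory name (paths assumed)
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /home/nflow/servers/
mv /home/nflow/servers/sqoop-1.4.7.bin__hadoop-2.6.0 /home/nflow/servers/sqoop-1.4.7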

3. Prepare the Sqoop configuration file

4. Modify the configuration file (a sketch of typical settings follows)
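The original article does not reproduce the configuration contents. As an assumption-laden sketch, the usual approach for Sqoop 1.4.7 is to copy conf/sqoop-env-template.sh to conf/sqoop-env.sh and point it at the local Hadoop and Hive installations; the paths below follow the /home/nflow/servers layout used elsewhere in this article:

cd /home/nflow/servers/sqoop-1.4.7/conf
cp sqoop-env-template.sh sqoop-env.sh
# append the environment variables to sqoop-env.sh (paths are assumptions)
echo 'export HADOOP_COMMON_HOME=/home/nflow/servers/hadoop' >> sqoop-env.sh
echo 'export HADOOP_MAPRED_HOME=/home/nflow/servers/hadoop' >> sqoop-env.sh
echo 'export HIVE_HOME=/home/nflow/servers/hive' >> sqoop-env.sh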

5. Copy the MySQL database driver required by Sqoop

cp /home/nflow/servers/hive/lib/mysql-connector-java-5.1.26-bin.jar /home/nflow/servers/sqoop-1.4.7/lib/

6. Run a Sqoop test (if the database connection works, the command lists its databases)

./sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root --password 123456

7. Import data with Sqoop

The following SQL is taken from another blogger (attribution at the end of the listing):

drop database if exists userdb;
create database userdb;
use userdb;
drop table if exists emp;
drop table if exists emp_add;
drop table if exists emp_conn;

CREATE TABLE emp (id INT NOT NULL, name VARCHAR(100), deg VARCHAR(100), salary BIGINT, dept VARCHAR(50));
CREATE TABLE emp_add (id INT NOT NULL, hno VARCHAR(50), street VARCHAR(50), city VARCHAR(50));
CREATE TABLE emp_conn (id INT NOT NULL, phno VARCHAR(50), email VARCHAR(50));

insert into emp values (1201,'gopal','manager','50000','TP');
insert into emp values (1202,'manisha','Proof reader','50000','TP');
insert into emp values (1203,'khalil','php dev','30000','AC');
insert into emp values (1204,'prasanth','php dev','30000','AC');
insert into emp values (1205,'kranthi','admin','20000','TP');

insert into emp_add values (1201,'288A','vgiri','jublee');
insert into emp_add values (1202,'108I','aoc','sec-bad');
insert into emp_add values (1203,'144Z','pgutta','hyd');
insert into emp_add values (1204,'78B','old city','sec-bad');
insert into emp_add values (1205,'720X','hitec','sec-bad');

insert into emp_conn values (1201,'2356742','gopal@tp.com');
insert into emp_conn values (1202,'1661663','manisha@tp.com');
insert into emp_conn values (1203,'8887776','khalil@ac.com');
insert into emp_conn values (1204,'9988774','prasanth@ac.com');
insert into emp_conn values (1205,'1231231','kranthi@tp.com');

Thanks to the original blogger for this SQL; the copyright belongs to them. Copyright statement: this is from an original article by the CSDN blogger "Record every note", under the CC 4.0 BY-SA license; reprints must include the original source link and this statement. Original link: blog.csdn.net/yumingzhu1/article/details/80678525

From MySQL to HDFS

#!/bin/bash
./bin/sqoop import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username root \
--password 123456 \
--table emp \
--m 1

The script is run from the Sqoop root directory:

[nflow@hadoop-master1 sqoop-1.4.7]$ pwd
/home/nflow/servers/sqoop-1.4.7

The default import location in HDFS is /user/<username>/<table name>. The connection string cannot use localhost or 127.0.0.1, otherwise an error will be reported; the database server's IP address must be used.
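As a quick sanity check (a sketch: the path assumes the nflow user and the default location just described):

# list and inspect the file Sqoop wrote for the emp table
hdfs dfs -ls /user/nflow/emp
hdfs dfs -cat /user/nflow/emp/part-m-00000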

The HDFS target directory must not already exist, so importing the same table into the same directory a second time will fail.

Modify the script as follows so that the target directory is regenerated on each run (a sketch of one approach follows).
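The original modification is not reproduced in this article. One common approach, shown here as a hedged sketch, is to add --delete-target-dir so Sqoop removes the old directory before each import; the other flags are as in the script above:

#!/bin/bash
# delete the existing HDFS target directory before importing (assumed approach)
./bin/sqoop import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username root \
--password 123456 \
--table emp \
--delete-target-dir \
--m 1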

Importing MySQL data into Hive

[Figure: the source table data in the database]

./bin/sqoop import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username admin \
--password 123456 \
--table emp_add \
--delete-target-dir \
--num-mappers 1 \
--hive-import \
--hive-database default \
--hive-table empadd \
--fields-terminated-by '\t'

Flag notes: --connect points at the userdb database; --username and --password are the database user and password; --table emp_add is the source table; --delete-target-dir deletes the target directory on every run; --num-mappers 1 sets the number of MapReduce processes; --hive-import tells Sqoop to import into Hive; --hive-database default and --hive-table empadd name the target Hive database and table; --fields-terminated-by '\t' sets the field delimiter.
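To verify the Hive side (a sketch assuming the hive CLI is on the PATH; the database and table names come from the command above):

# query the imported rows in Hive
hive -e "select * from default.empadd;"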

What if Sqoop imports the same table again? The test result: when the database has new data, re-running the same import brings every row into Hive again, resulting in duplicates. How can this problem be avoided? With Sqoop incremental synchronization.

Sqoop incremental synchronization to Hive

Only rows with an id greater than 1207 will be synchronized, so there is no duplication.

./bin/sqoop import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username admin \
--password 123456 \
--table emp_add \
--num-mappers 1 \
--hive-import \
--hive-database default \
--hive-table empadd \
--fields-terminated-by '\t' \
--incremental append \
--check-column id \
--last-value 1207
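One follow-on worth noting (an assumption, not from the original article): with --incremental append the --last-value must be advanced by hand on every run, whereas a saved Sqoop job records it automatically. A hedged sketch with a hypothetical job name incr_empadd:

# create a saved job that remembers the last synchronized id
./bin/sqoop job --create incr_empadd -- import \
--connect jdbc:mysql://192.168.249.10:3306/userdb \
--username admin \
--password 123456 \
--table emp_add \
--num-mappers 1 \
--hive-import \
--hive-database default \
--hive-table empadd \
--fields-terminated-by '\t' \
--incremental append \
--check-column id \
--last-value 1207

# each execution updates the stored last-value automatically
./bin/sqoop job --exec incr_empadd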
