2025-04-13 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --

This article analyzes how Sqoop works. The content is detailed, easy to follow, and practical, and should serve as a useful reference for anyone interested in the principles behind Sqoop.
1. Introduction

Sqoop is a tool for transferring data between Hadoop and relational databases. It can import data from a relational database (such as MySQL, Oracle, or PostgreSQL) into Hadoop's HDFS, and it can also export data from HDFS back into a relational database.
2. Characteristics

One of Sqoop's highlights is its ability to import data from a relational database into HDFS through Hadoop's MapReduce, so the transfer runs as parallel map tasks.
3. Sqoop Commands

Sqoop provides about 13 commands (tools), along with several groups of common parameters that all 13 commands support. The common parameters are listed first; each command then has its own command-specific parameters. Sqoop's common parameters fall into the following groups: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments. They are described below:
1. Common arguments

General parameters, mainly for connecting to the relational database.
4. Sqoop Command Examples

1) List all databases on the MySQL server

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
2) Connect to MySQL and list the tables in the test database

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456

Here test is the database name on the MySQL server, and root and 123456 are the MySQL user and password.
3) Copy a table's structure from the relational database into Hive (only the structure is copied; the table's contents are not)

sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test \
  --table sqoop_test --username root --password 123456 \
  --hive-table test

Here --table sqoop_test is the table in the MySQL database test, and --hive-table test names the newly created table in Hive.
4) Import a table from the relational database into Hive

sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root \
  --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1
5) Export table data from Hive into MySQL. Before exporting, the target table hive_test must already exist in MySQL.

sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root \
  --password root --table hive_test \
  --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
6) Import table data from the database into a file on HDFS

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression \
  --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 \
  --target-dir /user/test
7) Incrementally import table data from the database into HDFS

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression \
  --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 \
  --target-dir /user/test --check-column id --incremental append \
  --last-value 3
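The incremental options above can be illustrated with a small Python sketch (the sample rows and column name are hypothetical): with --check-column id, --incremental append, and --last-value 3, Sqoop imports only rows whose id is greater than 3, and the maximum id it sees becomes the --last-value for the next run.

```python
# Sketch of Sqoop's "incremental append" row selection (illustrative data).
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
    {"id": 4, "name": "d"},
    {"id": 5, "name": "e"},
]

def incremental_append(rows, check_column, last_value):
    """Return the rows with check_column > last_value, plus the new last value."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

imported, next_last = incremental_append(rows, "id", 3)
print([r["id"] for r in imported])  # only rows 4 and 5 are appended
print(next_last)                    # pass 5 as --last-value on the next run
```

Each run therefore appends only the rows added since the previous run, rather than re-importing the whole table.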
5. Sqoop Principle (Using import as an Example)

When Sqoop performs an import, you need to set the split-by parameter. Sqoop splits the data according to the values of the split-by column and assigns the resulting regions to different map tasks; each map task processes the rows it fetches from the database one by one and writes them to HDFS. How the data is split depends on the type of the split-by column. For a simple int column, Sqoop fetches the maximum and minimum values of the split-by field (roughly select max(split_by), min(split_by) from table) and then divides that range into as many regions as num-mappers specifies. For example, if max(split-by) is 1000 and min(split-by) is 1, and num-mappers is 2, the range is divided into two regions, (1, 500) and (501, 1000). Two SQL statements are then generated, one per map task, to perform the import: select XXX from table where split-by >= 1 and split-by < 501, and select XXX from table where split-by >= 501 and split-by <= 1000.
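As a rough sketch of this splitting logic (not Sqoop's actual implementation), the range computation for an integer split-by column can be modeled like this:

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive range [min_val, max_val] into num_mappers
    contiguous regions, one per map task, mimicking Sqoop's integer split."""
    size = (max_val - min_val + 1) // num_mappers
    ranges = []
    lo = min_val
    for i in range(num_mappers):
        # The last region absorbs any remainder so the full range is covered.
        hi = max_val if i == num_mappers - 1 else lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

print(split_ranges(1, 1000, 2))  # [(1, 500), (501, 1000)]
```

With min 1, max 1000, and 2 mappers, this reproduces the two regions from the example above; each tuple corresponds to one map task's WHERE clause.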