Shulou (Shulou.com) report, 06/01 (updated 2025-03-09)
Many newcomers are not sure how to use Sqoop, a data migration tool. To help, this article explains it in detail; readers who need it will hopefully find it useful.
Sqoop is a tool for transferring data between Hadoop and relational databases. In a Hadoop environment it acts as a bridge between relational databases and the Hadoop storage systems, and it can move data in both directions between many relational data sources and Hive, HDFS, and HBase. It supports full-table imports as well as incremental imports. Under the hood, Sqoop runs each transfer as a distributed MapReduce batch job, which speeds up the transfer and provides fault tolerance.
Why choose Sqoop:
1. Resources are used efficiently: the degree of concurrency can be controlled by adjusting the number of map tasks.
2. Data type mapping and conversion are handled automatically. Imported data is typed: Sqoop converts each column to a Hadoop-side type based on its type in the database, and the mapping can also be customized.
3. It supports a variety of databases, such as MySQL, Oracle, and PostgreSQL.
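Point 2 can be pictured as a lookup table with per-column overrides. The Python sketch below is illustrative only: the mapping is a hand-picked subset that approximates Sqoop 1's default SQL-to-Java type mapping (it is not Sqoop's code or its complete list), and the function name resolve_java_types is hypothetical. The overrides argument plays the role of Sqoop's --map-column-java option (for example --map-column-java id=String).

```python
# Illustrative subset of default SQL type -> Java type mappings
# (approximates Sqoop 1's defaults; not exhaustive, not authoritative).
DEFAULT_JAVA_TYPES = {
    "INTEGER": "Integer",
    "SMALLINT": "Integer",
    "BIGINT": "Long",
    "CHAR": "String",
    "VARCHAR": "String",
    "DECIMAL": "java.math.BigDecimal",
    "NUMERIC": "java.math.BigDecimal",
    "DOUBLE": "Double",
    "DATE": "java.sql.Date",
    "TIMESTAMP": "java.sql.Timestamp",
}


def resolve_java_types(columns, overrides=None):
    """Return {column_name: java_type} for a list of (column_name, sql_type) pairs.

    overrides maps column names to Java types, similar in spirit to Sqoop's
    --map-column-java option; an override always wins over the default.
    """
    overrides = overrides or {}
    resolved = {}
    for name, sql_type in columns:
        if name in overrides:
            resolved[name] = overrides[name]
        elif sql_type.upper() in DEFAULT_JAVA_TYPES:
            resolved[name] = DEFAULT_JAVA_TYPES[sql_type.upper()]
        else:
            # No default mapping: the user must supply an explicit override.
            raise ValueError(f"no default Java type for {name} ({sql_type})")
    return resolved


if __name__ == "__main__":
    table = [("id", "INTEGER"), ("name", "VARCHAR"), ("price", "DECIMAL")]
    print(resolve_java_types(table))
    print(resolve_java_types(table, overrides={"id": "String"}))
```

Raising an error for an unmapped type models the practical case where a column type has no default mapping and an explicit override is required.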
How Sqoop works:
Sqoop uses MapReduce parallelism to transfer data in batches, which gives it both concurrency and fault tolerance. Sqoop connects to relational databases mainly through JDBC, so in principle any relational database that provides a JDBC driver can exchange data with HDFS through Sqoop.
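Because Sqoop talks to databases through JDBC, its --connect option takes a JDBC URL. The sketch below builds the common URL shapes for the three databases named earlier; the helper name jdbc_url, the example host names, and the assumed default ports (MySQL 3306, PostgreSQL 5432, Oracle 1521) are for illustration only, and the matching JDBC driver JAR still has to be available to Sqoop.

```python
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432, "oracle": 1521}


def jdbc_url(vendor, host, database, port=None):
    """Build a JDBC URL of the kind passed to Sqoop's --connect option.

    Covers only the most common URL shapes; vendor-specific options are omitted.
    """
    if vendor not in DEFAULT_PORTS:
        raise ValueError(f"unsupported vendor: {vendor}")
    if port is None:
        port = DEFAULT_PORTS[vendor]
    if vendor == "oracle":
        # Oracle thin driver, service-name form: jdbc:oracle:thin:@//host:port/service
        return f"jdbc:oracle:thin:@//{host}:{port}/{database}"
    # MySQL and PostgreSQL share the jdbc:<vendor>://host:port/database shape.
    return f"jdbc:{vendor}://{host}:{port}/{database}"


if __name__ == "__main__":
    print(jdbc_url("mysql", "db.example.com", "shop"))
    print(jdbc_url("oracle", "ora.example.com", "ORCLPDB1"))
```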
1. How import works (relational database to HDFS): the user runs a Sqoop import command, and Sqoop first retrieves metadata from the relational database, such as the table's columns and their types. It then turns the import command into a map-only MapReduce job. The job launches multiple map tasks, each of which reads one portion of the table, and the map tasks write their data into HDFS in parallel.
2. How export works (HDFS to relational database): when the user runs an export command, Sqoop retrieves the structure of the target table and maps its columns to the fields of the Hadoop data. It then turns the command into a map-only MapReduce job whose map tasks read the data files from HDFS in parallel and write the records into the database.
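Both mechanisms split the work across parallel map tasks. The Python sketch below illustrates them in simplified form and is not Sqoop's implementation. On the import side, Sqoop by default picks a split column (the primary key, or the column named with --split-by), queries its minimum and maximum values, and gives each map task (their number is set with -m/--num-mappers) an evenly sized slice of that range; split_predicates models that idea for an integer column. On the export side, export_lines stands in for one map task that parses delimited text records and inserts them in batches, with an in-memory SQLite database replacing the real target database. Both function names are hypothetical.

```python
import sqlite3


def split_predicates(column, min_val, max_val, num_mappers):
    """Import side: divide [min_val, max_val] into contiguous integer ranges,
    one WHERE predicate per map task (lower bound inclusive, upper exclusive)."""
    if num_mappers < 1 or max_val < min_val:
        raise ValueError("need num_mappers >= 1 and max_val >= min_val")
    span = max_val - min_val + 1
    n = min(num_mappers, span)  # no more splits than distinct values
    size, extra = divmod(span, n)
    predicates, lo = [], min_val
    for i in range(n):
        hi = lo + size + (1 if i < extra else 0)  # spread the remainder
        predicates.append(f"{column} >= {lo} AND {column} < {hi}")
        lo = hi
    return predicates


def export_lines(conn, table, columns, lines, delimiter=",", batch_size=100):
    """Export side: parse delimited text records (as one map task would read
    them from an HDFS file) and insert them in batches. Returns rows inserted.
    Table and column names are assumed to be trusted identifiers."""
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
           f"VALUES ({', '.join('?' for _ in columns)})")
    batch, total = [], 0
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields: {line!r}")
        batch.append(fields)
        if len(batch) == batch_size:
            conn.executemany(sql, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the last partial batch
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


if __name__ == "__main__":
    print(split_predicates("id", 1, 100, 4))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    print(export_lines(conn, "orders", ["id", "item"], ["1,apple\n", "2,pear\n"]))
```

In real Sqoop exports each map task commits its own transactions, so a failed job can leave partial data in the target table; Sqoop's --staging-table option exists for that situation.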
Sqoop versions and architecture
There are two versions of Sqoop, 1.4.x and 1.99.x, usually referred to as Sqoop1 and Sqoop2.
The Sqoop1 architecture submits jobs directly from the Sqoop client. Users work through the command line (CLI) and specify the database user name and password in the command or in a script.
The Sqoop2 architecture introduces a Sqoop Server that manages connectors centrally and offers several access methods, such as the CLI, a Web UI, and a REST API. When Sqoop2 is used through the CLI, it provides an interactive shell, so passwords are not displayed as they are typed.
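With Sqoop1, credentials end up on the command line or in a script. Besides --password, Sqoop 1.4.x offers -P (prompt at the console) and, in later 1.4.x releases, --password-file, which keeps the password out of the command text. The sketch below assembles such a Sqoop1 import command as an argument list; the helper name, host, user, table, and paths are hypothetical, and nothing runs unless you pass the list to subprocess.run on a machine with Sqoop installed.

```python
import shlex


def sqoop1_import_argv(connect, username, password_file, table, target_dir,
                       num_mappers=4):
    """Build a Sqoop 1 import command that reads the password from a file.

    Using --password-file (instead of --password) keeps the secret out of the
    command text, shell history, and process listings.
    """
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    argv = sqoop1_import_argv(
        "jdbc:mysql://db.example.com:3306/shop",  # hypothetical database
        "etl",                                   # hypothetical user
        "/user/etl/.mysql.password",             # hypothetical password file
        "orders",
        "/data/shop/orders",
    )
    print(shlex.join(argv))
    # To run it on a machine with Sqoop 1 installed:
    # import subprocess; subprocess.run(argv, check=True)
```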
Sqoop can transfer data not only between relational databases and HDFS but also from relational databases into Hive or HBase. For the reverse direction, from Hive or HBase to a relational database, first extract the data from Hive or HBase into HDFS, then use Sqoop to export that output into the relational database.
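The two directions map onto two Sqoop commands. In this sketch (all function names, tables, and paths are hypothetical), hive_import_argv loads a relational table into Hive with --hive-import, and export_from_hdfs_argv exports files that are already in HDFS, such as the directory of a text-format Hive table or the output of a Hive INSERT OVERWRITE DIRECTORY query, back into a relational table. The delimiter \001 is Ctrl-A, Hive's default field separator, written in the octal escape form Sqoop accepts; it only applies if the data was actually written with that delimiter.

```python
def hive_import_argv(connect, username, password_file, table, hive_table):
    """Relational table -> Hive table, using Sqoop's --hive-import."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
    ]


def export_from_hdfs_argv(connect, username, password_file, table, export_dir,
                          field_delimiter="\\001"):
    """Delimited files already in HDFS -> relational table, using sqoop export.

    The default "\\001" is the four characters backslash-0-0-1, which Sqoop
    interprets as an octal escape for Ctrl-A (Hive's default field delimiter).
    """
    return [
        "sqoop", "export",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", field_delimiter,
    ]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com:3306/shop"
    print(hive_import_argv(url, "etl", "/user/etl/.pw", "orders", "shop.orders"))
    print(export_from_hdfs_argv(url, "etl", "/user/etl/.pw", "orders_report",
                                "/user/hive/warehouse/shop.db/orders_report"))
```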
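Sqoop supports two incremental import modes: append and lastmodified. Append mode is meant for tables where new rows arrive with ever-increasing values in the check column, such as an auto-increment id. Lastmodified mode differs in that you specify a timestamp column, so rows are selected by when they were last modified; this also picks up rows that were updated rather than newly inserted. In lastmodified mode you can also control how the incremental data is combined with what is already in HDFS; for example, the new rows can be merged with the existing data so that the final result is a single, up-to-date data set.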
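The difference between the two modes is easiest to see on a few rows. This Python sketch applies each mode's selection rule to an in-memory list of rows. It is a simplified model rather than Sqoop's code, the function names are hypothetical, and the boundary handling (strict versus inclusive comparisons on the timestamp) is simplified and may differ from Sqoop's.

```python
from datetime import datetime


def select_append(rows, check_column, last_value):
    """append mode: take rows whose check column exceeds last_value.

    Returns (new_rows, next_last_value), where next_last_value is the largest
    check-column value seen (unchanged if nothing new arrived).
    """
    new_rows = [r for r in rows if r[check_column] > last_value]
    next_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, next_last


def select_lastmodified(rows, check_column, last_value, now):
    """lastmodified mode: take rows modified after last_value, up to `now`.

    In this model, the time the import ran becomes the next last value.
    """
    new_rows = [r for r in rows if last_value < r[check_column] <= now]
    return new_rows, now


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated": datetime(2024, 1, 1)},
        {"id": 2, "updated": datetime(2024, 3, 1)},  # old row, edited in March
        {"id": 3, "updated": datetime(2024, 3, 2)},  # newly inserted row
    ]
    # append (last id seen: 2) only picks up row 3.
    print(select_append(rows, "id", 2))
    # lastmodified (last run: Feb 1) picks up both the edit and the new row.
    print(select_lastmodified(rows, "updated", datetime(2024, 2, 1), datetime(2024, 4, 1)))
```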
The main parameters are:
--check-column: the column used to detect new data in an incremental import, usually an auto-increment primary key id or a timestamp.
--incremental: the import mode (append or lastmodified).
--last-value: the maximum value of the check column from the previous import, which is also the starting point for this import.
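Put together, the three parameters form an incremental import command. The sketch below assembles one as an argument list; the helper name, connection details, and table names are hypothetical, credentials are omitted for brevity, and --merge-key (used with lastmodified to merge updated rows into the existing HDFS data) is included as an optional extra. The restriction of --merge-key to lastmodified is this sketch's own choice.

```python
import shlex


def incremental_import_argv(connect, table, target_dir, mode, check_column,
                            last_value, merge_key=None):
    """Build a Sqoop 1 incremental import command as an argument list."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    argv = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]
    if merge_key is not None:
        if mode != "lastmodified":
            raise ValueError("--merge-key is only used here with lastmodified mode")
        # Merge changed rows into the existing data, keyed on this column.
        argv += ["--merge-key", merge_key]
    return argv


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com:3306/shop"
    print(shlex.join(incremental_import_argv(
        url, "orders", "/data/orders", "append", "id", 1000)))
    print(shlex.join(incremental_import_argv(
        url, "orders", "/data/orders", "lastmodified", "updated_at",
        "2024-02-01 00:00:00", merge_key="id")))
```

After an incremental run, Sqoop prints the value to use as --last-value next time; a saved job (sqoop job --create) can store that value between runs instead of tracking it by hand.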
© 2024 shulou.com SLNews company. All rights reserved.