Organize records related to sqoop

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Contents:

Production background: problems importing from mysql to hive; problems exporting from hive to mysql

Disadvantages of sqoop

Advantages of Sqoop

Sqoop principle: Sqoop's import principle; Sqoop's export principle

Verifying all kinds of sqoop errors:

1 mysql field is too short

2 hive empty field conversion

3 delimiter error

4 mysql host is not on the cluster network

5 mysql service is stopped

6 mysql utf8 encoding holds only 3 bytes per character; some unicode characters take 4 bytes when converted to utf8, so mysql needs to support utf8mb4

7 Sqoop mode information

8 modifying the generated Java class and repackaging

Sqoop command line description

Production background:

When importing from mysql to hive, we encountered the following problems:

1) The source mysql and the cluster machines were not in the same network segment, so the import command failed with a network connection error.

2) Some characters caused an error during the import into hive, and the job terminated.

2.1 The JDBC connector version used by sqoop was too old (fixed by replacing it with a newer version).
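The network problem in item 1) can be checked before launching a job. The following is a minimal sketch, assuming mysql listens on its default port 3306; the function name can_reach is hypothetical. Because the map tasks open their own database connections, the check is only meaningful when run from the cluster worker nodes, not just from the machine that submits the command.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, unreachable networks, DNS failures
        return False
```

A False result from any worker node points to routing or firewall rules between the cluster and the database rather than to sqoop itself.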

When exporting from hive to mysql, we encountered the following problems:

1) Some characters caused an error when inserted into mysql, and the job terminated.

1.1 Some characters may not be supported because of mysql's own encoding limits, for example utf8 versus utf8mb4: mysql's utf8 stores at most 3 bytes per character, while some unicode characters need 4.

1.2 The JDBC connector version used by sqoop was too old (fixed by replacing it with a newer version).
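To see which characters fall outside mysql's 3-byte utf8, it helps to measure their UTF-8 length directly. This is a small illustrative sketch, not part of sqoop; the function name four_byte_chars and the sample string are made up for the example.

```python
from typing import List


def four_byte_chars(text: str) -> List[str]:
    """Return the characters of text whose UTF-8 encoding is 4 bytes long."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


sample = "sqoop 数据 😀"
print(four_byte_chars(sample))  # ['😀']
```

Common Chinese characters encode to 3 bytes and pass, while emoji and other characters beyond the Basic Multilingual Plane need 4 bytes, which is the case that requires utf8mb4 on the mysql side.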

Disadvantages of sqoop:

1 It is operated from the command line, which is error-prone and unsafe.

2 Data transfer and data format are tightly coupled, so a connector cannot support every data format.

3 The user name and password are exposed.

4 Installing sqoop requires root permission.
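For disadvantage 3, one partial mitigation is to keep the password out of the command line by using sqoop's --password-file option instead of --password. The sketch below only assembles the argument list and does not run sqoop; the helper build_import_command, the JDBC URL, the user, the file path and the table name are all hypothetical values chosen for illustration.

```python
from typing import List


def build_import_command(jdbc_url: str, username: str, password_file: str,
                         table: str, split_by: str, num_mappers: int) -> List[str]:
    """Build a sqoop import argument list that reads the password from a file."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # path to a file holding only the password
        "--table", table,
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
        "--hive-import",
    ]


# Hypothetical values, for illustration only
cmd = build_import_command("jdbc:mysql://db.example.com:3306/sales", "etl_user",
                           "/user/etl/mysql.pwd", "orders", "id", 2)
print(" ".join(cmd))
```

Building the command as a list also avoids shell quoting mistakes, which addresses part of the error-prone command-line concern in point 1. The password file itself still has to be protected with restrictive permissions.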

Advantages of Sqoop:

1 Efficient, controllable use of resources: task parallelism and timeouts can be configured.

2 Data type mapping and conversion are carried out automatically and can also be customized by users.

3 Supports many mainstream databases, including MySQL, Oracle, SQL Server and DB2.

Sqoop principle:

Sqoop's import principle:

For an import, you need to set the split-by parameter. Sqoop partitions the data by the values of the split-by column and assigns each partition to a different map task. Each map task reads its rows from the database one by one and writes them to HDFS. How the split is computed depends on the type of the split-by column. For a simple int column, Sqoop takes the maximum and minimum values of the split-by field and then divides that range into as many regions as the num-mappers value passed in. For example, if select max(split-by), min(split-by) from table returns 1000 and 1, and num-mappers is 2, the range is divided into two regions, (1, 500) and (501, 1000). Two SQL statements are then generated, one per map task: select XXX from table where split-by >= 1 and split-by <= 500, and select XXX from table where split-by >= 501 and split-by <= 1000.
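The integer split described above can be sketched in a few lines. This is an illustration of the idea only, not Sqoop's actual splitter, which may place the boundaries differently; the function names integer_splits and split_queries are hypothetical, and the column is assumed to hold integers.

```python
from typing import List, Tuple


def integer_splits(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the closed range [lo, hi] into contiguous closed sub-ranges."""
    if lo > hi or num_mappers < 1:
        raise ValueError("need lo <= hi and num_mappers >= 1")
    total = hi - lo + 1
    n = min(num_mappers, total)      # never create more regions than values
    base, extra = divmod(total, n)   # the first `extra` regions get one more value
    ranges = []
    start = lo
    for i in range(n):
        end = start + base + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table: str, column: str, lo: int, hi: int,
                  num_mappers: int) -> List[str]:
    """Build one bounded SELECT per region, one for each map task."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in integer_splits(lo, hi, num_mappers)
    ]


print(integer_splits(1, 1000, 2))  # [(1, 500), (501, 1000)]
```

With min 1, max 1000 and two mappers this reproduces the two regions from the example. If the values of the split-by column are unevenly distributed, equal-width regions still produce unequal amounts of work per map task, so the choice of split-by column matters.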
