Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Exception Analysis of using hexadecimal Separator in 0026-Hive

2025-03-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/03 Report--

Warm Tip: to see the high-definition no-code picture, please open it with your mobile phone and click the picture to enlarge.

1. Problem description

Extract Mysql table data to hive table through sqoop, and find that all columns of hive table are displayed as null.

The delimiter of the Hive table is "\ u001B", and the delimiter specified by sqoop is also "\ u001B"

Use the command show create table test_hive_delimiter to view the table creation statement as follows:

0: jdbc:hive2://localhost:10000/ > show create table test_hive_delimiter ... INFO: OK+---+--+ | createtab_stmt | +-- -+-+ | CREATE EXTERNAL TABLE `test_hive_ delimiter` (| | `id` int) | `name` string | | `address` string) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | WITH SERDEPROPERTIES (| |' field.delim'='\ u0015''| | 'serialization.format'='\ u0015') | | STORED AS INPUTFORMAT | |' org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://ip-172-31-6-148.fayson.com' 8020 Universe. | TBLPROPERTIES (| | 'COLUMN_STATS_ACCURATE'='false'') | | | 'numFiles'='0', | |' numRows'='-1', | | 'rawDataSize'='-1', | |' totalSize'='0' | | | 'transient_lastDdlTime'='1504705887') | +-+-- + 22 rows selected (0.084 seconds) 0: jdbc:hive2://localhost:10000/ > |

It is found that the delimiter in the original table statement of Hive is "\ u001B" and the delimiter queried by the show create table test_hive_delimiter command is "\ u0015". The delimiter has been modified.

two。 Problem recurrence

1. Create the Hive table test_hive_delimiter with the "\ u001B" delimiter

Create external table test_hive_delimiter (id int, name string, address string) row format delimited fields terminated by'\ u001B'stored as textfile location'/ fayson/test_hive_delimiter'

two。 Use sqoop to extract test data from MySQL to hive table (test_hive_delimiter)

# sqoop import-- connect jdbc:mysql://ip-172-31-6-148.fayson.com:3306/fayson-username root-password 123456-- table test-M1-- hive-import-- fields-terminated-by "\ 0x001B"-- target-dir / fayson/test_hive_delimiter-- hive-table test_hive_delimiter

Successful data extraction:

# hadoop fs-ls / fayson/test_hive_delimiterFound 2 items-rw-r--r-- 3 fayson supergroup 0 2017-09-06 13:46 / fayson/test_hive_delimiter/_SUCCESS-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 / fayson/test_hive_delimiter/part-m-00000 [root@ip-172-31-6-148C] # hadoop fs -ls / fayson/test_hive_delimiter/part-m-00000-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 / fayson/test_hive_delimiter/part-m-00000 [root@ip-172-31-6-148i] #

3. View test_hive_ delimiter table data

[root@ip-172-31-6-148C] # beeline Beeline version 1.1.0-cdh6.12.1 by Apache Hivebeeline >! connect jdbc:hive2://localhost:10000/;principal=hive/ip-172-31-6-148.fayson.com@FAYSON.COM...Transaction isolation: TRANSACTION_REPEATABLE_READ0: jdbc:hive2://localhost:10000/ > select * from test_hive_delimiter ... INFO: OK+-+--+ | test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address | +- -+ | NULL | NULL | NULL | | NULL | | NULL | +-+-| -+-- + 3 rows selected (0.287 seconds) 0: jdbc:hive2://localhost:10000/ >

The construction statement of the 4.Hive table is as follows

3. Solution method

The delimiter "\ u001B" is hexadecimal, while the Hive delimiter is actually octal, so it is escaped by Hive when using the hexadecimal delimiter, so the delimiter displayed after creating the hive table with the "\ u001B" delimiter is "\ u0015".

The Hive table is created by converting hexadecimal delimiters to octal delimiters without changing the data file delimiter.

1. Convert hexadecimal delimiters to octal delimiters

"\ u001B" convert octal to "033", online conversion tool: http://tool.lu/hexconvert/

two。 Modify the table statement to use octal "\ 033" as the delimiter

Create external table test_hive_delimiter (id int, name string, address string) row format delimited fields terminated by'\ 033'stored as textfile location'/ fayson/test_hive_delimiter'

Use the command show create table test_hive_delimiter to view table building statements

0: jdbc:hive2://localhost:10000/ > show create table test_hive_delimiter ... INFO: OK+---+--+ | createtab_stmt | +-- -+-+ | CREATE EXTERNAL TABLE `test_hive_ delimiter` (| | `id` int) | `name` string | | `address` string) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | WITH SERDEPROPERTIES (| |' field.delim'='\ u001B'| | | 'serialization.format'='\ u001B') | | STORED AS INPUTFORMAT | |' org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://ip-172-31-6-148.fayson.com' 8020 Universe. | TBLPROPERTIES (| | 'COLUMN_STATS_ACCURATE'='false'') | | | 'numFiles'='0', | |' numRows'='-1', | | 'rawDataSize'='-1', | |' totalSize'='0' | | | 'transient_lastDdlTime'='1504707693') | +-+-- + 22 rows selected (0.079 seconds) 0: jdbc:hive2://localhost:10000/ > |

3. Query test_hive_ delimiter table data

0: jdbc:hive2://localhost:10000/ > select * from test_hive_delimiter ... INFO: OK+-+--+ | test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address | +- -+ | 1 | fayson | guangdong | | 2 | zhangsan | shenzheng | | 3 | lisi | shanghai | +-- -+-- + 3 rows selected (0.107 seconds) 0: jdbc:hive2://localhost:10000/ >

Convert hexadecimal "\ u001B" to octal "\ 033" to create table, and the problem is solved.

4. Remarks

When creating a Hive table using hexadecimal delimiters, it should be noted that some delimiters are escaped (such as 001B/001C, etc.) Sqoop specifies hexadecimal delimiters, why is "\ 0x001B" instead of "\ u001B", please see Sqoop

Official website description: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#\_file\_formats

Drunken whips are famous horses, and teenagers are so pompous! Lingnan Huan Xisha, under the vomiting liquor store! The best friend refuses to let go, the flower of data play!

Warm Tip: to see the high-definition no-code picture, please open it with your mobile phone and click the picture to enlarge.

It is recommended to follow Hadoop practice, the first time, share more Hadoop practical information, welcome to forward and share.

Original article, welcome to reprint, reprint please indicate: reproduced from the official account of Wechat Hadoop

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report