
How to convert Avro data to Parquet format

2025-02-24 Update From: SLTechnology News&Howtos shulou



This article explains how to convert Avro data to Parquet format. It walks through a test of the conversion and shows how to view the schema and metadata of the resulting Parquet files.

Preparation

To convert text data to Parquet format and read it back, you can refer to Cloudera's MapReduce example: https://github.com/cloudera/parquet-examples.

Prepare the text data a.txt in CSV format:

1,2
3,4
4,5

To prepare the Avro test data, you can refer to the article "Loading Avro data into Spark".

The test environment for this article is CDH 5.2, with the Avro and Parquet components installed from the YUM repository.

Convert CSV to Parquet

Create a table in Hive and import the data:

create table mycsvtable (x int, y int)
row format delimited
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'a.txt' OVERWRITE INTO TABLE mycsvtable;

Create the Parquet table and transform the data:

create table myparquettable (a INT, b INT)
STORED AS PARQUET
LOCATION '/tmp/data';
insert overwrite table myparquettable select * from mycsvtable;

View the data generated for the myparquettable table on HDFS:

$ hadoop fs -ls /tmp/data
Found 1 items
-rwxrwxrwx 3 hive hadoop 331 2015-03-25 15:50 /tmp/data/000000_0

View the data of the myparquettable table in Hive:

hive (default)> select * from myparquettable;
OK
myparquettable.a myparquettable.b
1 2
3 4
4 5
Time taken: 0.149 seconds, Fetched: 3 row(s)
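The effect of the two Hive steps on the data can be sketched in Python. This is only an illustration, not what Hive runs internally: it assumes a.txt holds the three two-column rows that the query result reports, and the helper names load_csv and insert_select are made up for this example. It shows that the delimited text is parsed into integers and that insert ... select * copies columns by position (x to a, y to b), not by name.

```python
import csv
import io

# Assumed contents of a.txt (three rows, two integer columns).
A_TXT = "1,2\n3,4\n4,5\n"

def load_csv(text):
    """Parse comma-delimited text into tuples of ints, like the (x int, y int) table."""
    return [tuple(int(value) for value in row) for row in csv.reader(io.StringIO(text))]

def insert_select(rows, target_columns):
    """Copy rows into a target table, matching columns by position."""
    return [dict(zip(target_columns, row)) for row in rows]

mycsvtable = load_csv(A_TXT)
myparquettable = insert_select(mycsvtable, ["a", "b"])

for row in myparquettable:
    print(row["a"], row["b"])
```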

View the schema of the /tmp/data/000000_0 file:

$ hadoop parquet.tools.Main schema /tmp/data/000000_0
message hive_schema {
  optional int32 a;
  optional int32 b;
}
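The schema output shows each Hive INT column stored as an optional int32 field. A minimal Python sketch of that mapping follows. The mapping table covers only a few primitive types and the function parquet_schema is a hypothetical helper written for this illustration; it is not code from Hive or Parquet.

```python
# Hive primitive type -> Parquet physical type (small illustrative subset).
HIVE_TO_PARQUET = {
    "int": "int32",
    "bigint": "int64",
    "double": "double",
    "boolean": "boolean",
}

def parquet_schema(columns, message_name="hive_schema"):
    """Render a Parquet message schema for a list of (name, hive_type) columns."""
    lines = ["message " + message_name + " {"]
    for name, hive_type in columns:
        # Columns are written as optional because Hive columns are nullable.
        lines.append("  optional " + HIVE_TO_PARQUET[hive_type.lower()] + " " + name + ";")
    lines.append("}")
    return "\n".join(lines)

print(parquet_schema([("a", "INT"), ("b", "INT")]))
```

For the two INT columns of myparquettable, the sketch prints the same message that parquet.tools.Main reports for the file.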

View the metadata of the /tmp/data/000000_0 file:

$ hadoop parquet.tools.Main meta /tmp/data/000000_0
creator: parquet-mr version 1.5.0-cdh6.2.0 (build 8e266e052e423af5 [more]...
file schema: hive_schema
-----
a: OPTIONAL INT32 R:0 D:1
b: OPTIONAL INT32 R:0 D:1
row group 1: RC:3 TS:102
-----
a: INT32 UNCOMPRESSED DO:0 FPO:4 SZ:51/51/1.00 VC:3 ENC:BIT [more]...
b: INT32 UNCOMPRESSED DO:0 FPO:55 SZ:51/51/1.00 VC:3 ENC:BI [more]...

Convert Avro to Parquet

Generate the Avro data from JSON, using the schema and JSON data from the article "Loading Avro data into Spark":

$ java -jar /usr/lib/avro/avro-tools.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro
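Before running fromjson, it can help to confirm that every JSON record matches the schema. The Python sketch below is one way to do that with the standard library only. The schema text is an assumption: its three fields are inferred from the columns of the Hive table created later in this article, and the real twitter.avsc may differ. The sample record and the helper check_record are likewise made up for illustration.

```python
import json

# Hypothetical twitter.avsc contents; the field list is an assumption
# inferred from the Hive table columns used later in this article.
TWITTER_AVSC = """
{
  "type": "record",
  "name": "twitter_schema",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "tweet", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
"""

# Python types that json.loads produces for the Avro primitives used here.
PYTHON_TYPES = {"string": str, "long": int}

def check_record(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        expected = PYTHON_TYPES[field["type"]]
        if name not in record:
            problems.append("missing field: " + name)
        elif type(record[name]) is not expected:
            problems.append("wrong type for field: " + name)
    return problems

schema = json.loads(TWITTER_AVSC)
# Made-up record, one JSON object per line as in the input file.
sample_line = '{"username": "alice", "tweet": "hello", "timestamp": 1366150681}'
print(check_record(json.loads(sample_line), schema))
```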

Upload twitter.avsc and twitter.avro to HDFS:

$ hadoop fs -put twitter.avsc
$ hadoop fs -put twitter.avro

Use https://github.com/laserson/avro2parquet to convert the Avro file to Parquet format:

$ hadoop jar avro2parquet.jar twitter.avsc twitter.avro /tmp/out

Then create the table in Hive and import the data:

create table tweets_parquet (username string, tweet string, timestamp bigint)
STORED AS PARQUET;
load data inpath '/tmp/out/part-m-00000.snappy.parquet' overwrite into table tweets_parquet;
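The column list of tweets_parquet has to line up with the fields of the Avro schema. As a rough sketch, the create table statement can be derived from the schema in Python. The schema text below is an assumption reconstructed from the table columns, the type mapping covers only Avro primitives, and create_table_ddl is a hypothetical helper, not part of Hive or avro2parquet.

```python
import json

# Hypothetical twitter.avsc contents, reconstructed from the Hive columns.
TWITTER_AVSC = """
{
  "type": "record",
  "name": "twitter_schema",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "tweet", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
"""

# Avro primitive type -> Hive column type.
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
}

def create_table_ddl(table_name, avsc_text):
    """Build a Hive create table statement from a flat Avro record schema."""
    schema = json.loads(avsc_text)
    columns = ", ".join(
        field["name"] + " " + AVRO_TO_HIVE[field["type"]]
        for field in schema["fields"]
    )
    return "create table " + table_name + " (" + columns + ") STORED AS PARQUET;"

print(create_table_ddl("tweets_parquet", TWITTER_AVSC))
```

The sketch handles only flat records with primitive field types; unions, nested records, and arrays would need extra handling.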
