
What is the Parquet columnar storage format

2025-01-16 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 report

This article explains what the Parquet columnar storage format is: its data model, its file layout, and how to choose its block size.

Brief introduction

Apache Parquet is a columnar storage format that can store nested data efficiently. Columnar formats perform well in terms of both file size and query performance. Parquet's outstanding contribution is that it can store data with deeply nested structures in a true columnar layout. (Reference: a blog post on Jianshu.)
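To make the row-versus-column idea concrete, here is a minimal Python sketch that turns row-oriented records into one list per field, which is roughly what a columnar writer does before encoding each column. The records and the `to_columns` helper are hypothetical and handle only flat records. Parquet itself flattens nested data using the record-shredding approach from Google's Dremel paper (repetition and definition levels), which this sketch does not attempt.

```python
from typing import Any, Dict, List


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one list of values per field."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            # All values of one field end up stored next to each other.
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical weather readings, stored row by row.
rows = [
    {"year": 1950, "temperature": 0, "stationId": "011990-99999"},
    {"year": 1950, "temperature": 22, "stationId": "011990-99999"},
    {"year": 1949, "temperature": 111, "stationId": "012650-99999"},
]

columns = to_columns(rows)
# A query that needs only temperatures can read this one list and skip the rest.
print(columns["temperature"])  # [0, 22, 111]
```

Because each column holds values of a single type, it also compresses better than mixed rows, which is where the file-size advantage comes from.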

Data model

Atomic types

Type                    Description
boolean                 Binary value
int32                   32-bit signed integer
int64                   64-bit signed integer
int96                   96-bit signed integer
float                   32-bit IEEE 754 single-precision floating-point number
double                  64-bit IEEE 754 double-precision floating-point number
binary                  Sequence of 8-bit unsigned bytes
fixed_len_byte_array    Fixed number of 8-bit unsigned bytes
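The parquet-format specification defines a PLAIN encoding for each of these types. As a rough sketch, assuming the spec's rules (little-endian fixed-width values, bit-packed booleans, and a 4-byte length prefix before each binary value), the hypothetical `plain_encode` function below shows how values of each atomic type are laid out as bytes. It is not a Parquet writer: pages, compression, and other encodings such as dictionary encoding are left out.

```python
import struct
from typing import List, Optional

# Fixed-width atomic types and their little-endian struct formats.
_FIXED = {"int32": "<i", "int64": "<q", "float": "<f", "double": "<d"}


def plain_encode(ptype: str, values: List, type_length: Optional[int] = None) -> bytes:
    """Encode values of one atomic type with (a sketch of) PLAIN encoding."""
    if ptype == "boolean":
        # Booleans are bit-packed, least significant bit first.
        out = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v:
                out[i // 8] |= 1 << (i % 8)
        return bytes(out)
    if ptype in _FIXED:
        return b"".join(struct.pack(_FIXED[ptype], v) for v in values)
    if ptype == "int96":
        # 12 little-endian bytes per value (treated here as a signed integer).
        return b"".join(v.to_bytes(12, "little", signed=True) for v in values)
    if ptype == "binary":
        # Each value: 4-byte little-endian length, then the raw bytes.
        return b"".join(struct.pack("<I", len(v)) + v for v in values)
    if ptype == "fixed_len_byte_array":
        if type_length is None or any(len(v) != type_length for v in values):
            raise ValueError("every value must be exactly type_length bytes")
        return b"".join(values)
    raise ValueError(f"unknown atomic type: {ptype}")


print(plain_encode("int32", [1, 2]).hex())  # 0100000002000000
```

Note how `binary` carries its own length while `fixed_len_byte_array` does not: the fixed length is stored once in the schema instead of with every value.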

A simple Parquet schema:

message WeatherRecord {
  required int32 year;
  required int32 temperature;
  required binary stationId (UTF8);
}

Parquet's atomic types do not include a string type. A string is stored as a binary field annotated with the UTF8 logical type, so required binary stationId (UTF8) represents a string.
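The schema syntax is simple enough to parse for flat messages. The following is a hypothetical Python helper, `parse_flat_schema`, that handles only `required`/`optional`/`repeated` atomic fields with an optional parameterless annotation and no nested groups; a full parser exists in parquet-mr (`MessageTypeParser`). It shows how a `binary` field carrying the `UTF8` annotation can be recognized as a string.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    repetition: str            # required, optional or repeated
    ptype: str                 # atomic type, e.g. int32 or binary
    name: str
    annotation: Optional[str]  # logical type annotation such as UTF8, if any


# One flat field, e.g. "required binary stationId (UTF8);"
_FIELD = re.compile(
    r"(required|optional|repeated)\s+(\w+)\s+(\w+)\s*(?:\((\w+)\))?\s*;"
)


def parse_flat_schema(text: str) -> Tuple[str, List[Field]]:
    """Parse a flat message schema (no nested groups) into its name and fields."""
    header = re.match(r"\s*message\s+(\w+)\s*\{", text)
    if header is None:
        raise ValueError("schema must start with 'message <name> {'")
    fields = [Field(*m.groups()) for m in _FIELD.finditer(text, header.end())]
    return header.group(1), fields


def is_string(field: Field) -> bool:
    """A string is binary data carrying the UTF8 annotation."""
    return field.ptype == "binary" and field.annotation == "UTF8"


schema = """
message WeatherRecord {
  required int32 year;
  required int32 temperature;
  required binary stationId (UTF8);
}
"""

name, fields = parse_flat_schema(schema)
print(name, [f.name for f in fields if is_string(f)])  # WeatherRecord ['stationId']
```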

Logical types

Each logical type annotation is listed below with its description and a sample schema.

UTF8: A string of UTF-8 characters. Can annotate binary.
message m { required binary a (UTF8); }

ENUM: A value from a set of named values. Can annotate binary.
message m { required binary a (ENUM); }

DECIMAL(precision,scale): An arbitrary-precision signed decimal number. Can annotate int32, int64, binary, or fixed_len_byte_array.
message m { required int32 a (DECIMAL(5,2)); }

DATE: A date value without a time, represented as the number of days since the Unix epoch (1970-01-01). Can annotate int32.
message m { required int32 a (DATE); }

LIST: An ordered collection of values. Can annotate group.
message m {
  required group a (LIST) {
    repeated group list {
      required int32 element;
    }
  }
}

MAP: An unordered collection of key-value pairs. Can annotate group.
message m {
  required group a (MAP) {
    repeated group key_value {
      required binary key (UTF8);
      optional int32 value;
    }
  }
}
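A logical type changes only how the stored atomic value is interpreted. Here is a minimal Python sketch for two of the annotations, assuming the parquet-format rules that a DECIMAL value equals its unscaled integer times 10^(-scale) and that a DATE counts days from 1970-01-01. The function names are hypothetical.

```python
from datetime import date, timedelta
from decimal import Decimal

UNIX_EPOCH = date(1970, 1, 1)


def decode_decimal(unscaled: int, precision: int, scale: int) -> Decimal:
    """DECIMAL(precision, scale): value = unscaled * 10**-scale."""
    if len(str(abs(unscaled))) > precision:
        raise ValueError(f"{unscaled} has more than {precision} digits")
    return Decimal(unscaled).scaleb(-scale)


def decode_date(days: int) -> date:
    """DATE: number of days since the Unix epoch (1970-01-01)."""
    return UNIX_EPOCH + timedelta(days=days)


# An int32 annotated DECIMAL(5,2) holding 12345 means 123.45.
print(decode_decimal(12345, 5, 2))  # 123.45
# An int32 annotated DATE holding 19723 means 2024-01-01.
print(decode_date(19723))  # 2024-01-01
```

Using `Decimal` rather than `float` keeps the value exact, which is the point of a decimal type.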

Parquet file format

A Parquet file consists of a header, followed immediately by one or more blocks (called row groups in the Parquet specification), and a footer at the end of the file. The header contains only a 4-byte magic number, PAR1, which identifies the file as being in Parquet format. All of the file's metadata is stored in the footer.
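According to the parquet-format specification, the footer ends with a 4-byte little-endian length of the footer metadata followed by PAR1 again, so a reader can jump to the end of the file, read the length, and then read the metadata without scanning the data blocks. The sketch below assumes that layout but treats the metadata as opaque placeholder bytes (in real files it is Thrift-encoded); `build_file` and `read_footer` are hypothetical names.

```python
import struct
from typing import List

MAGIC = b"PAR1"


def build_file(blocks: List[bytes], footer_metadata: bytes) -> bytes:
    """Lay out: magic, data blocks, footer metadata, 4-byte footer length, magic."""
    return (MAGIC + b"".join(blocks) + footer_metadata
            + struct.pack("<I", len(footer_metadata)) + MAGIC)


def read_footer(data: bytes) -> bytes:
    """Locate the footer metadata by reading backwards from the end of the file."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: PAR1 magic missing")
    (length,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - length
    if start < 4:
        raise ValueError("footer length is larger than the file")
    return data[start:len(data) - 8]


data = build_file([b"block-1", b"block-2"], b"placeholder metadata")
print(read_footer(data))  # b'placeholder metadata'
```

Keeping the metadata at the end lets a writer stream blocks out first and write the metadata once it knows where every block landed.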

Configuration of Parquet

When setting the block (row group) size, you need to make a trade-off between scan efficiency and memory usage. Larger blocks contain more rows, so scanning is more efficient. Larger blocks also improve the efficiency of sequential I/O, because there is less overhead in setting up each column chunk. However, each block must be buffered in memory while it is read or written, which limits how large a block can practically be. The default block size is 128 MB.
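To see the trade-off in numbers, here is a back-of-the-envelope Python sketch. It assumes a hypothetical 1 GiB of data made of 128-byte rows and ignores encoding, compression, and page overhead, so real block counts will differ. With the 128 MB default, each block holds about a million rows and the data splits into 8 blocks; doubling the block size halves the number of blocks but doubles the memory needed to buffer one block. In parquet-mr this size is set with the `parquet.block.size` property.

```python
from typing import Tuple

DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB default block size
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def estimate_blocks(total_bytes: int, avg_row_bytes: int,
                    block_size: int = DEFAULT_BLOCK_SIZE) -> Tuple[int, int]:
    """Return (rows per block, number of blocks), ignoring encoding and compression."""
    rows_per_block = max(1, block_size // avg_row_bytes)
    total_rows = ceil_div(total_bytes, avg_row_bytes)
    return rows_per_block, ceil_div(total_rows, rows_per_block)


for size_mb in (64, 128, 256):
    rows, blocks = estimate_blocks(GIB, 128, size_mb * 1024 * 1024)
    # A writer buffers roughly one block at a time, so memory grows with size_mb.
    print(f"{size_mb} MB blocks: {rows} rows each, {blocks} blocks")
```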

© 2024 shulou.com SLNews company. All rights reserved.