ObjectInspector Design in Hive

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

ObjectInspector is a confusing concept in Hive at first glance, and it took me a long time to understand it while reading the Hive source code. Once understood, it turns out to be quite useful: it decouples how data is used from how data is formatted, which improves code reuse. Put simply, the ObjectInspector interface frees Hive from any specific data format, so that a data stream can 1) switch between different input and output formats, and 2) use different data formats in different Operators.
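The decoupling described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use Hive's classes: RowInspector, ArrayRowInspector, DelimitedRowInspector, and sumFirstTwo are hypothetical names invented for this illustration, and the real Hive interfaces are richer than this.

```java
// Hypothetical stand-in for the inspector idea; NOT Hive's actual API.
interface RowInspector {
    // Returns the field at the given index from a row stored in this inspector's format.
    Object getField(Object row, int index);
}

// Rows stored as plain Java object arrays.
final class ArrayRowInspector implements RowInspector {
    public Object getField(Object row, int index) {
        return ((Object[]) row)[index];
    }
}

// Rows stored as tab-delimited text; a field is parsed only when it is requested.
final class DelimitedRowInspector implements RowInspector {
    public Object getField(Object row, int index) {
        String[] parts = ((String) row).split("\t", -1);
        return Integer.valueOf(parts[index]);
    }
}

final class DecouplingSketch {
    // Consumer code: it never learns how the row is physically stored.
    static int sumFirstTwo(Object row, RowInspector oi) {
        int a = (Integer) oi.getField(row, 0);
        int b = (Integer) oi.getField(row, 1);
        return a + b;
    }

    public static void main(String[] args) {
        Object arrayRow = new Object[] {123, 456};
        Object textRow = "123\t456";
        // The same consumer runs against two different data formats.
        System.out.println(sumFirstTwo(arrayRow, new ArrayRowInspector()));
        System.out.println(sumFirstTwo(textRow, new DelimitedRowInspector()));
    }
}
```

In this sketch, supporting a new storage format only requires a new RowInspector implementation; sumFirstTwo stays unchanged, which is the kind of reuse the article attributes to ObjectInspector.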

This is the ObjectInspector interface:

public interface ObjectInspector extends Cloneable {
  public static enum Category {
    PRIMITIVE, LIST, MAP, STRUCT, UNION
  };

  String getTypeName();

  Category getCategory();
}

This interface provides only the most general methods, getTypeName and getCategory. Let's take a look at its abstract subclass and sub-interfaces:

StructObjectInspector

MapObjectInspector

ListObjectInspector

PrimitiveObjectInspector

UnionObjectInspector

Among them, PrimitiveObjectInspector is used to parse primitive data types, while StructObjectInspector is used to parse a row of data and is itself composed of a set of ObjectInspectors. Because Hive supports nested data structures, any ObjectInspector can be nested, one or more levels deep, inside a StructObjectInspector. Struct, Map, List, and Union are the four collection data types supported by Hive. For example, a column can be declared as a Struct type, in which case another StructObjectInspector, the one for that column, is nested inside the StructObjectInspector that parses the row.
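The nesting described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not Hive's implementation: Inspector, PrimitiveInspector, StructInspector, and NestingSketch are hypothetical stand-ins that only mirror the getTypeName and getCategory methods of the interface shown earlier, and only two categories are modeled.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in mirroring the shape of the interface above; NOT Hive's real class.
interface Inspector {
    enum Category { PRIMITIVE, STRUCT }
    String getTypeName();
    Category getCategory();
}

// Inspector for a single primitive value such as int or string.
final class PrimitiveInspector implements Inspector {
    private final String typeName;
    PrimitiveInspector(String typeName) { this.typeName = typeName; }
    public String getTypeName() { return typeName; }
    public Category getCategory() { return Category.PRIMITIVE; }
}

// Inspector for a struct: a list of named fields, each with its own inspector.
final class StructInspector implements Inspector {
    private final List<String> names;
    private final List<Inspector> fields;
    StructInspector(List<String> names, List<Inspector> fields) {
        this.names = names;
        this.fields = fields;
    }
    public String getTypeName() {
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(names.get(i)).append(':').append(fields.get(i).getTypeName());
        }
        return sb.append('>').toString();
    }
    public Category getCategory() { return Category.STRUCT; }
    List<Inspector> getFieldInspectors() { return fields; }
}

final class NestingSketch {
    // Dispatches on the category and recurses into nested structs.
    static int countPrimitives(Inspector oi) {
        switch (oi.getCategory()) {
            case PRIMITIVE:
                return 1;
            case STRUCT: {
                int n = 0;
                for (Inspector f : ((StructInspector) oi).getFieldInspectors()) {
                    n += countPrimitives(f);
                }
                return n;
            }
            default:
                throw new IllegalArgumentException("unknown category");
        }
    }

    // A row with an int column and a struct column (hypothetical schema).
    static Inspector rowInspector() {
        Inspector address = new StructInspector(
            Arrays.asList("city", "zip"),
            Arrays.<Inspector>asList(new PrimitiveInspector("string"), new PrimitiveInspector("int")));
        return new StructInspector(
            Arrays.asList("id", "address"),
            Arrays.<Inspector>asList(new PrimitiveInspector("int"), address));
    }

    public static void main(String[] args) {
        Inspector row = rowInspector();
        System.out.println(row.getTypeName());
        System.out.println(countPrimitives(row));
    }
}
```

The row-level struct inspector holds the column-level struct inspector as one of its fields, so code that walks the tree only needs to check the category at each level.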

Now let's see how ObjectInspector works through a small example, taken from a test case for Hive's SerDe:

/**
 * Test the LazySimpleSerDe class.
 */
public void testLazySimpleSerDe() throws Throwable {
  try {
    // Create the SerDe
    LazySimpleSerDe serDe = new LazySimpleSerDe();
    Configuration conf = new Configuration();
    Properties tbl = createProperties();
    // Initialize the SerDe with the table properties
    serDe.initialize(conf, tbl);

    // Data: t is the input row, s is the expected re-serialized row
    Text t = new Text("123\t456\t789\t1000\t5.3\thive and hadoop\t1.\tNULL");
    String s = "123\t456\t789\t1000\t5.3\thive and hadoop\tNULL\tNULL";
    Object[] expectedFieldsData = {new ByteWritable((byte) 123),
        new ShortWritable((short) 456), new IntWritable(789),
        new LongWritable(1000), new DoubleWritable(5.3),
        new Text("hive and hadoop"), null, null};

    // Test
    deserializeAndSerialize(serDe, t, s, expectedFieldsData);
  } catch (Throwable e) {
    e.printStackTrace();
    throw e;
  }
}

private void deserializeAndSerialize(LazySimpleSerDe serDe, Text t, String s,
    Object[] expectedFieldsData) throws SerDeException {
  // Get the row ObjectInspector
  StructObjectInspector oi = (StructObjectInspector) serDe.getObjectInspector();

  // Get column information

List
