
How to specify multiple characters as column delimiters in Hive


This article explains how to specify multiple characters as column separators in Hive. The method is simple, fast, and practical, so let's walk through it step by step.

When creating a table, Hive lets you specify a column delimiter, but it does not support delimiters longer than one character. If you want to use multiple characters as a delimiter, you need to implement a custom InputFormat; the main work is overriding the record reader's next() method. The code is as follows:

package hiveStream;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class MyHiveInputFormat extends TextInputFormat implements JobConfigurable {

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit genericSplit,
            JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(genericSplit.toString());
        // Hand each split to the custom record reader that rewrites the delimiter
        return new MyRecordReader((FileSplit) genericSplit, job);
    }
}

package hiveStream;

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader implements RecordReader<LongWritable, Text> {
    private CompressionCodecFactory compressionCodecs = null;
    private long start;
    private long pos;
    private long end;
    private LineReader lineReader;
    int maxLineLength;

    // Constructor used by MyHiveInputFormat
    public MyRecordReader(FileSplit inputSplit, Configuration job) throws IOException {
        maxLineLength = job.getInt("mapred.mutilCharRecordReader.maxlength", Integer.MAX_VALUE);
        start = inputSplit.getStart();
        end = start + inputSplit.getLength();
        final Path file = inputSplit.getPath();
        // Create the compression codec factory and look up a codec for this file
        compressionCodecs = new CompressionCodecFactory(job);
        final CompressionCodec codec = compressionCodecs.getCodec(file);
        // Open the file on its file system
        FileSystem fs = file.getFileSystem(job);
        FSDataInputStream fileIn = fs.open(file);
        boolean skipFirstLine = false;
        if (codec != null) {
            lineReader = new LineReader(codec.createInputStream(fileIn), job);
            end = Long.MAX_VALUE;
        } else {
            if (start != 0) {
                skipFirstLine = true;
                --start;
                fileIn.seek(start);
            }
            lineReader = new LineReader(fileIn, job);
        }
        // Skip the first (partial) line of a split that does not start at offset 0
        if (skipFirstLine) {
            start += lineReader.readLine(new Text(), 0,
                    (int) Math.min((long) Integer.MAX_VALUE, end - start));
        }
        this.pos = start;
    }

    public MyRecordReader(InputStream in, long offset, long endOffset, int maxLineLength) {
        this.maxLineLength = maxLineLength;
        this.start = offset;
        this.lineReader = new LineReader(in);
        this.pos = offset;
        this.end = endOffset;
    }

    public MyRecordReader(InputStream in, long offset, long endOffset, Configuration job)
            throws IOException {
        this.maxLineLength = job.getInt("mapred.mutilCharRecordReader.maxlength", Integer.MAX_VALUE);
        this.lineReader = new LineReader(in, job);
        this.start = offset;
        this.end = endOffset;
    }

    @Override
    public void close() throws IOException {
        if (lineReader != null)
            lineReader.close();
    }

    @Override
    public LongWritable createKey() {
        return new LongWritable();
    }

    @Override
    public Text createValue() {
        return new Text();
    }

    @Override
    public long getPos() throws IOException {
        return pos;
    }

    @Override
    public float getProgress() throws IOException {
        if (start == end) {
            return 0.0f;
        } else {
            return Math.min(1.0f, (pos - start) / (float) (end - start));
        }
    }

    @Override
    public boolean next(LongWritable key, Text value) throws IOException {
        while (pos < end) {
            key.set(pos);
            int newSize = lineReader.readLine(value, maxLineLength,
                    Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
            // Rewrite the multi-character delimiter "##" as the single character "|"
            // declared in the table's "fields terminated by" clause
            String strReplace = value.toString().replace("##", "|");
            Text txtReplace = new Text();
            txtReplace.set(strReplace);
            value.set(txtReplace.getBytes(), 0, txtReplace.getLength());
            if (newSize == 0)
                return false;
            pos += newSize;
            if (newSize < maxLineLength)
                return true;
        }
        return false;
    }
}
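To use these classes, compile them against the Hadoop client libraries and package them into a jar that Hive can load. A minimal sketch, assuming the two source files sit under a hiveStream/ directory and the hadoop command is on the PATH (the jar name and deployment path are illustrative, chosen to match the add jar call in the test code further below):

javac -cp "$(hadoop classpath)" hiveStream/MyHiveInputFormat.java hiveStream/MyRecordReader.java
jar cf hiveInput.jar hiveStream/*.class
# deploy the jar where the later "add jar" statement expects it
cp hiveInput.jar /usr/lib/hive/lib/hiveInput.jar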

Create table statement: after customizing the InputFormat/OutputFormat, you need to specify them when creating the table. Note that fields terminated by '|' matches the single character that MyRecordReader substitutes for the multi-character delimiter "##":

create external table testHiveInput (id int, name string, age int)
row format delimited fields terminated by '|'
stored as
    INPUTFORMAT 'hiveStream.MyHiveInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/user/hdfs/hiveInput';
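Hive must be able to load hiveStream.MyHiveInputFormat whenever the table is queried, so register the jar in each session first. A quick smoke test from the Hive CLI might look like this (assuming the jar was deployed to the path used above):

add jar /usr/lib/hive/lib/hiveInput.jar;
select * from testHiveInput limit 3;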

Test data:

1##Tom##22

2##Jerry##22

3##Jeny##22
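Since the table is external and points at /user/hdfs/hiveInput, the sample rows only need to be placed in that HDFS directory. For example, assuming the three rows above are saved in a local file named data.txt (the file name is illustrative):

hdfs dfs -mkdir -p /user/hdfs/hiveInput
hdfs dfs -put data.txt /user/hdfs/hiveInput/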

Test code (querying the data through JDBC):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public static void testHive() throws Exception {
    String sql = "select id, name, age from testHiveInput";
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    String url = "jdbc:hive2://xxx.xxx.xxx.xxx:10000";
    Connection conn = DriverManager.getConnection(url, "hive", "passwd");
    Statement stmt = conn.createStatement();
    // Register the jar containing the custom InputFormat for this session
    stmt.execute("add jar /usr/lib/hive/lib/hiveInput.jar");
    ResultSet rs = stmt.executeQuery(sql);
    while (rs.next()) {
        System.out.println(rs.getString("id") + " " + rs.getString("name") + " " + rs.getString("age"));
    }
    rs.close();
    stmt.close();
    conn.close();
}
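With the sample data above, the loop should print something like the following: each "##"-delimited row is rewritten by MyRecordReader and parsed into the three declared columns (the exact spacing depends on the separator used in the println call):

1 Tom 22
2 Jerry 22
3 Jeny 22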

At this point, you should have a deeper understanding of how to specify multiple characters as column delimiters in Hive. The best way to consolidate it is to try it out in practice yourself.
