This article explains how to read and write RCFile with the MapReduce API. The walkthrough is quite detailed and should be a useful reference; interested readers are encouraged to follow it through!
RCFile is a hybrid row-columnar storage format, developed by Facebook, that offers a high compression ratio and efficient reads. In Hive you can usually convert a Text table directly with an insert-select (for example, INSERT OVERWRITE TABLE rc_table SELECT * FROM text_table, where rc_table is a hypothetical table STORED AS RCFILE), but sometimes you want to read and write RCFile from MapReduce itself.
The Maven dependencies used are:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.0-cdh6.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-serde</artifactId>
    <version>0.13.1-cdh6.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive.hcatalog</groupId>
    <artifactId>hive-hcatalog-core</artifactId>
    <version>0.13.1-cdh6.2.1</version>
</dependency>
Read an RCFile and use MapReduce to generate a text-format file
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hive.hcatalog.rcfile.RCFileMapReduceInputFormat;

import java.io.IOException;

public class RcFileReaderJob {

    static class RcFileMapper extends Mapper<Object, BytesRefArrayWritable, Text, NullWritable> {
        @Override
        protected void map(Object key, BytesRefArrayWritable value, Context context)
                throws IOException, InterruptedException {
            Text txt = new Text();
            StringBuffer sb = new StringBuffer();
            // Each input value is one row; each BytesRefWritable inside it is one column.
            for (int i = 0; i < value.size(); i++) {
                BytesRefWritable v = value.get(i);
                txt.set(v.getData(), v.getStart(), v.getLength());
                if (i == value.size() - 1) {
                    sb.append(txt.toString());
                } else {
                    sb.append(txt.toString() + "\t");
                }
            }
            context.write(new Text(sb.toString()), NullWritable.get());
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }
    }

    static class RcFileReduce extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static boolean runLoadMapReducue(Configuration conf, Path input, Path output)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(conf);
        job.setJarByClass(RcFileReaderJob.class);
        job.setJobName("RcFileReaderJob");
        job.setNumReduceTasks(1);
        job.setMapperClass(RcFileMapper.class);
        job.setReducerClass(RcFileReduce.class);
        // RCFileMapReduceInputFormat (from hive-hcatalog-core) splits the RCFile
        // and hands each row to the mapper as a BytesRefArrayWritable.
        job.setInputFormatClass(RCFileMapReduceInputFormat.class);
        // MultipleInputs.addInputPath(job, input, RCFileInputFormat.class);
        RCFileMapReduceInputFormat.addInputPath(job, input);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileOutputFormat.setOutputPath(job, output);
        return job.waitForCompletion(true);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: rcfile <in> <out>");
            System.exit(2);
        }
        RcFileReaderJob.runLoadMapReducue(conf, new Path(args[0]), new Path(args[1]));
    }
}
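To make the mapper's column loop easier to follow, here is a small self-contained sketch (the class name BytesRefDemo and the sample values are made up for illustration) that builds one RCFile-style row by hand and unpacks it the same way the mapper does. Note that v.getData() returns the whole backing byte array, which is why getStart() and getLength() are needed to pick out the column's slice:

import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.Text;

public class BytesRefDemo {
    public static void main(String[] args) throws Exception {
        // A two-column row: (42, alice).
        BytesRefArrayWritable row = new BytesRefArrayWritable(2);
        byte[] id = "42".getBytes("UTF-8");
        byte[] name = "alice".getBytes("UTF-8");
        row.set(0, new BytesRefWritable(id, 0, id.length));
        row.set(1, new BytesRefWritable(name, 0, name.length));

        // Unpack the row exactly as RcFileMapper does.
        Text txt = new Text();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            BytesRefWritable v = row.get(i);
            txt.set(v.getData(), v.getStart(), v.getLength());
            sb.append(txt.toString());
            if (i < row.size() - 1) {
                sb.append('\t');
            }
        }
        System.out.println(sb); // prints: 42<TAB>alice
    }
}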
Read a text file and use MapReduce to generate an RCFile-format file
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hive.hcatalog.rcfile.RCFileMapReduceOutputFormat;

import java.io.IOException;

public class RcFileWriterJob extends Configured implements Tool {

    public static class Map extends Mapper<Object, Text, NullWritable, BytesRefArrayWritable> {
        private byte[] fieldData;
        private int numCols;
        private BytesRefArrayWritable bytes;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // The column count is placed in the job configuration by
            // RCFileMapReduceOutputFormat.setColumnNumber() in the driver.
            numCols = context.getConfiguration().getInt("hive.io.rcfile.column.number.conf", 0);
            bytes = new BytesRefArrayWritable(numCols);
        }

        @Override
        public void map(Object key, Text line, Context context)
                throws IOException, InterruptedException {
            bytes.clear();
            String[] cols = line.toString().split("\t", -1);
            System.out.println("SIZE: " + cols.length);
            // Wrap each tab-separated field as one column of the output row.
            for (int i = 0; i < numCols; i++) {
                fieldData = cols[i].getBytes("UTF-8");
                bytes.set(i, new BytesRefWritable(fieldData, 0, fieldData.length));
            }
            context.write(NullWritable.get(), bytes);
        }
    }
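The original post is cut off at this point. Based on the imports above (Tool, ToolRunner, GenericOptionsParser, RCFileMapReduceOutputFormat), the missing driver almost certainly followed the standard Tool pattern; the completion below is a sketch reconstructed on that assumption, not the author's original code, and the <column_count> argument is a hypothetical way of supplying the column number:

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: rcfilewrite <in> <out> <column_count>");
            return 2;
        }
        Job job = Job.getInstance(conf, "RcFileWriterJob");
        job.setJarByClass(RcFileWriterJob.class);
        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(BytesRefArrayWritable.class);
        job.setNumReduceTasks(0); // map-only: mappers write the RCFile directly
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
        // Stores the column count under "hive.io.rcfile.column.number.conf",
        // the key the mapper's setup() reads back.
        RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(),
                Integer.parseInt(otherArgs[2]));
        RCFileMapReduceOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new RcFileWriterJob(), args));
    }
}

Making the write job map-only avoids a pointless shuffle: every map output key is NullWritable, so a reduce stage would only funnel all rows through a handful of reducers without adding anything.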