What are Hadoop's I/O operations?

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many newcomers to Hadoop are unclear about how its I/O layer works. The notes below cover the main topics in turn: data integrity, compression, serialization, and file-based data structures.

1. Data integrity

Detect data corruption

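A checksum is computed when data first enters the system and again whenever the data has been transmitted across an unreliable channel. If the two checksums do not match, the data is considered corrupt.

The checksum itself can also be corrupted, not just the data, but because it is much smaller than the data this is far less likely.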

A commonly used error-detecting code is CRC-32. HDFS uses a more efficient variant, CRC-32C.
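The compute-then-verify idea can be sketched with the JDK's own java.util.zip.CRC32C class (available since Java 9). This is only an illustration of the technique, not HDFS code; the class and method names below are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

class ChecksumDemo {
    /** Returns the CRC-32C checksum of the given bytes. */
    static long crc32c(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** True if the data still matches the checksum recorded earlier. */
    static boolean verify(byte[] data, long expected) {
        return crc32c(data) == expected;
    }

    public static void main(String[] args) {
        byte[] original = "temperature=22".getBytes(StandardCharsets.UTF_8);
        long stored = crc32c(original);      // computed when the data enters the system

        byte[] received = original.clone();  // stands in for a copy sent over the network
        received[3] ^= 0x01;                 // flip a single bit "in transit"

        System.out.println("original ok: " + verify(original, stored));
        System.out.println("received ok: " + verify(received, stored));
    }
}
```

A CRC detects every single-bit error, so the flipped bit in the second copy is always caught. Detection is all it does: the checksum cannot repair the data.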

HDFS data integrity

HDFS computes checksums for all data written to it and, by default, verifies them when data is read.
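As additional context not stated above: HDFS keeps a separate checksum for each fixed-size chunk of data (512 bytes by default, set by dfs.bytes-per-checksum), so a mismatch points to a specific chunk. The sketch below imitates that scheme with plain JDK classes. ChunkedChecksum and its methods are hypothetical names, not part of Hadoop.

```java
import java.util.zip.CRC32C;

class ChunkedChecksum {
    /** Computes one CRC-32C per chunk of at most chunkSize bytes. */
    static long[] checksums(byte[] data, int chunkSize) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, data.length - off);
            CRC32C crc = new CRC32C();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk that no longer matches, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, int chunkSize, long[] expected) {
        long[] actual = checksums(data, chunkSize);
        for (int i = 0; i < expected.length; i++) {
            if (i >= actual.length || actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

Smaller chunks localize corruption more precisely but store more checksum bytes per byte of data.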

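hadoop fs -checksum displays the checksum of a file, which is useful for checking whether two files in HDFS have the same contents.

LocalFileSystem performs client-side checksum validation.

LocalFileSystem relies on ChecksumFileSystem to do this work. ChecksumFileSystem is a wrapper that can add checksumming to other filesystems.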

2. Compression

Benefits: compression reduces the disk space needed to store files and speeds up data transfer across the network and to or from disk.

Compression format   Tool    Algorithm   Filename extension   Splittable?
DEFLATE              N/A     DEFLATE     .deflate             No
gzip                 gzip    DEFLATE     .gz                  No
bzip2                bzip2   bzip2       .bz2                 Yes
LZO                  lzop    LZO         .lzo                 No
LZ4                  N/A     LZ4         .lz4                 No
Snappy               N/A     Snappy      .snappy              No

All compression algorithms trade space against time: faster compression and decompression usually come at the cost of smaller space savings.

bzip2 compresses more effectively than gzip, but it is slower.
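The space/time tradeoff can be observed within a single algorithm. The sketch below uses the JDK's java.util.zip.Deflater (DEFLATE, the algorithm behind gzip) at its fastest and its most thorough level. It does not use Hadoop's codecs; the class name, helper names, and sample data are invented for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressionTradeoff {
    /** Compresses data with DEFLATE at the given level (1 = fastest, 9 = smallest). */
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); rejects truncated or invalid input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid DEFLATE data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Builds repetitive, made-up text records so there is something to compress. */
    static byte[] sample(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("station-").append(i % 50).append(",temp=").append(i % 40).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sample(200_000);
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] packed = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %d: %d -> %d bytes in %d us%n",
                    level, data.length, packed.length, micros);
        }
    }
}
```

Timings depend on the machine and the data, so the program prints measurements rather than promising particular numbers.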

Codecs

Compression format   Hadoop CompressionCodec
gzip                 org.apache.hadoop.io.compress.GzipCodec
bzip2                org.apache.hadoop.io.compress.BZip2Codec

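For performance, prefer a native library for compression and decompression over the pure Java implementation.

If an application does a lot of compression and decompression, consider using CodecPool, which lets you reuse compressors and decompressors and so amortize the cost of creating them.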
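The pooling idea can be sketched with JDK classes alone. In Hadoop the corresponding calls are CodecPool.getCompressor(codec) and CodecPool.returnCompressor(compressor). The DeflaterPool class below is a hypothetical stand-in written for this note, not Hadoop's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

class DeflaterPool {
    private final Deque<Deflater> idle = new ArrayDeque<>();
    private int created = 0;

    /** Borrows an idle compressor, creating one only when the pool is empty. */
    synchronized Deflater borrow() {
        Deflater d = idle.pollFirst();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    /** Resets the compressor so it can be reused, then puts it back in the pool. */
    synchronized void giveBack(Deflater d) {
        d.reset();
        idle.addFirst(d);
    }

    /** Number of compressors actually constructed so far. */
    synchronized int createdCount() {
        return created;
    }

    /** Compresses data with a pooled compressor and always returns it afterwards. */
    static byte[] compress(DeflaterPool pool, byte[] data) {
        Deflater d = pool.borrow();
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            pool.giveBack(d);
        }
    }
}
```

The try/finally block is the important part of the pattern: a compressor that is never returned cannot be reused, and the pool silently degrades to creating a new one per call.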

Compression and input splits
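Whether a format can be split matters because each map task wants to start reading at an arbitrary offset in the file. A gzip stream cannot be entered in the middle, which the following JDK-only sketch demonstrates. The class, its helpers, and the sample records are illustrative assumptions, not Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSplitDemo {
    /** Builds made-up text records to compress. */
    static byte[] sampleRecords(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("record-").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses the whole input as a single gzip stream. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Tries to decompress starting at the given byte offset; returns null on failure. */
    static byte[] gunzipFrom(byte[] compressed, int offset) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(compressed, offset, compressed.length - offset);
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return gz.readAllBytes();
        } catch (IOException e) {
            return null; // the reader cannot synchronize with the stream part-way through
        }
    }

    public static void main(String[] args) {
        byte[] packed = gzip(sampleRecords(10_000));
        System.out.println("from start:  " + (gunzipFrom(packed, 0) != null));
        System.out.println("from middle: " + (gunzipFrom(packed, packed.length / 2) != null));
    }
}
```

This is why a large gzip file is processed by a single map task, whereas a bzip2 file, which the table above lists as splittable, can be divided among several.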

Using compression in MapReduce

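import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// MaxTemperature, MaxTemperatureMapper and MaxTemperatureReducer are defined elsewhere.
public class MaxTemperatureWithCompression {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCompression <input path> <output path>");
            System.exit(-1);
        }

        // Job.getInstance() replaces the deprecated new Job() constructor.
        Job job = Job.getInstance();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max Temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Compress the job output with gzip.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        job.setMapperClass(MaxTemperatureMapper.class);
        // The combiner reduces data transfer between map and reduce tasks.
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}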

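Compression of map task output: even when a job reads and writes uncompressed data, compressing the intermediate map output can help, because it is written to disk and transferred across the network to the reducers. It is enabled by setting mapreduce.map.output.compress to true, with the codec chosen by mapreduce.map.output.compress.codec.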

3. Serialization

Definition: serialization is the process of turning a structured object into a byte stream, either for transmission over a network or for writing to persistent storage. Deserialization is the reverse process of turning a byte stream back into a structured object.

Serialization is used in two areas of distributed data processing: interprocess communication and persistent storage

Writable interface

void write(DataOutput out) throws IOException;
void readFields(DataInput in) throws IOException;
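To show how these two methods are used together, the sketch below defines a local interface with the same signatures and round-trips an object through a byte array. SimpleWritable, IntPair, and WritableDemo are names invented for this example; real code would implement org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local stand-in with the same two methods as Hadoop's Writable. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** A pair of ints, for example (year, temperature). */
class IntPair implements SimpleWritable {
    int first;
    int second;

    IntPair() {}

    IntPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);   // 4 bytes, big-endian
        out.writeInt(second);  // 4 bytes, big-endian
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();  // fields are read in the order they were written
        second = in.readInt();
    }
}

class WritableDemo {
    /** Serializes the object into a fresh byte array. */
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Overwrites the fields of an existing object from the given bytes. */
    static void deserialize(SimpleWritable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that readFields fills in an existing object rather than returning a new one, so the same instance can be reused for many records.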

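IntWritable: a Writable wrapper for a Java int

WritableComparable: extends both Writable and java.lang.Comparable, so that keys can be compared and sorted

org.apache.hadoop.io: the package that contains Hadoop's Writable classes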

VIntWritable and VLongWritable: variable-length encodings of int and long
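The point of a variable-length encoding is that small values take fewer bytes than the fixed four of an int. The sketch below uses a generic zigzag plus 7-bits-per-byte scheme to show the idea. It is NOT Hadoop's exact wire format for VIntWritable, and VarIntDemo is a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;

class VarIntDemo {
    /** Encodes an int in 1 to 5 bytes; values near zero take the fewest bytes. */
    static byte[] encode(int value) {
        // Zigzag maps small negative numbers to small unsigned ones: -1 -> 1, 1 -> 2, ...
        int z = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80); // low 7 bits, high bit set: more bytes follow
            z >>>= 7;
        }
        out.write(z);                     // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static int decode(byte[] bytes) {
        int z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (b & 0x7F) << shift;
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);      // undo the zigzag mapping
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, -1, 300, 1_000_000, Integer.MIN_VALUE}) {
            System.out.println(v + " takes " + encode(v).length + " byte(s)");
        }
    }
}
```

A variable-length encoding pays off when most values are small. For values spread evenly over the whole int range, the fixed-length form is usually the better choice.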

Text: a Writable for UTF-8 strings, with a maximum size of 2 GB

4. File-based data structures

About SequenceFile

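SequenceFile provides a persistent data structure for binary key-value pairs, which makes it well suited to binary data.

hadoop fs -text numbers.seq | head displays the start of a sequence file in text form.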

About MapFile

A MapFile is a sorted SequenceFile with an index that permits lookups by key. The index is itself a SequenceFile that contains a fraction of the keys in the map.
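The lookup strategy described here, a sparse index over sorted data, can be sketched in memory with JDK collections. SparseIndexDemo is a hypothetical class written for this note; the real MapFile stores both data and index on disk, and by default indexes every 128th key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SparseIndexDemo {
    private final List<String> keys = new ArrayList<>();             // the sorted "data file"
    private final List<String> values = new ArrayList<>();
    private final TreeMap<String, Integer> index = new TreeMap<>();  // every Nth key -> position
    private final int interval;

    SparseIndexDemo(int interval) {
        this.interval = interval;
    }

    /** Appends an entry; keys must arrive in ascending order. */
    void append(String key, String value) {
        if (!keys.isEmpty() && keys.get(keys.size() - 1).compareTo(key) > 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (keys.size() % interval == 0) {
            index.putIfAbsent(key, keys.size());  // only a fraction of keys is indexed
        }
        keys.add(key);
        values.add(value);
    }

    /** Jumps to the nearest indexed key at or before the target, then scans forward. */
    String get(String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null;                          // target sorts before the first key
        }
        for (int i = start.getValue(); i < keys.size(); i++) {
            int cmp = keys.get(i).compareTo(key);
            if (cmp == 0) {
                return values.get(i);
            }
            if (cmp > 0) {
                return null;                      // passed the spot where the key would be
            }
        }
        return null;
    }

    int indexSize() {
        return index.size();
    }
}
```

Because only every Nth key is indexed, the index stays small enough to hold in memory, at the price of scanning at most N entries per lookup.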

Avro Data Files

SequenceFile, MapFile, and Avro data files are all row-oriented formats. There are also column-oriented formats, such as RCFile.
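The difference between the two layouts can be pictured with plain arrays. The sketch below is a toy model with made-up readings. It says nothing about how RCFile is actually laid out on disk, and LayoutDemo is a hypothetical name.

```java
class LayoutDemo {
    // Row-oriented: the fields of each record sit together: {year, stationId, temperature}.
    static final int[][] ROWS = {
        {1949, 1, 111},
        {1950, 2, 22},
        {1950, 1, -11},
    };

    /** Column-oriented: values of the same field are stored together. */
    static int[][] toColumns(int[][] rows) {
        int fields = rows[0].length;
        int[][] columns = new int[fields][rows.length];
        for (int r = 0; r < rows.length; r++) {
            for (int f = 0; f < fields; f++) {
                columns[f][r] = rows[r][f];
            }
        }
        return columns;
    }

    /** A query on one field reads a single contiguous column and skips the rest. */
    static int maxOf(int[] column) {
        int max = Integer.MIN_VALUE;
        for (int v : column) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        int[][] columns = toColumns(ROWS);
        System.out.println("max temperature: " + maxOf(columns[2]));
    }
}
```

A row layout suits workloads that read whole records, while a column layout suits queries that touch only a few fields of many records.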
