What are Hadoop's I/O operations? Many newcomers are not clear on this, so this article explains them in detail for anyone who wants to learn. I hope you gain something from it.
1. Data integrity
Detect data corruption
Compute a checksum when data first enters the system, and compute it again whenever the data is transmitted over an unreliable channel; if the checksums match, the data is intact
The checksum itself can also be corrupted, but this is far less likely because a checksum is much smaller than the data
CRC-32 is a common error-detecting code; HDFS uses a more efficient variant, CRC-32C
HDFS data integrity
HDFS calculates checksums for all data written to it and verifies them when the data is read
hadoop fs -checksum prints the checksum of a file
LocalFileSystem performs client-side checksum validation
ChecksumFileSystem: a wrapper around a raw FileSystem that adds checksumming; LocalFileSystem is built on it. An illustrative sketch follows.
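As a minimal sketch of these ideas using Hadoop's public FileSystem API: the file name part-00000 is hypothetical, and getFileChecksum may return null for schemes that do not support it.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RawLocalFileSystem;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // LocalFileSystem (the default for file:// URIs) checksums on the
        // client side: it writes hidden .crc sidecar files and verifies
        // them on every read.
        FileSystem localFs = FileSystem.get(URI.create("file:///"), conf);
        FileChecksum sum = localFs.getFileChecksum(new Path("part-00000"));
        System.out.println(sum); // may be null for schemes without checksum support

        // Skip client-side verification (e.g. for a tool that can cope
        // with corrupt data)...
        localFs.setVerifyChecksum(false);

        // ...or bypass checksumming entirely with the raw local file system.
        FileSystem rawFs = new RawLocalFileSystem();
        rawFs.initialize(URI.create("file:///"), conf);
    }
}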
2. Compression
Benefits: compression reduces the disk space needed to store files and speeds up data transfer across the network and to or from disk
Compression format | Tool  | Algorithm | File extension | Splittable
DEFLATE            | N/A   | DEFLATE   | .deflate       | No
gzip               | gzip  | DEFLATE   | .gz            | No
bzip2              | bzip2 | bzip2     | .bz2           | Yes
LZO                | lzop  | LZO       | .lzo           | No
LZ4                | N/A   | LZ4       | .lz4           | No
Snappy             | N/A   | Snappy    | .snappy        | No
All compression algorithms involve a space/time trade-off. For example, bzip2 compresses better than gzip but is slower.
Codecs
Compression format | Hadoop CompressionCodec
gzip               | org.apache.hadoop.io.compress.GzipCodec
bzip2              | org.apache.hadoop.io.compress.BZip2Codec
For performance, the native libraries are preferable to the pure-Java implementations.
If you do a lot of compression and decompression, consider using CodecPool to reuse compressors and decompressors rather than creating them repeatedly, as sketched below.
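A minimal sketch of the pooled pattern, modeled on Hadoop's CodecPool API; gzip and the stdin-to-stdout plumbing are illustrative choices.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class PooledStreamCompressor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
        Compressor compressor = null;
        try {
            // Borrow a compressor from the pool instead of allocating a new one
            compressor = CodecPool.getCompressor(codec);
            CompressionOutputStream out = codec.createOutputStream(System.out, compressor);
            IOUtils.copyBytes(System.in, out, 4096, false); // gzip stdin to stdout
            out.finish();
        } finally {
            CodecPool.returnCompressor(compressor); // return it for reuse
        }
    }
}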
Compression and input splits: only a splittable format (of those above, bzip2) lets MapReduce divide a single compressed file into multiple input splits; a non-splittable file such as a gzip file must be processed whole by a single map task
Using compression in MapReduce
public static void main(String[] args) throws Exception {
    if (args.length != 2) {
        System.err.println("Usage: MaxTemperatureWithCompression <input path> <output path>");
        System.exit(-1);
    }
    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max Temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Compress the job output with gzip
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class); // reduces data transferred between map and reduce tasks
    job.setReducerClass(MaxTemperatureReducer.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Compression of map task output
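A hedged configuration sketch for this, assuming the Hadoop 2+ property names mapreduce.map.output.compress and mapreduce.map.output.compress.codec:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputCompression {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output; reducers decompress it
        // transparently during the shuffle
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf);
        // ...configure mapper, reducer, input/output paths as usual...
    }
}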
3. Serialization
Definition: serialization is the process of converting structured objects into a byte stream for transmission over a network or for writing to disk for permanent storage. Deserialization is the reverse process: converting a byte stream back into a structured object.
Serialization is used in two areas of distributed data processing: interprocess communication and persistent storage
Writable interface
void write(DataOutput out) throws IOException;
void readFields(DataInput in) throws IOException;
IntWritable: the Writable wrapper for a Java int
WritableComparable: combines Writable with java.lang.Comparable; MapReduce keys must implement it
The org.apache.hadoop.io package contains Writable wrappers for most Java primitives
VIntWritable and VLongWritable: variable-length encodings of int and long values
Text: the Writable for UTF-8 strings; its maximum size is 2 GB
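To make write() and readFields() concrete, here is a small round-trip sketch; the value 163 is arbitrary.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.IntWritable;

public class WritableRoundTrip {
    public static void main(String[] args) throws Exception {
        IntWritable writable = new IntWritable(163);

        // Serialize: write() emits the value as 4 big-endian bytes
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        writable.write(new DataOutputStream(bytesOut));
        byte[] bytes = bytesOut.toByteArray(); // length is 4

        // Deserialize: readFields() repopulates a fresh instance
        IntWritable copy = new IntWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(copy.get()); // prints 163
    }
}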
4. File-based data structures
About SequenceFile
A SequenceFile is a persistent data structure for binary key-value pairs, which makes it well suited to binary data
hadoop fs -text numbers.seq | head displays the first records of a SequenceFile in text form. A writer sketch follows.
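A minimal writer sketch using the Hadoop 2 SequenceFile.createWriter option API; the path numbers.seq and the generated records are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("numbers.seq"); // illustrative path
        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class));
            for (int i = 0; i < 100; i++) {
                // Keys and values can be any Writable types
                writer.append(new IntWritable(i), new Text("line " + i));
            }
        } finally {
            if (writer != null) writer.close();
        }
    }
}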
About MapFile
A MapFile is a sorted SequenceFile with an index, so entries can be looked up by key. The index is itself a SequenceFile containing a small fraction of the keys in the map, and it is loaded into memory for lookups. A lookup sketch follows.
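A lookup sketch, assuming the Hadoop 2 MapFile.Reader API; the path numbers.map and the key 42 are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class MapFileLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A MapFile is a directory containing two SequenceFiles: "data" and "index"
        MapFile.Reader reader = new MapFile.Reader(new Path("numbers.map"), conf);
        try {
            IntWritable key = new IntWritable(42);
            Text value = new Text();
            // get() consults the in-memory index, seeks, then scans to the key
            Writable entry = reader.get(key, value);
            System.out.println(entry == null ? "not found" : value);
        } finally {
            reader.close();
        }
    }
}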
Avro Data Files
SequenceFile, MapFile, and Avro data files are all row-oriented formats; RCFile is a column-oriented alternative.
Did reading the above help you? I hope it gave you a clearer picture of Hadoop's I/O operations; thank you for your support.