

How to implement Hadoop Serialization


This article explains how Hadoop serialization is implemented. The content is simple and clear and easy to follow, so let's work through it step by step.

Hadoop I/O

Data Integrity

HDFS: % hadoop fs -cat hdfs://namenode/data/a.txt

Local FS: % hadoop fs -cat file:///tmp/a.txt

Generate a CRC checksum file when copying to the local filesystem:

% hadoop fs -copyToLocal -crc /data/a.txt file:///data/a.txt

The checksum file .a.txt.crc is a hidden file.

Ref: CRC-32, a cyclic redundancy check algorithm used for error detection.

io.bytes.per.checksum is deprecated; its replacement is dfs.bytes-per-checksum, default 512. It must not be larger than dfs.stream-buffer-size, the size of the buffer used to stream files. That buffer size should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.
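To make the relationship between these properties concrete, here is a minimal sketch, assuming a reachable hdfs://namenode and a hypothetical /data/a.txt, that tunes the checksum chunk size and disables client-side verification via the standard FileSystem.setVerifyChecksum() switch:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.bytes-per-checksum", 512);   // checksum every 512 bytes
        conf.setInt("dfs.stream-buffer-size", 4096);  // must not be smaller than dfs.bytes-per-checksum
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode/"), conf);
        fs.setVerifyChecksum(false);                  // skip client-side CRC verification on this instance
        FSDataInputStream in = fs.open(new Path("/data/a.txt"));
        System.out.println(in.read());                // reads now proceed without CRC checks
        in.close();
    }
}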

Data Compression

Common algorithms

Hadoop supports four common compression algorithms, all provided as codecs in org.apache.hadoop.io.compress.*. For the deflate-based formats you can trade space against speed with levels -1 through -9, ranging from best speed to best compression. Every codec below is used the same way; see the sketch that follows.
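All of these codecs share the CompressionCodec interface, so a minimal usage sketch (shown with gzip and a hypothetical local output path; the other codecs listed below work identically) looks like this:

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Instantiate a codec; ReflectionUtils injects the Configuration.
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
        // /tmp/a.txt is a hypothetical path; getDefaultExtension() appends ".gz" here.
        OutputStream raw = new FileOutputStream("/tmp/a.txt" + codec.getDefaultExtension());
        CompressionOutputStream out = codec.createOutputStream(raw);
        out.write("hello hadoop".getBytes("UTF-8"));
        out.finish();   // flush the compressed trailer
        out.close();
    }
}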

Deflate (.deflate), the algorithm commonly used via gzip. Codec: ..DefaultCodec (here .. abbreviates the org.apache.hadoop.io.compress package).

Gzip (.gz), the deflate format with a header and trailer added. Compression speed: moderate; decompression speed: moderate; compression efficiency: moderate. Codec: ..GzipCodec; both Java and native implementations exist.

Bzip2 (.bz2). Compression speed: worst; decompression speed: worst; compression efficiency: best. Its distinguishing feature is that it is splittable, which makes it very friendly to MapReduce. Codec: ..BZip2Codec; Java only.

LZO (.lzo). Compression speed: fastest; decompression speed: fastest; compression efficiency: worst. Codec: com.hadoop.compression.lzo.LzopCodec; native only.

Native libraries can be disabled with the hadoop.native.lib property. When native libraries are used, object creation can be expensive, so use CodecPool to reuse compressor and decompressor objects.

For a very large data file, the storage options are:

- Use bzip2, which supports splitting.
- Split the file manually, keeping each compressed part close to the block size.
- Use SequenceFile, which supports both compression and splitting.
- Use Avro data files, which also support compression and splitting, and can be read and written from many programming languages.

To compress MapReduce output automatically:

conf.setBoolean("mapred.output.compress", true);
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);

To compress intermediate map output automatically:

conf.setBoolean("mapred.compress.map.output", true); // or conf.setCompressMapOutput(true);
conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class); // or conf.setMapOutputCompressorClass(GzipCodec.class);

Serialization/Deserialization

Writable and WritableComparable

// core interface for Hadoop serialization
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

public interface Comparable<T> {
    int compareTo(T o);
}

// core interface for the map-reduce shuffle
public interface WritableComparable<T> extends Writable, Comparable<T> {}

// Sample
public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable o) {
        int thisValue = this.counter;
        int thatValue = o.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }

    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + counter;
        result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
        return result;
    }
}
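A quick round-trip shows the contract in action: write() serializes the fields in order, and readFields() must read them back in the same order. This is a minimal sketch using plain java.io streams; the field values are just the defaults, since the sample class above exposes no setters:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        MyWritableComparable before = new MyWritableComparable();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        before.write(new DataOutputStream(bytes));          // serialize: 4-byte int + 8-byte long
        MyWritableComparable after = new MyWritableComparable();
        after.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))); // deserialize in field order
        System.out.println(before.compareTo(after));        // 0: counters are equal
    }
}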

// optimized for comparing records in their serialized (stream) form
public interface RawComparator<T> extends Comparator<T> {
    // s1: start position in b1; l1: length in bytes (likewise s2, l2 for b2)
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

public class WritableComparator implements RawComparator {}

Class hierarchy: Comparator -> RawComparator -> WritableComparator.

WritableComparator's default compare() implementation deserializes the objects being compared and delegates to their compareTo(), which performs poorly. More usefully, it serves as a factory for RawComparator instances:

RawComparator comparator = WritableComparator.get(IntWritable.class);

// Register an optimized comparator for a WritableComparable implementation.
static void define(Class c, WritableComparator comparator)

// Get a comparator for a WritableComparable implementation.
static WritableComparator get(Class<? extends WritableComparable> c)
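Putting get() and the raw compare() together, here is a minimal sketch using IntWritable, whose optimized comparator ships with Hadoop and is registered via define(); it compares two keys directly on their serialized bytes, as the shuffle does:

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparator;

public class RawCompareDemo {
    public static void main(String[] args) throws Exception {
        WritableComparator cmp = WritableComparator.get(IntWritable.class);
        // Serialize two keys the way the shuffle would see them.
        DataOutputBuffer b1 = new DataOutputBuffer();
        new IntWritable(1).write(b1);
        DataOutputBuffer b2 = new DataOutputBuffer();
        new IntWritable(2).write(b2);
        // Compare directly on the bytes, with no deserialization.
        int r = cmp.compare(b1.getData(), 0, b1.getLength(),
                            b2.getData(), 0, b2.getLength());
        System.out.println(r);   // negative: 1 < 2
    }
}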
