2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 Report --
This article explains how to add LZO compression support to Hadoop 2.6.0 and Spark 1.3.1. The procedure is simple and practical; let's walk through it step by step.
MapReduce generates a large amount of disk and network I/O during computation, so compressing its intermediate results can noticeably improve performance. LZO offers a good balance of compression ratio and compression speed, and, once indexed, LZO-compressed data can be split by block so that each block is processed by a separate map task. For these reasons, many production clusters use LZO compression.
This article targets Hadoop 2.6.0 and covers the four steps required to add LZO support:
Install LZO

Download the LZO tarball lzo-2.09.tar.gz from http://www.oberhumer.com/

Extract the tarball, then configure, build, and install:

export CFLAGS=-m64
./configure --enable-shared --prefix=/usr/local/lzo/lzo-2.09
make
sudo make install
Install Hadoop-LZO

Synchronize everything under /usr/local/lzo to all other nodes in the cluster.
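The sync to the other nodes can be scripted; here is a minimal sketch. The hostnames are hypothetical placeholders, and the loop only echoes each command so it is safe to run as-is; drop the echo indirection to perform the actual transfer:

```shell
# Hypothetical hostnames -- substitute your cluster's own node list.
NODES="node2 node3 node4"
for host in $NODES; do
  # -a preserves the symlinks and permissions inside the lzo install tree.
  # echo only prints the command; remove it to actually run the sync.
  cmd="rsync -a /usr/local/lzo/ $host:/usr/local/lzo/"
  echo "$cmd"
done
```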
Clone hadoop-lzo and point it at the right Hadoop version:

git clone https://github.com/twitter/hadoop-lzo.git

In pom.xml, change hadoop.current.version from 2.4.0 to 2.6.0.
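The version bump can be done with a one-line sed. The sketch below runs against a throwaway pom.xml fragment for illustration (the real file lives at the root of the hadoop-lzo checkout), and assumes GNU sed for in-place editing:

```shell
# Throwaway directory with a minimal pom fragment, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/pom.xml" <<'EOF'
<project>
  <properties>
    <hadoop.current.version>2.4.0</hadoop.current.version>
  </properties>
</project>
EOF
# Bump the Hadoop version in place (GNU sed).
sed -i 's|<hadoop.current.version>2.4.0<|<hadoop.current.version>2.6.0<|' "$workdir/pom.xml"
grep hadoop.current.version "$workdir/pom.xml"
```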
Build hadoop-lzo:

export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/usr/local/lzo/lzo-2.09/include
export LIBRARY_PATH=/usr/local/lzo/lzo-2.09/lib
mvn clean package -Dmaven.test.skip=true
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C ~
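The tar pipe copies the contents of lib into the home directory while preserving symlinks and permissions. The same idiom is demonstrated below on throwaway directories with a fake library file, so it can be tried safely anywhere:

```shell
# The first tar archives the *contents* of lib (note the trailing dot and
# the -C directory change); the second unpacks them at the destination.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/lib"
touch "$src/lib/libgplcompression.so"
ln -s libgplcompression.so "$src/lib/libgplcompression.so.0"   # fake versioned symlink
tar -cBf - -C "$src/lib" . | tar -xBf - -C "$dst"
ls "$dst"    # both the file and the symlink arrive intact
```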
Copy the libgplcompression.* files extracted into ~ to the $HADOOP_HOME/lib/native directory on every node of the cluster.
Copy the hadoop-lzo-0.4.20-SNAPSHOT.jar generated under target to the $HADOOP_HOME/share/hadoop/common directory on every node of the cluster.
Hadoop configuration changes

In hadoop-env.sh, add:

export LD_LIBRARY_PATH=/usr/local/lzo/lzo-2.09/lib

In core-site.xml, add:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
In mapred-site.xml, add:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/usr/local/lzo/lzo-2.09/lib</value>
</property>
After restarting the cluster, you can use LZO to compress the data
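A quick smoke test of the new setup might look like the following. The file name is illustrative, the commands assume lzop is installed and the cluster is up, and they are echoed here rather than executed (remove the echo/variable indirection to run them for real):

```shell
# Compress a local file, upload it, then index it so MapReduce can
# split the .lzo file by block instead of reading it in one map task.
echo "lzop -9 access.log"                    # would produce access.log.lzo
echo "hadoop fs -put access.log.lzo /tmp/"
index_cmd="hadoop jar \$HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer /tmp/access.log.lzo"
echo "$index_cmd"
```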
Spark configuration changes

In spark-env.sh, add:

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/data/hadoop-2.6.0/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/hadoop-2.6.0/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar

At this point you should have a good grasp of how to add LZO compression support to Hadoop 2.6.0 and Spark 1.3.1. The best way to consolidate it is to work through the steps in practice.