This article mainly introduces the "detailed tutorial on Hadoop HBase configuration and installation of Snappy". Many people have doubts about this topic in daily operation, so the editor consulted a variety of materials and put together a simple, easy-to-follow procedure. I hope it helps answer those doubts. Now, please follow the editor through the study!
This article mainly includes:
1. Introduction to the Snappy compression algorithm and a comparison of several compression algorithms
2. Snappy installation process and verification
3. Hadoop Snappy source compilation process and solutions to problems encountered
4. Hadoop Snappy installation and configuration process and verification on Hadoop
5. Configuring Snappy for HBase and verification
6. How to deploy to all nodes in a cluster
Without further ado, let's get started:
1. Introduction to the Snappy compression algorithm and a comparison of several compression algorithms
This section can refer to my previous blog post: Hadoop Compression-SNAPPY algorithm, or directly refer to the Google documents: http://code.google.com/p/snappy/ and http://code.google.com/p/hadoop-snappy/. No more details.
2. Snappy installation process and verification
① Preconditions
gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, Maven 3
If you are not sure whether these are present, you can simply run yum install XXX to check: if a package is already installed, yum will tell you so, and if not it will be installed automatically (a concrete example follows below).
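For instance, on CentOS the build prerequisites can be checked and installed in one pass; the package names below are the standard CentOS ones and are given only as a sketch:
yum install gcc gcc-c++ autoconf automake libtool
# Java 6 and Maven 3 are usually installed separately; just confirm JAVA_HOME and the Maven version
echo $JAVA_HOME
mvn -version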
② Download Snappy 1.0.5
Download address: http://code.google.com/p/snappy/downloads/list.
③ Compile and install the dynamic link library locally
./configure
make
make install
The library is installed to /usr/local/lib by default. Listing that directory afterwards shows:
[root@slave1 lib]# pwd
/usr/local/lib
[root@slave1 lib]# ll
total 536
-rw-r--r--. 1 root root 369308 Jan 14 11:02 libsnappy.a
-rwxr-xr-x. 1 root root    957 Jan 14 11:02 libsnappy.la
lrwxrwxrwx. 1 root root     18 Jan 14 11:02 libsnappy.so -> libsnappy.so.1.1.3
lrwxrwxrwx. 1 root root     18 Jan 14 11:02 libsnappy.so.1 -> libsnappy.so.1.1.3
-rwxr-xr-x. 1 root root 171796 Jan 14 11:02 libsnappy.so.1.1.3
[root@slave1 lib]#
If there were no errors and the files and links match the above, Snappy has been installed successfully.
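As an optional extra check (not part of the original steps), you can confirm that the dynamic linker can resolve the new library; this assumes the default /usr/local/lib location is in the linker search path:
# refresh the shared-library cache, then look for the snappy entries
ldconfig
ldconfig -p | grep snappy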
3. Hadoop Snappy source compilation process and solutions to problems encountered
① Download the Hadoop-Snappy source code
Download address: http://code.google.com/p/hadoop-snappy/
② Compile the hadoop-snappy source code
mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
Note: if Snappy was installed to the default path in step 2, i.e. /usr/local/lib, then [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR] can either be omitted or written explicitly as -Dsnappy.prefix=/usr/local/lib.
During this process, if the software versions on your CentOS machine match exactly what Hadoop Snappy requires, then congratulations, it will succeed in one go; but many people will still hit problems. Here are the three thorniest problems I encountered:
Error 1: /root/modules/hadoop-snappy/maven/build-compilenative.xml:62: Execute failed: java.io.IOException: Cannot run program "autoreconf" (in directory "/root/modules/hadoop-snappy/target/native-src"): java.io.IOException: error=2, No such file or directory
Solution: the message says a file is missing, but that file lives under target, is generated automatically during compilation, and does not exist beforehand, so what is really wrong? The root cause is not a missing file but an unmet prerequisite: Hadoop Snappy requires gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, and Maven 3.
In my case autoconf, automake and libtool were missing. On Ubuntu you can install them directly with apt-get install autoconf automake libtool; on CentOS, just replace apt-get with yum.
Error 2:
[exec] make: *** [src/org/apache/hadoop/io/compress/snappy/SnappyCompressor.lo] Error 1
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (compile) on project hadoop-snappy: An Ant BuildException has occured: The following error occurred while executing this line:
[ERROR] /home/ngc/Char/snap/hadoop-snappy/hadoop-snappy-read-only/maven/build-compilenative.xml:75: exec returned: 2
Solution: this one is the nastiest. The Hadoop Snappy prerequisites require gcc to be installed, but the official documentation only says that gcc is needed, not which version. After searching in Chinese and English on Google for a long time, I finally found reports that Hadoop Snappy needs gcc 4.4, while mine was gcc 4.6.3.
[root@master modules]# gcc --version
gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
So gcc has to be rolled back. How to roll it back:
1. apt-get install gcc-4.4
2. rm /usr/bin/gcc
3. ln -s /usr/bin/gcc-4.4 /usr/bin/gcc
After that, run gcc --version again and you will find that gcc has become 4.4.7.
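A quick extra check (not from the original post) to confirm which gcc binary the build will now pick up:
# verify that the gcc symlink now points at the 4.4 toolchain
which gcc
ls -l $(which gcc)
gcc --version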
Error 3:
[exec] /bin/bash ./libtool --tag=CC --mode=link gcc -g -Wall -fPIC -O2 -m64 -g -O2 -version-info 0:1:0 -L/usr/local//lib -o libhadoopsnappy.la -rpath /usr/local/lib src/org/apache/hadoop/io/compress/snappy/SnappyCompressor.lo src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.lo -ljvm -ldl
[exec] /usr/bin/ld: cannot find -ljvm
[exec] collect2: ld returned 1 exit status
[exec] make: *** [libhadoopsnappy.la] Error 1
[exec] libtool: link: gcc -shared -fPIC -DPIC src/org/apache/hadoop/io/compress/snappy/.libs/SnappyCompressor.o src/org/apache/hadoop/io/compress/snappy/.libs/SnappyDecompressor.o -L/usr/local//lib -ljvm -ldl -O2 -m64 -O2 -Wl,-soname -Wl,libhadoopsnappy.so.0 -o .libs/libhadoopsnappy.so.0.0.1
Solution: if you search for this, you will find plenty of posts about "usr/bin/ld: cannot find -lxxx" online, but here, I can tell you, none of them apply. Nothing is missing and no version is wrong; the problem is simply that the JVM's libjvm.so has never been symlinked into /usr/local/lib. If your system is amd64, look in /root/bin/jdk1.6.0_37/jre/lib/amd64/server/ to find libjvm.so and create the link as follows:
ln -s /root/bin/jdk1.6.0_37/jre/lib/amd64/server/libjvm.so /usr/local/lib/
This solves the problem.
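If your JDK is installed somewhere else, the sketch below (assuming a JDK 6/7 amd64 directory layout and that JAVA_HOME is set) shows how to locate libjvm.so and create the same link:
# find libjvm.so under the current JDK and link it where the linker searches
find "$JAVA_HOME" -name libjvm.so
ln -s "$JAVA_HOME/jre/lib/amd64/server/libjvm.so" /usr/local/lib/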
③ After the hadoop-snappy source has compiled successfully, the target directory contains the following files:
[root@master snappy-hadoop]# cd target/
[root@master target]# ll
total 928
drwxr-xr-x. 2 root root   4096 Jan 13 19:42 antrun
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 archive-tmp
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 classes
-rw-r--r--. 1 root root    168 Jan 13 19:44 copynativelibs.sh
drwxr-xr-x. 4 root root   4096 Jan 13 19:42 generated-sources
-rw-r--r--. 1 root root  11526 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT.jar
-rw-r--r--. 1 root root 337920 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT-Linux-amd64-64.tar
drwxr-xr-x. 3 root root   4096 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT-tar
-rw-r--r--. 1 root root 180661 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT.tar.gz
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 maven-archiver
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 native-build
drwxr-xr-x. 7 root root   4096 Jan 13 19:42 native-src
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 surefire-reports
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 test-classes
-rw-r--r--. 1 root root 365937 Jan 13 19:44 test.txt.snappy
[root@master target]#
4. Hadoop Snappy installation and configuration process and verification on Hadoop
This process is also fairly involved, and there are several configuration points that need to be handled carefully:
① Decompress the hadoop-snappy-0.0.1-SNAPSHOT.tar.gz produced under target in step 3, then copy its native lib files:
cp -r /root/modules/snappy-hadoop/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/Linux-amd64-64/
② Copy the hadoop-snappy-0.0.1-SNAPSHOT.jar from the step 3 target directory to $HADOOP_HOME/lib.
③ Configure hadoop-env.sh by adding:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
④ Configure mapred-site.xml. All of the compression-related configuration options in this file are:
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to compressed as SequenceFiles, how should
  they be compressed? Should be one of NONE, RECORD or BLOCK.</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.</description>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the map outputs are compressed, how should they be
  compressed?</description>
</property>
Just configure it according to your own needs. To facilitate verification, we only configure the map part:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
⑤ Restart hadoop. To verify success, upload a text file containing some words to HDFS and run the wordcount example (a command sketch follows below). If the map phase completes 100%, our hadoop snappy installation has succeeded.
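A rough sketch of that verification (the file and directory names here are examples only, not from the original text):
# upload a small text file and run the bundled wordcount example
hadoop fs -mkdir /input
hadoop fs -put words.txt /input/
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /input /output
# if the map phase reaches 100% while mapred.map.output.compression.codec is SnappyCodec, the native library loaded correctly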
Because hadoop does not provide a utility like HBase's util.CompressionTest (or at least I could not find one), I have to test it this way. Next, the configuration process for HBase using Snappy is described in detail.
5. Configuring Snappy for HBase and verification
After Snappy has been configured successfully on Hadoop, configuring it for HBase is comparatively easy.
① Set up the lib files under HBase's lib/native/Linux-amd64-64/. HBase needs both the hadoop-snappy native libs from step 3 (everything under /root/modules/snappy-hadoop/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/) and Hadoop's own native libs under $HADOOP_HOME/lib/native/Linux-amd64-64/ (most articles I have seen about snappy mention this). For simplicity, since we already copied the hadoop-snappy libs into Hadoop, we only need to copy everything under $HADOOP_HOME/lib/native/Linux-amd64-64/ into the corresponding HBase directory:
cp -r $HADOOP_HOME/lib/native/Linux-amd64-64/* $HBASE_HOME/lib/native/Linux-amd64-64/
② Configure the HBase environment variables in hbase-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
③ Restart HBase.
④ Verify that the installation is successful.
First, use CompressionTest to check whether snappy is enabled and can be loaded successfully:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://192.168.205.5:9000/output/part-r-00000 snappy
Here /output/part-r-00000 is the wordcount output produced when we verified hadoop snappy.
The result after executing the command is:
[root@master ~]# hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://192.168.205.5:9000/output/part-r-00000 snappy
13-01-13 21:59:24 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13-01-13 21:59:24 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13-01-13 21:59:24 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
13-01-13 21:59:24 DEBUG util.FSUtils: Creating file:hdfs://192.168.205.5:9000/output/part-r-00000 with permission:rwxrwxrwx
13-01-13 21:59:24 WARN snappy.LoadSnappy: Snappy native library is available
13-01-13 21:59:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13-01-13 21:59:24 INFO snappy.LoadSnappy: Snappy native library loaded
13-01-13 21:59:24 INFO compress.CodecPool: Got brand-new compressor
13-01-13 21:59:24 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13-01-13 21:59:24 INFO compress.CodecPool: Got brand-new decompressor
SUCCESS
SUCCESS indicates that Snappy has been enabled and can be loaded successfully.
⑤ Then create and operate on a table stored in Snappy compressed format:
[root@master ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
// create a table
hbase(main):001:0> create 'tsnappy', {NAME => 'f', COMPRESSION => 'snappy'}
0 row(s) in 10.6590 seconds
// describe the table
hbase(main):002:0> describe 'tsnappy'
DESCRIPTION ENABLED
{NAME => 'tsnappy', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_ true
SCOPE => '0', VERSIONS => '3', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CE
LLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2140 seconds
// put data
hbase(main):003:0> put 'tsnappy', 'row1', 'f:col1', 'value'
0 row(s) in 0.5190 seconds
// scan data
hbase(main):004:0> scan 'tsnappy'
ROW COLUMN+CELL
 row1 column=f:col1, timestamp=1358143780950, value=value
1 row(s) in 0.0860 seconds
hbase(main):005:0>
All of the above operations completed successfully, which shows that Snappy has been configured correctly on both Hadoop and HBase.
6. How to deploy to all nodes in a cluster
This step is very simple, especially if you have already set up a Hadoop cluster. Just distribute all of the files configured above to the corresponding directories on every other node, including the generated snappy link libraries under /usr/local/lib (a sketch follows below).
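A minimal sketch of that distribution step, assuming passwordless ssh, identical directory layouts on every node, and example hostnames slave1 and slave2:
for node in slave1 slave2; do
  # native snappy libraries built in step 2
  rsync -av /usr/local/lib/libsnappy* $node:/usr/local/lib/
  # hadoop-snappy jar, native libs and edited Hadoop config
  rsync -av $HADOOP_HOME/lib/ $node:$HADOOP_HOME/lib/
  rsync -av $HADOOP_HOME/conf/ $node:$HADOOP_HOME/conf/
  # HBase native libs and edited HBase config
  rsync -av $HBASE_HOME/lib/native/ $node:$HBASE_HOME/lib/native/
  rsync -av $HBASE_HOME/conf/ $node:$HBASE_HOME/conf/
done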
At this point, the study of the "detailed tutorial on Hadoop HBase configuration and installation of Snappy" is over. I hope it helps resolve your doubts; pairing theory with practice is the best way to learn, so go and try it! If you want to keep learning more related knowledge, please continue to follow the site, and the editor will keep working hard to bring you more practical articles!