This article mainly introduces the "detailed tutorial on Hadoop HBase configuration and installation of Snappy". Many people have doubts about this topic in daily operation, so the editor consulted a variety of materials and put together a simple, easy-to-follow procedure. I hope it helps answer those doubts. Now, please follow the editor through the study!
This article mainly includes:
1. Introduction to the Snappy compression algorithm and a comparison of several compression algorithms
2. Snappy installation process and verification
3. Hadoop Snappy source compilation process and solutions to problems encountered
4. Hadoop Snappy installation and configuration process and verification on Hadoop
5. Configuring Snappy for HBase and verification
6. How to deploy to all nodes in a cluster
Without further ado, let's get started:
1. Introduction to the Snappy compression algorithm and a comparison of several compression algorithms
This section can refer to my previous blog post: Hadoop Compression-SNAPPY algorithm, or directly refer to the Google documents: http://code.google.com/p/snappy/ and http://code.google.com/p/hadoop-snappy/. No more details.
2. Snappy installation process and verification
① Preconditions
gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, Maven 3
If you are not sure whether these are present, you can simply run yum install XXX to check: if a package is already installed, yum will tell you so, and if not it will be installed automatically (a concrete example follows below).
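For instance, on CentOS the build prerequisites can be checked and installed in one pass; the package names below are the standard CentOS ones and are given only as a sketch:
yum install gcc gcc-c++ autoconf automake libtool
# Java 6 and Maven 3 are usually installed separately; just confirm JAVA_HOME and the Maven version
echo $JAVA_HOME
mvn -version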
② Download Snappy 1.0.5
Download address: http://code.google.com/p/snappy/downloads/list.
③ Compile and install the dynamic link library locally
./configure
make
make install
The library is installed to /usr/local/lib by default. Listing that directory afterwards shows:
[root@slave1 lib]# pwd
/usr/local/lib
[root@slave1 lib]# ll
total 536
-rw-r--r--. 1 root root 369308 Jan 14 11:02 libsnappy.a
-rwxr-xr-x. 1 root root    957 Jan 14 11:02 libsnappy.la
lrwxrwxrwx. 1 root root     18 Jan 14 11:02 libsnappy.so -> libsnappy.so.1.1.3
lrwxrwxrwx. 1 root root     18 Jan 14 11:02 libsnappy.so.1 -> libsnappy.so.1.1.3
-rwxr-xr-x. 1 root root 171796 Jan 14 11:02 libsnappy.so.1.1.3
[root@slave1 lib]#
If there were no errors and the files and links match the above, Snappy has been installed successfully.
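As an optional extra check (not part of the original steps), you can confirm that the dynamic linker can resolve the new library; this assumes the default /usr/local/lib location is in the linker search path:
# refresh the shared-library cache, then look for the snappy entries
ldconfig
ldconfig -p | grep snappy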
3. Hadoop Snappy source compilation process and solutions to problems encountered
① Download the Hadoop-Snappy source code
Download address: http://code.google.com/p/hadoop-snappy/
② Compile the hadoop-snappy source code
mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
Note: if Snappy was installed to the default path in step 2, i.e. /usr/local/lib, then [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR] can either be omitted or written explicitly as -Dsnappy.prefix=/usr/local/lib.
During this process, if the software versions on your CentOS machine match exactly what Hadoop Snappy requires, then congratulations, it will succeed in one go; but many people will still hit problems. Here are the three thorniest problems I encountered:
Error 1: /root/modules/hadoop-snappy/maven/build-compilenative.xml:62: Execute failed: java.io.IOException: Cannot run program "autoreconf" (in directory "/root/modules/hadoop-snappy/target/native-src"): java.io.IOException: error=2, No such file or directory
Solution: the message says a file is missing, but that file lives under target, is generated automatically during compilation, and does not exist beforehand, so what is really wrong? The root cause is not a missing file but an unmet prerequisite: Hadoop Snappy requires gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, and Maven 3.
In my case autoconf, automake and libtool were missing. On Ubuntu you can install them directly with apt-get install autoconf automake libtool; on CentOS, just replace apt-get with yum.
Error 2:
[exec] make: *** [src/org/apache/hadoop/io/compress/snappy/SnappyCompressor.lo] Error 1
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (compile) on project hadoop-snappy: An Ant BuildException has occured: The following error occurred while executing this line:
[ERROR] /home/ngc/Char/snap/hadoop-snappy/hadoop-snappy-read-only/maven/build-compilenative.xml:75: exec returned: 2
Solution: this one is the nastiest. The Hadoop Snappy prerequisites require gcc to be installed, but the official documentation only says that gcc is needed, not which version. After searching in Chinese and English on Google for a long time, I finally found reports that Hadoop Snappy needs gcc 4.4, while mine was gcc 4.6.3.
[root@master modules]# gcc --version
gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
So gcc has to be rolled back. How to roll it back:
1. apt-get install gcc-4.4
2. rm /usr/bin/gcc
3. ln -s /usr/bin/gcc-4.4 /usr/bin/gcc
After that, run gcc --version again and you will find that gcc has become 4.4.7.
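A quick extra check (not from the original post) to confirm which gcc binary the build will now pick up:
# verify that the gcc symlink now points at the 4.4 toolchain
which gcc
ls -l $(which gcc)
gcc --version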
Error 3:
[exec] /bin/bash ./libtool --tag=CC --mode=link gcc -g -Wall -fPIC -O2 -m64 -g -O2 -version-info 0:1:0 -L/usr/local//lib -o libhadoopsnappy.la -rpath /usr/local/lib src/org/apache/hadoop/io/compress/snappy/SnappyCompressor.lo src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.lo -ljvm -ldl
[exec] /usr/bin/ld: cannot find -ljvm
[exec] collect2: ld returned 1 exit status
[exec] make: *** [libhadoopsnappy.la] Error 1
[exec] libtool: link: gcc -shared -fPIC -DPIC src/org/apache/hadoop/io/compress/snappy/.libs/SnappyCompressor.o src/org/apache/hadoop/io/compress/snappy/.libs/SnappyDecompressor.o -L/usr/local//lib -ljvm -ldl -O2 -m64 -O2 -Wl,-soname -Wl,libhadoopsnappy.so.0 -o .libs/libhadoopsnappy.so.0.0.1
Solution: if you search for this, you will find plenty of posts about "usr/bin/ld: cannot find -lxxx" online, but here, I can tell you, none of them apply. Nothing is missing and no version is wrong; the problem is simply that the JVM's libjvm.so has never been symlinked into /usr/local/lib. If your system is amd64, look in /root/bin/jdk1.6.0_37/jre/lib/amd64/server/ to find libjvm.so and create the link as follows:
ln -s /root/bin/jdk1.6.0_37/jre/lib/amd64/server/libjvm.so /usr/local/lib/
This solves the problem.
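If your JDK is installed somewhere else, the sketch below (assuming a JDK 6/7 amd64 directory layout and that JAVA_HOME is set) shows how to locate libjvm.so and create the same link:
# find libjvm.so under the current JDK and link it where the linker searches
find "$JAVA_HOME" -name libjvm.so
ln -s "$JAVA_HOME/jre/lib/amd64/server/libjvm.so" /usr/local/lib/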
③ After the hadoop-snappy source has compiled successfully, the target directory contains the following files:
[root@master snappy-hadoop]# cd target/
[root@master target]# ll
total 928
drwxr-xr-x. 2 root root   4096 Jan 13 19:42 antrun
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 archive-tmp
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 classes
-rw-r--r--. 1 root root    168 Jan 13 19:44 copynativelibs.sh
drwxr-xr-x. 4 root root   4096 Jan 13 19:42 generated-sources
-rw-r--r--. 1 root root  11526 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT.jar
-rw-r--r--. 1 root root 337920 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT-Linux-amd64-64.tar
drwxr-xr-x. 3 root root   4096 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT-tar
-rw-r--r--. 1 root root 180661 Jan 13 19:44 hadoop-snappy-0.0.1-SNAPSHOT.tar.gz
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 maven-archiver
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 native-build
drwxr-xr-x. 7 root root   4096 Jan 13 19:42 native-src
drwxr-xr-x. 2 root root   4096 Jan 13 19:44 surefire-reports
drwxr-xr-x. 3 root root   4096 Jan 13 19:42 test-classes
-rw-r--r--. 1 root root 365937 Jan 13 19:44 test.txt.snappy
[root@master target]#
4. Hadoop Snappy installation and configuration process and verification on Hadoop
This process is also fairly involved, and there are several configuration points that need to be handled carefully:
① Decompress the hadoop-snappy-0.0.1-SNAPSHOT.tar.gz produced under target in step 3, then copy its native lib files:
cp -r /root/modules/snappy-hadoop/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/Linux-amd64-64/
② Copy the hadoop-snappy-0.0.1-SNAPSHOT.jar from the step 3 target directory to $HADOOP_HOME/lib.
③ Configure hadoop-env.sh by adding:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
④ Configure mapred-site.xml. All of the compression-related configuration options in this file are:
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to compressed as SequenceFiles, how should
  they be compressed? Should be one of NONE, RECORD or BLOCK.</description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?</description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.</description>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the map outputs are compressed, how should they be
  compressed?</description>
</property>
Just configure it according to your own needs. To facilitate verification, we only configure the map part:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
⑤ Restart hadoop. To verify success, upload a text file containing some words to HDFS and run the wordcount example (a command sketch follows below). If the map phase completes 100%, our hadoop snappy installation has succeeded.
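A rough sketch of that verification (the file and directory names here are examples only, not from the original text):
# upload a small text file and run the bundled wordcount example
hadoop fs -mkdir /input
hadoop fs -put words.txt /input/
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /input /output
# if the map phase reaches 100% while mapred.map.output.compression.codec is SnappyCodec, the native library loaded correctly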
Because hadoop does not provide a utility like HBase's util.CompressionTest (or at least I could not find one), I have to test it this way. Next, the configuration process for HBase using Snappy is described in detail.
5. Configuring Snappy for HBase and verification
After Snappy has been configured successfully on Hadoop, configuring it for HBase is comparatively easy.
① Set up the lib files under HBase's lib/native/Linux-amd64-64/. HBase needs both the hadoop-snappy native libs from step 3 (everything under /root/modules/snappy-hadoop/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/) and Hadoop's own native libs under $HADOOP_HOME/lib/native/Linux-amd64-64/ (most articles I have seen about snappy mention this). For simplicity, since we already copied the hadoop-snappy libs into Hadoop, we only need to copy everything under $HADOOP_HOME/lib/native/Linux-amd64-64/ into the corresponding HBase directory:
cp -r $HADOOP_HOME/lib/native/Linux-amd64-64/* $HBASE_HOME/lib/native/Linux-amd64-64/
② Configure the HBase environment variables in hbase-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
③ Restart HBase.
④ Verify that the installation is successful.
First, use CompressionTest to check whether snappy is enabled and can be loaded successfully:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://192.168.205.5:9000/output/part-r-00000 snappy
Here /output/part-r-00000 is the wordcount output produced when we verified hadoop snappy.
The result after executing the command is:
[root@master ~]# hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://192.168.205.5:9000/output/part-r-00000 snappy
13-01-13 21:59:24 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
13-01-13 21:59:24 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13-01-13 21:59:24 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
13-01-13 21:59:24 DEBUG util.FSUtils: Creating file:hdfs://192.168.205.5:9000/output/part-r-00000 with permission:rwxrwxrwx
13-01-13 21:59:24 WARN snappy.LoadSnappy: Snappy native library is available
13-01-13 21:59:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13-01-13 21:59:24 INFO snappy.LoadSnappy: Snappy native library loaded
13-01-13 21:59:24 INFO compress.CodecPool: Got brand-new compressor
13-01-13 21:59:24 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13-01-13 21:59:24 INFO compress.CodecPool: Got brand-new decompressor
SUCCESS
SUCCESS indicates that Snappy has been enabled and can be loaded successfully.
⑤ Then create and operate on a table stored in Snappy compressed format:
[root@master ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
// create a table
hbase(main):001:0> create 'tsnappy', {NAME => 'f', COMPRESSION => 'snappy'}
0 row(s) in 10.6590 seconds
// describe the table
hbase(main):002:0> describe 'tsnappy'
DESCRIPTION ENABLED
{NAME => 'tsnappy', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_ true
SCOPE => '0', VERSIONS => '3', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CE
LLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2140 seconds
// put data
hbase(main):003:0> put 'tsnappy', 'row1', 'f:col1', 'value'
0 row(s) in 0.5190 seconds
// scan data
hbase(main):004:0> scan 'tsnappy'
ROW COLUMN+CELL
 row1 column=f:col1, timestamp=1358143780950, value=value
1 row(s) in 0.0860 seconds
hbase(main):005:0>
All of the above operations completed successfully, which shows that Snappy has been configured correctly on both Hadoop and HBase.
6. How to deploy to all nodes in a cluster
This step is very simple, especially if you have already set up a Hadoop cluster. Just distribute all of the files configured above to the corresponding directories on every other node, including the generated snappy link libraries under /usr/local/lib (a sketch follows below).
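A minimal sketch of that distribution step, assuming passwordless ssh, identical directory layouts on every node, and example hostnames slave1 and slave2:
for node in slave1 slave2; do
  # native snappy libraries built in step 2
  rsync -av /usr/local/lib/libsnappy* $node:/usr/local/lib/
  # hadoop-snappy jar, native libs and edited Hadoop config
  rsync -av $HADOOP_HOME/lib/ $node:$HADOOP_HOME/lib/
  rsync -av $HADOOP_HOME/conf/ $node:$HADOOP_HOME/conf/
  # HBase native libs and edited HBase config
  rsync -av $HBASE_HOME/lib/native/ $node:$HBASE_HOME/lib/native/
  rsync -av $HBASE_HOME/conf/ $node:$HBASE_HOME/conf/
done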
At this point, the study of the "detailed tutorial on Hadoop HBase configuration and installation of Snappy" is over. I hope it helps resolve your doubts; pairing theory with practice is the best way to learn, so go and try it! If you want to keep learning more related knowledge, please continue to follow the site, and the editor will keep working hard to bring you more practical articles!