Deploy Hadoop for production under CentOS 6.5 and connect to Hadoop using the C language API


#

# install a Hadoop 2.6.0 fully distributed cluster

#

# File and system version:

#

Hadoop-2.6.0

Java version 1.8.0_77

CentOS 6.5 (64-bit)

# preparation

#

Under /home/hadoop/: mkdir Cloud

Put the Java and Hadoop installation packages under /home/hadoop/Cloud

# configure static ip

#

master   192.168.116.100

slave1   192.168.116.110

slave2   192.168.116.120
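The original does not show the interface configuration itself. A minimal sketch for master on CentOS 6, assuming the NIC is eth0 and the gateway is 192.168.116.2 (both are assumptions, adjust to your network):

# /etc/sysconfig/network-scripts/ifcfg-eth0 on master (eth0 and GATEWAY are assumptions)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.116.100
NETMASK=255.255.255.0
GATEWAY=192.168.116.2

# apply the change
service network restart

Repeat on slave1 and slave2 with their own IPADDR values.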

# modify machine hostnames (all as root)

#

su root

vim /etc/hosts

Append the following below the existing entries (fields separated by a space or tab):

192.168.116.100 master

192.168.116.110 slave1

192.168.116.120 slave2

On master: vim /etc/hostname

master

shutdown -r now (restart the machine)

On slave1: vim /etc/hostname

slave1

shutdown -r now

On slave2: vim /etc/hostname

slave2

shutdown -r now
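A quick check that the hostnames and /etc/hosts entries took effect (run on each node, adjusting the expected name):

hostname             # should print master, slave1, or slave2 on the respective machine
ping -c 1 slave1     # name resolution via /etc/hosts; repeat for slave2 and master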

# install openssh

#

su root

yum install openssh

Then, as the hadoop user on each machine:

ssh-keygen -t rsa

and just press Enter at every prompt.

Send the public keys of slave1 and slave2 to master:

On slave1: scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/slave1.pub

On slave2: scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/slave2.pub

On master: cd ~/.ssh/

cat id_rsa.pub >> authorized_keys

cat slave1.pub >> authorized_keys

cat slave2.pub >> authorized_keys

Send the merged authorized_keys file to slave1 and slave2:

scp authorized_keys hadoop@slave1:~/.ssh/

scp authorized_keys hadoop@slave2:~/.ssh/

Test the logins:

ssh slave1

ssh slave2

ssh master

answering yes to the host-key prompt the first time.

Here, ssh password-less login configuration is complete.
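If ssh still prompts for a password after this, the usual cause is that sshd rejects a .ssh directory or authorized_keys file whose permissions are too open. A minimal fix, run as the hadoop user on each node:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys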

#

# Set JAVA_HOME and HADOOP_HOME

#

su root

vim /etc/profile

Enter:

export JAVA_HOME=/home/hadoop/Cloud/jdk1.8.0_77

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/home/hadoop/Cloud/hadoop-2.6.0

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then: source /etc/profile

(do this on all three machines)
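A quick sanity check that the new environment is in effect on each machine (assuming the paths above):

source /etc/profile
java -version       # should report 1.8.0_77
hadoop version      # should report Hadoop 2.6.0
echo $HADOOP_HOME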

#

# configure hadoop file

#

Under /home/hadoop/Cloud/hadoop-2.6.0/sbin:

vim hadoop-daemon.sh

Modify the pid file path

vim yarn-daemon.sh

Modify the pid file path
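The exact edit is not shown in the original. One hedged way to redirect the pid files is to set the pid-directory variables near the top of the two scripts, assuming a writable location such as /home/hadoop/Cloud/workspace/pids (a hypothetical path):

# in hadoop-daemon.sh (hypothetical pid directory)
export HADOOP_PID_DIR=/home/hadoop/Cloud/workspace/pids

# in yarn-daemon.sh
export YARN_PID_DIR=/home/hadoop/Cloud/workspace/pids

# create the directory first
mkdir -p /home/hadoop/Cloud/workspace/pids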

Under /home/hadoop/Cloud/hadoop-2.6.0/etc/hadoop:

vim slaves and enter:

master

slave1

slave2

vim hadoop-env.sh and enter:

export JAVA_HOME=/home/hadoop/Cloud/jdk1.8.0_77

export HADOOP_HOME_WARN_SUPPRESS="TRUE"

vim core-site.xml and enter:

<configuration>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/Cloud/workspace/temp</value>
  </property>
</configuration>

vim hdfs-site.xml and enter:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/Cloud/workspace/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.dir</name>
    <value>/home/hadoop/Cloud/workspace/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.dir</name>
    <value>/home/hadoop/Cloud/workspace/hdfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

vim mapred-site.xml and enter:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

Send the configured Hadoop to slave1 and slave2:

scp -r hadoop-2.6.0 hadoop@slave1:~/Cloud/

scp -r hadoop-2.6.0 hadoop@slave2:~/Cloud/

Send the Java package to slave1 and slave2:

scp -r jdk1.8.0_77 hadoop@slave1:~/Cloud/

scp -r jdk1.8.0_77 hadoop@slave2:~/Cloud/

At this point, the hadoop cluster configuration is complete

#

# you can now start hadoop

#

Format the NameNode first:

hadoop namenode -format (thanks to the hadoop-env.sh and system environment set up earlier, this can be executed from any directory)

If the log looks correct, continue:

start-all.sh

Then verify the processes with jps:

[hadoop@master ~]$ jps
42306 ResourceManager
42407 NodeManager
42151 SecondaryNameNode
41880 NameNode
41979 DataNode

[hadoop@slave1 ~]$ jps
21033 NodeManager
20926 DataNode

[hadoop@slave2 ~]$ jps
20568 NodeManager
20462 DataNode

At this point, the hadoop-2.6.0 fully distributed configuration is complete.

The Hadoop web UIs are available at:

localhost:50070 (HDFS NameNode)

localhost:8088 (YARN ResourceManager)
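Besides the web UIs, a quick command-line check that all three DataNodes registered (a sketch; the exact report wording varies slightly by version):

hdfs dfsadmin -report | grep -E "Live datanodes|Name:"
# expect a live-datanode count of 3 and one Name: line per node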

#

# configure the C API connection to HDFS

#

find / -name libhdfs.so.0.0.0

vi /etc/ld.so.conf

and add:

/home/hadoop/Cloud/hadoop-2.6.0/lib/native/

/home/hadoop/Cloud/jdk1.8.0_77/jre/lib/amd64/server/

Then reload the dynamic linker cache:

/sbin/ldconfig -v
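To confirm that the two libraries are now visible to the dynamic linker:

ldconfig -p | grep -E "libhdfs|libjvm"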

Then configure the environment variables:

Find and print:

find /home/hadoop/Cloud/hadoop-2.6.0/share/ -name "*.jar" | awk '{printf("export CLASSPATH=%s:$CLASSPATH\n", $0);}'

You will see printed content such as:

export CLASSPATH=/home/hadoop/Cloud/hadoop-2.6.0/share/hadoop/common/lib/activation-1.1.jar:$CLASSPATH

export CLASSPATH=/home/hadoop/Cloud/hadoop-2.6.0/share/hadoop/common/lib/jsch-0.1.42.jar:$CLASSPATH

...

Add everything printed to /etc/profile (vim /etc/profile), then source it.
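Pasting hundreds of export lines works but is tedious; an equivalent one-liner for /etc/profile that builds the same CLASSPATH (a sketch, assuming the install path above):

export CLASSPATH=$(find /home/hadoop/Cloud/hadoop-2.6.0/share/ -name "*.jar" | tr '\n' ':')$CLASSPATH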

Then write C language code to verify that the configuration is successful:

vim above_sample.c

The code is as follows:

#

#include "hdfs.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("192.168.116.100", 9000); /* a small modification was made here */
    const char *writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    char *buffer = "Hello,World!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void *)buffer, strlen(buffer) + 1);
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    return 0;
}

#

Compile the C language code:

gcc above_sample.c -I /home/hadoop/Cloud/hadoop-2.6.0/include/ -L /home/hadoop/Cloud/hadoop-2.6.0/lib/native/ -lhdfs /home/hadoop/Cloud/jdk1.8.0_77/jre/lib/amd64/server/libjvm.so -o above_sample

Run the above_sample binary produced by the compilation:

./above_sample

Check the log and the HDFS directory to confirm that the testfile was created.
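A quick way to confirm the write from the HDFS command line (the file should contain the Hello,World! string written by the program):

hadoop fs -ls /tmp/
hadoop fs -cat /tmp/testfile.txt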

At this point, the C language API connection to HDFS is complete.

#

# File operations on the cluster

#

# auto.sh (automatic distribution script)

vim auto.sh

chmod +x auto.sh

Usage example: ./auto.sh jdk1.8.0_77 ~/Cloud/

The distribution script itself:

# #

#!/bin/bash
nodes=(slave1 slave2)
num=${#nodes[@]}
file=$1
dst_path=$2
for ((i=0; i<num; i++)); do
    scp -r $file ${nodes[$i]}:$dst_path
done

# #

Create a text file test1.txt whose words are separated by spaces, then import the files in the current directory into HDFS's in directory:

hadoop dfs -put ./ /in

hadoop dfs -ls /in/*

hadoop dfs -cp /in/test1.txt /in/test1.txt.bak

hadoop dfs -ls /in/*

hadoop dfs -rm /in/test1.txt.bak

Download from HDFS to a local directory dir_from_hdfs:

mkdir dir_from_hdfs

hadoop dfs -get /in/* dir_from_hdfs

Count the words in all text files in the in directory (note that the output/wordcount directory must not already exist):

cd /home/hadoop/Cloud/hadoop-2.6.0

hadoop jar hadoop-examples-2.6.0.jar wordcount /in /output/wordcount

View the results:

hadoop fs -cat /output/wordcount/part-r-00000

# management

1. Cluster-related management

Edit log: when a file system client performs a write, the operation is first recorded in the edit log. Once the edit log has been recorded, NameNode modifies the data structures in memory. The edit log is synced to the file system before each write is reported as successful.

fsimage: the namespace image, a checkpoint of the in-memory metadata on disk. When NameNode fails, the metadata from the latest checkpoint is loaded into memory from fsimage, and the operations recorded in the edit log are then replayed. Secondary NameNode helps the NameNode checkpoint the in-memory metadata to disk.

2. Cluster characteristics

Advantages: 1) able to handle very large files; 2) streaming data access: HDFS handles "write once, read many times" workloads well. Once a dataset is generated, it is copied to different storage nodes and then serves a variety of data-analysis requests. In most cases an analysis task touches most of the data in the dataset, so reading the entire dataset is more efficient in HDFS than reading individual records.

Disadvantages: 1) not suitable for low-latency data access: HDFS is designed for large data-set analysis, mainly for big-data workloads, so latency may be high. 2) cannot store a large number of small files efficiently: because NameNode keeps the file system metadata in memory, the number of files the file system can hold is limited by the NameNode's memory size. 3) no support for multiple writers or arbitrary file modification: an HDFS file has only one writer at a time, and writes can only append to the end of the file; HDFS currently does not support multiple users writing to the same file or modifying it at arbitrary positions in the file.
