Hadoop research


Package download

http://archive.cloudera.com/cdh5/cdh/4/

http://apache.fayea.com/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz

http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz

http://apache.opencas.org/hbase/1.2.0/hbase-1.2.0-bin.tar.gz

http://download.oracle.com/otn-pub/java/jdk/8u73-b02/jdk-8u73-linux-x64.tar.gz

Environment

10.200.140.58 hadoop-308.99bill.com # physical machine datanode zookeeper regionserver

10.200.140.59 hadoop-309.99bill.com # physical machine datanode zookeeper regionserver

10.200.140.60 hadoop-310.99bill.com # physical machine datanode zookeeper regionserver

10.200.140.45 hadoop-311.99bill.com # Virtual machine master

10.200.140.46 hadoop-312.99bill.com # Virtual machine second hmaster
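
These hostnames also need to resolve on every node; a minimal sketch of the corresponding /etc/hosts entries (run as root on each machine), including the short aliases used later in zoo.cfg and hbase-site.xml:

cat >> /etc/hosts <<'EOF'
10.200.140.58 hadoop-308.99bill.com hadoop-308
10.200.140.59 hadoop-309.99bill.com hadoop-309
10.200.140.60 hadoop-310.99bill.com hadoop-310
10.200.140.45 hadoop-311.99bill.com hadoop-311
10.200.140.46 hadoop-312.99bill.com hadoop-312
EOF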

Modify the hostname and disable IPv6 on every node.
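
A sketch of those two steps, assuming a RHEL/CentOS 6-style system (adjust file locations for other distributions), shown here for hadoop-311:

# set the hostname (repeat with the proper name on each node)
hostname hadoop-311.99bill.com
sed -i 's/^HOSTNAME=.*/HOSTNAME=hadoop-311.99bill.com/' /etc/sysconfig/network
# disable IPv6 via sysctl
cat >> /etc/sysctl.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
sysctl -p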

cat /etc/profile

export JAVA_HOME=/opt/jdk1.7.0_80/
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME
export PATH
export CLASSPATH
HADOOP_BASE=/opt/oracle/hadoop
HADOOP_HOME=/opt/oracle/hadoop
YARN_HOME=/opt/oracle/hadoop
PATH=$HADOOP_BASE/bin:$PATH
export HADOOP_BASE PATH

Configure passwordless SSH so that 10.200.140.45 (the master) can log in to all nodes without a password.
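
A minimal sketch of that setup, run as the oracle user on 10.200.140.45 (node names taken from the environment list above):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa                 # key pair without a passphrase
for host in hadoop-308 hadoop-309 hadoop-310 hadoop-311 hadoop-312; do
  ssh-copy-id oracle@${host}.99bill.com                  # append the public key to each node's authorized_keys
done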

[oracle@hadoop-311 hadoop]$ cat core-site.xml

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://hadoop-311.99bill.com:9000</value></property>
  <property><name>io.file.buffer.size</name><value>16384</value></property>
</configuration>

[oracle@hadoop-311 hadoop]$ cat hdfs-site.xml

<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/opt/hadoop/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/opt/hadoop/data/dfs</value></property>
  <property><name>dfs.datanode.handler.count</name><value>150</value></property>
  <property><name>dfs.blocksize</name><value>64m</value></property>
  <property><name>dfs.datanode.du.reserved</name><value>1073741824</value><final>true</final></property>
  <property><name>dfs.hosts.exclude</name><value>/opt/oracle/hadoop/etc/hadoop/slave-deny-list</value></property>
  <property><name>dfs.namenode.http-address</name><value>hadoop-311.99bill.com:50070</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>hadoop-312.99bill.com:50090</value></property>
  <property><name>dfs.permissions</name><value>false</value></property>
</configuration>

[oracle@hadoop-311 hadoop]$ cat mapred-site.xml

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>4000</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>4000</value></property>
</configuration>

Define the datanodes in the slaves file:

[oracle@hadoop-311 hadoop]$ cat slaves

hadoop-308.99bill.com
hadoop-309.99bill.com
hadoop-310.99bill.com

hadoop-env.sh:

export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_PID_DIR=/opt/oracle/hadoop
export HADOOP_SECURE_DN_PID_DIR=/opt/oracle/hadoop
export JAVA_HOME=/opt/jdk1.7.0_80/
export HADOOP_HEAPSIZE=6000
exec_time=`date +'%Y%m%d-%H%M%S'`
export HADOOP_NAMENODE_OPTS="-Xmx6g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx6g ${HADOOP_SECONDARYNAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -Xmx6000m -Xms6000m -Xmn1000m -XX:PermSize=128M -XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HADOOP_LOG_DIR/gc-$(hostname)-datanode-${exec_time}.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=10 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=20"

[oracle@hadoop-311 hadoop]$ cat yarn-site.xml

<configuration>
  <property><name>yarn.resourcemanager.address</name><value>hadoop-311.99bill.com:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>hadoop-311.99bill.com:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>hadoop-311.99bill.com:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>hadoop-311.99bill.com:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>hadoop-311.99bill.com:8088</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>

Start the hadoop cluster

The first time you start the cluster you must format the NameNode; this step is not needed for later starts:

hadoop/bin/hadoop namenode -format

Then start hadoop:

hadoop/sbin/start-all.sh

After startup, if there are no errors, run jps to list the Java processes. On the master you should see the NameNode and ResourceManager processes (and SecondaryNameNode on the node that hosts it).

[oracle@hadoop-311 hadoop]$ jps

13332 Jps

5430 NameNode

5719 ResourceManager

III. ZooKeeper cluster installation

1. Extract zookeeper-3.4.8.tar.gz and rename the directory to zookeeper. Go to zookeeper/conf, copy zoo_sample.cfg to zoo.cfg, and edit it:

[oracle@hadoop-308 conf]$ cat zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
maxClientCnxns=0
# The number of ticks that the initial
# synchronization phase can take
initLimit=50
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# number of reserved snapshots
autopurge.snapRetainCount=2
# Purge task interval in hours
autopurge.purgeInterval=84
dataDir=/opt/hadoop/zookeeperdata
# the port at which the clients will connect
clientPort=2181
server.1=hadoop-308:2888:3888
server.2=hadoop-309:2888:3888
server.3=hadoop-310:2888:3888

2. Create and edit the myid file

mkdir /opt/hadoop/zookeeperdata
echo "1" > /opt/hadoop/zookeeperdata/myid

3. Synchronize the zookeeper directory to the other two nodes, then change myid on each node to its corresponding number.
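
A sketch of that synchronization, assuming the install directory /opt/oracle/zookeeper used below (myid 2 and 3 match server.2 and server.3 in zoo.cfg):

scp -r /opt/oracle/zookeeper oracle@hadoop-309:/opt/oracle/
scp -r /opt/oracle/zookeeper oracle@hadoop-310:/opt/oracle/
ssh oracle@hadoop-309 'mkdir -p /opt/hadoop/zookeeperdata && echo "2" > /opt/hadoop/zookeeperdata/myid'
ssh oracle@hadoop-310 'mkdir -p /opt/hadoop/zookeeperdata && echo "3" > /opt/hadoop/zookeeperdata/myid'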

Start zookeeper:

cd /opt/oracle/zookeeper
./bin/zkServer.sh start

[oracle@hadoop-308 tools]$ jps

11939 Jps

4373 DataNode

8579 HRegionServer
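
To confirm the quorum is healthy after starting all three servers, a quick check (assuming nc is installed; a healthy server answers "imok"):

for host in hadoop-308 hadoop-309 hadoop-310; do
  echo -n "$host: "
  echo ruok | nc $host 2181 && echo              # ZooKeeper four-letter-word health check
done
/opt/oracle/zookeeper/bin/zkServer.sh status     # reports whether the local node is leader or follower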

IV. Installation and configuration of HBase cluster

1. Extract hbase-1.2.0-bin.tar.gz and rename it to hbase, then edit hbase/conf/hbase-env.sh:

export HBASE_MANAGES_ZK=false
export HBASE_HEAPSIZE=4000
export JAVA_HOME=/opt/jdk1.7.0_80/

[oracle@hadoop-311 conf]$ cat hbase-site.xml

<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://hadoop-311:9000/hbase</value><description>The directory shared by region servers.</description></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.master.port</name><value>60000</value></property>
  <property><name>hbase.master</name><value>hadoop-312</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>hadoop-308,hadoop-309,hadoop-310</value></property>
  <property><name>hbase.regionserver.handler.count</name><value>300</value></property>
  <property><name>hbase.hstore.blockingStoreFiles</name><value>70</value></property>
  <property><name>zookeeper.session.timeout</name><value>60000</value></property>
  <property><name>hbase.regionserver.restart.on.zk.expire</name><value>true</value><description>Zookeeper session expired will force regionserver exit. Enable this will make the regionserver restart.</description></property>
  <property><name>hbase.replication</name><value>false</value></property>
  <property><name>hfile.block.cache.size</name><value>0.4</value></property>
  <property><name>hbase.regionserver.global.memstore.upperLimit</name><value>0.35</value></property>
  <property><name>hbase.hregion.memstore.block.multiplier</name><value>8</value></property>
  <property><name>hbase.server.thread.wakefrequency</name><value>100</value></property>
  <property><name>hbase.master.distributed.log.splitting</name><value>false</value></property>
  <property><name>hbase.regionserver.hlog.splitlog.writer.threads</name><value>3</value></property>
  <property><name>hbase.client.scanner.caching</name><value>10</value></property>
  <property><name>hbase.hregion.memstore.flush.size</name><value>134217728</value></property>
  <property><name>hbase.hregion.memstore.mslab.enabled</name><value>true</value></property>
  <property><name>hbase.coprocessor.user.region.classes</name><value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value></property>
  <property><name>dfs.datanode.max.xcievers</name><value>2096</value><description>PRIVATE CONFIG VARIABLE</description></property>
</configuration>

Distribute hbase to the other 4 nodes
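
A sketch of that distribution step, assuming the same /opt/oracle layout on every node:

for host in hadoop-308 hadoop-309 hadoop-310 hadoop-312; do
  scp -r /opt/oracle/hbase oracle@${host}.99bill.com:/opt/oracle/
done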

Start the cluster

1. Start zookeeper

zookeeper/bin/zkServer.sh start

2. Start Hadoop

hadoop/sbin/start-all.sh

Modify hbase/conf/hbase-site.xml: the contents are identical to the hbase-site.xml shown in section IV above.

hbase-env.sh:

export JAVA_HOME=/opt/jdk1.7.0_80/
export HBASE_CLASSPATH=/opt/oracle/hadoop/conf
export HBASE_HEAPSIZE=4000
export HBASE_OPTS="-XX:PermSize=512M -XX:MaxPermSize=512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=10 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=20"
exec_time=`date +'%Y%m%d-%H%M%S'`
export HBASE_MASTER_OPTS="-Xmx4096m -Xms4096m -Xmn128m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-master-${exec_time}.log"
export HBASE_REGIONSERVER_OPTS="-Xmx8192m -Xms8192m -Xmn512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-regionserver-${exec_time}.log"
export HBASE_MANAGES_ZK=false

[oracle@hadoop-311 conf]$ cat regionservers

hadoop-308
hadoop-309
hadoop-310

Distribute the configuration to the other four nodes, then start HBase:

cd /opt/oracle/hbase
sh bin/start-hbase.sh

[oracle@hadoop-311 bin]$ ./hbase shell

16-03-23 20:20:47 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.94.15-cdh5.7.1, r, Tue Nov 18 08:42:59 PST 2014

hbase(main):001:0> status

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/oracle/hbase/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/oracle/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

16-03-23 20:20:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable

3 servers, 0 dead, 0.6667 average load
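
Beyond status, a short smoke test from the same directory confirms that tables can be created and written; the table name smoke_test is only an example:

./hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:c1', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF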

10. Common problems

10.1. Abnormal Namenode shutdown

Run jps on every machine in the Hadoop environment to list the remaining processes, kill them, and then restart the services in the normal startup order (ZooKeeper first, then HDFS/YARN, then HBase).
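
A minimal recovery sketch, using the paths configured above:

# on every node: list and kill leftover processes
jps
kill -9 <pid>                                    # repeat for each leftover Hadoop/HBase/ZooKeeper pid
# then restart in order
/opt/oracle/zookeeper/bin/zkServer.sh start      # on hadoop-308/309/310
/opt/oracle/hadoop/sbin/start-all.sh             # on hadoop-311
/opt/oracle/hbase/bin/start-hbase.sh             # on hadoop-311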

10.2. Abnormal Datanode shutdown

Start HDFS on the namenode:

Run hadoop/sbin/start-all.sh

If the datanode also runs zookeeper, start zookeeper as well:

Run zookeeper/bin/zkServer.sh start on that datanode.

Start HBase on the namenode:

Run hbase/bin/start-hbase.sh

Then check http://10.200.140.46:60010/master-status

10.3. Stop a non-master server

Run on this server:

hadoop/sbin/hadoop-daemon.sh stop datanode

hadoop/sbin/yarn-daemon.sh stop nodemanager

hbase/bin/hbase-daemon.sh stop regionserver

Check http://10.200.140.45:50070/dfshealth.jsp to confirm that the node has moved to the dead nodes list; once it has, the server can be stopped.
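
Because hdfs-site.xml already points dfs.hosts.exclude at the slave-deny-list file, a datanode can also be decommissioned gracefully before it is stopped; a sketch run on the namenode, using hadoop-310 only as an example:

echo "hadoop-310.99bill.com" >> /opt/oracle/hadoop/etc/hadoop/slave-deny-list
/opt/oracle/hadoop/bin/hdfs dfsadmin -refreshNodes    # make the namenode re-read the exclude list
/opt/oracle/hadoop/bin/hdfs dfsadmin -report          # wait until the node is reported as Decommissioned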


After the server has been restarted, run the following on hadoop001 to start the services again:

hadoop/sbin/start-all.sh

hbase/bin/start-hbase.sh

11. Monitoring ports

11.1. Namenode Monitoring Port (hadoop001):

60010,60000,50070,50030,9000,9001,10000

11.2. Zookeeper Monitoring Port (hadoop003,hadoop004,hadoop005)

2181

11.3. Datanode Monitoring Port (hadoop003,hadoop004,hadoop005,hadoop006,hadoop007)

60030,50075

12. HDFS data is distributed unevenly across datanodes and the balancer is too slow

The master node provides a start-balancer.sh script (under hadoop/sbin).
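
A sketch of running the balancer by hand with a tighter threshold and more bandwidth (both values are only examples):

/opt/oracle/hadoop/bin/hdfs dfsadmin -setBalancerBandwidth 104857600   # allow up to 100 MB/s per datanode for balancing
/opt/oracle/hadoop/sbin/start-balancer.sh -threshold 5                 # stop once every datanode is within 5% of average utilization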

# Migration solution

First prepare a new Hadoop environment in the new data center.

# hadoop migration - hbase

1. Make sure the new HBase is working properly and that machines in the two clusters can reach each other by hostname.

2. Stop the new HBase.

3. Run the following command on any Hadoop machine in either cluster:

./hadoop distcp -bandwidth 10 -m 3 hdfs://hadoop001.99bill.com:9000/hbase/if_fss_files hdfs://hadoop-312.99bill.com:9000/hbase/if_fss_files

4. Using the attached add_table.rb script, run:

hbase org.jruby.Main ~/add_table.rb /hbase/if_fss_files

5. Start the new HBase.

# hadoop migration - hadoop data migration

# organize the hadoop files and repackage those that failed to be packed

For example, for 2014-07-24:

./hdfs dfs -rm -r /fss/2014-07-24

./hdfs dfs -rm -r /fss/2014-07-24.har

./hdfs dfs -mv /fss/2014-07-24a.har /fss/2014-07-24.har
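
For reference, such an archive can be regenerated from the source directory with the hadoop archive tool before the source is removed; a sketch for the 2014-07-24 example:

./hadoop archive -archiveName 2014-07-24a.har -p /fss 2014-07-24 /fss   # packs /fss/2014-07-24 into /fss/2014-07-24a.har via a MapReduce job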

## synchronize from the remote fss system to the local data center

./hdfs dfs -copyToLocal hdfs://hadoop001.99bill.com:9000/fss/2015-04-08.har /opt/sdb/hadoop/tmp/

# import into the fss system in the new data center

./hdfs dfs -copyFromLocal /opt/sdb/hadoop/tmp/2015-04-08.har /fss/

sleep 5

./hdfs dfs -copyFromLocal /opt/sdb/hadoop/tmp/2015-06-03.har /fss/
