
Big Data Learning Series Part 7: Building a Hadoop+Spark+Zookeeper+HBase+Hive Cluster


Introduction: the previous posts in this big data learning series set up a Hadoop+Spark+HBase+Hive environment and ran some tests. In fact, when I first started learning big data I built a cluster, not the stand-alone or pseudo-distributed modes. The reason I wrote up the stand-alone setup first is that for individual study a single machine is enough; to be honest, my own computer is not powerful enough, and running virtual machines on it is too slow. The whole cluster was built on the company's test servers. I ran into all kinds of pitfalls during the build, but also gained a lot. After the cluster was up I took notes here and there and later reorganized them, which is how this post came about. When I actually built it I did not follow these steps one by one; I changed quite a few things along the way, and the cluster described here is the one that currently runs without problems. When I started writing this article I planned to set up the whole environment on one machine first and then copy everything to the other machines, but on reflection that would be quick to build yet not very friendly to the reader, so each component is split out and can be set up on its own. All right, enough talk; the tutorial follows.

Contents

[TOC]

First, environment selection

1. Cluster machine overview

Because this is a cluster build, I use a table to describe the environment configuration. The cluster uses three machines, named master, slave1, and slave2; the master-slave relationship is clear from the names. The operating system is CentOS 6.8. The components installed on each machine were laid out in the following table:
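The original table was an image that is not reproduced here. As a rough sketch reconstructed from the versions, IP addresses, and paths used later in this post (treat it as illustrative rather than exact):

master (192.169.0.23): JDK 1.8, Scala 2.12, Hadoop 2.8, Spark 1.6 (hadoop2.4, hive build), Zookeeper 3.4, HBase 1.2, Hive 2.1
slave1 (192.169.0.24): JDK 1.8, Scala 2.12, Hadoop 2.8, Spark 1.6, Zookeeper 3.4, HBase 1.2
slave2 (192.169.0.25): JDK 1.8, Scala 2.12, Hadoop 2.8, Spark 1.6, Zookeeper 3.4, HBase 1.2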

The configuration of each machine is shown above. One thing to add: for spark I do not use the official binary but a build of spark compiled against hive. The reason is that later, when running hive queries, I do not want to use hive's default MapReduce engine, which hive 2.x officially discourages because it is too inefficient; I will switch hive's execution engine to spark, and since I did not want to recompile spark myself I use this build. If you are willing to compile it yourself, or you use a newer version, you do not have to follow the versions above. The same goes for the storage paths: there is no need to copy mine; run df -h on the machine first to check the available disk space and then decide where to deploy.

2. Configuration notes

JDK: required by Hadoop and Spark. The officially recommended JDK version is 1.7 or above.
Scala: required by Spark. The version should be no lower than what your spark build expects.
Hadoop: a distributed system infrastructure.
Spark: a big data processing engine for distributed storage.
Zookeeper: a distributed coordination service, required by the HBase cluster.
HBase: a distributed storage system for structured data.
Hive: a data warehouse tool built on Hadoop; its default metastore database here is mysql.

3. Download addresses

Official addresses:

Hadoop: http://www.apache.org/dyn/closer.cgi/hadoop/common
Spark: http://spark.apache.org/downloads.html
Spark SQL on Hive: http://mirror.bit.edu.cn/apache/spark
Scala: http://www.scala-lang.org/download
JDK: http://www.oracle.com/technetwork/java/javase/downloads
HBase: http://mirror.bit.edu.cn/apache/hbase/
Zookeeper: http://mirror.bit.edu.cn/apache/zookeeper/
Hive: http://mirror.bit.edu.cn/apache/hive/

Baidu Cloud:

Link: https://pan.baidu.com/s/1kUYfDaf password: o1ov

Second, cluster-related configuration

1. Change the hostnames and map them to IPs

1.1 Change the hostname

Note: changing the hostnames makes the cluster easier to manage; it would be a mess if every machine were called localhost. Do this on every machine in the cluster.

Input

vim /etc/sysconfig/network

Change localhost.localdomain to the name you want; give each machine a different name.

For example:

HOSTNAME=master

Note: the change does not take effect until you reboot (type reboot).

1.2 Map hostnames to IP addresses

Modify the hosts file to map hostnames to IPs.

Note: do this on every machine, using your own actual IPs and hostnames.

Enter:

vim /etc/hosts

Add

192.169.0.23 master
192.169.0.24 slave1
192.169.0.25 slave2

Note: after adding this on one machine, you can use the scp command or ftp to copy the file to the other machines.

Example of the scp command:

scp -r /etc/hosts root@192.169.0.24:/etc

2. Passwordless ssh login

Setting up passwordless ssh login makes operations easier.

Generate the key pair

Execute it on every machine.

First enter:

ssh-keygen -t rsa -P ''

After the keys have been generated, every machine's /root/.ssh directory should end up containing a file named authorized_keys with identical content: the public keys of all three machines. It can be assembled on one machine and then copied to the others.

Create a new authorized_keys file

Enter:

touch /root/.ssh/authorized_keys

Edit authorized_keys and copy the keys from other machines

cat /root/.ssh/id_rsa.pub
vim /root/.ssh/authorized_keys

Copy the contents of each machine's id_rsa.pub into the authorized_keys file.

(The screenshots of each machine's public key and of the final authorized_keys file are omitted; the final authorized_keys simply contains all three public keys, one per line.)

Copy the final authorized_keys file to the / root/.ssh directory on another machine. You can use either scp or ftp.

Example of the scp command:

scp -r /root/.ssh/authorized_keys root@192.169.0.24:/root/.ssh
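As an alternative to editing authorized_keys by hand, ssh-copy-id can append the public keys for you; this is only a hedged sketch and assumes the host mapping above is already in place. Run it on every machine, entering each password once:

ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave2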

Test password-free login

Enter:

ssh slave1
ssh slave2

Enter exit to exit

3. Turn off the firewall

Note: strictly speaking you could keep the firewall and just open the required ports, but to make access easier the firewall is simply turned off here. Do this on every machine!

Command to turn off the firewall

Stop the firewall:

service iptables stop

Start the firewall:

service iptables start

Restart the firewall:

service iptables restart

Permanently turn off the firewall:

chkconfig iptables off

4. Time configuration

The clocks of the machines in the cluster need to be synchronized. Since the machines on my side are virtual machines, I did not need to do this.

The NTP service can be used to set up cluster time synchronization.

For more information, please refer to http://blog.csdn.net/to_baidu/article/details/52562574
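A minimal sketch of one common approach on CentOS 6, assuming the machines can reach a public NTP server (the pool address is just an example; point it at your own time source if you have one):

yum install -y ntp              # on every machine
ntpdate cn.pool.ntp.org         # one-off clock sync
service ntpd start              # keep the clock synchronized from now on
chkconfig ntpd on               # start ntpd on boot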

5. Shortcut (alias) settings (optional)

Note: because you switch between directories frequently, I set up aliases to save typing. You only need to type the alias in linux to run the command behind it; quite convenient. For example, the commonly used ll is an alias for ls -l. You can look up aliases yourself for more details.

Enter:

vim ~/.bashrc

Add the following

# Some more aliases
alias chd='cd /opt/hadoop/hadoop2.8'
alias chb='cd /opt/hbase/hbase1.2'
alias chi='cd /opt/hive/hive2.1'
alias czk='cd /opt/zookeeper/zookeeper3.4'
alias csp='cd /opt/spark/spark1.6-hadoop2.4-hive'
alias fhadoop='/opt/hadoop/hadoop2.8/bin/hdfs namenode -format'
alias starthadoop='/opt/hadoop/hadoop2.8/sbin/start-all.sh'
alias stophadoop='/opt/hadoop/hadoop2.8/sbin/stop-all.sh'
alias starthbase='/opt/hbase/hbase1.2/bin/start-hbase.sh'
alias stophbase='/opt/hbase/hbase1.2/bin/stop-hbase.sh'
alias startzk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh start'
alias stopzk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh stop'
alias statuszk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh status'
alias startsp='/opt/spark/spark1.6-hadoop2.4-hive/sbin/start-all.sh'
alias stopsp='/opt/spark/spark1.6-hadoop2.4-hive/sbin/stop-all.sh'

Enter after successful addition

source ~/.bashrc

Then type the alias you set and the corresponding command will run. The aliases do not have to be set exactly as above; if you have a better approach, use it.

6. Overall environment variable setting

A lot of environment configuration goes into /etc/profile, so the complete configuration is listed here first; adjust it to your own paths when you set the environment variables. You can configure all the environment variables first and then copy the file to the other machines.

I copied these settings to the other machines first and sourced the file there, so the per-component /etc/profile edits described below were not actually repeated by me. Adapt this to your own situation.

# Java Config
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
# Scala Config
export SCALA_HOME=/opt/scala/scala2.12
# Spark Config
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
# Zookeeper Config
export ZK_HOME=/opt/zookeeper/zookeeper3.4
# HBase Config
export HBASE_HOME=/opt/hbase/hbase1.2
# Hadoop Config
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Hive Config
export HIVE_HOME=/opt/hive/hive2.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:$PATH

III. Environment construction of Hadoop

As noted above, these configurations can be done on one machine and then copied to the others. Remember to source the files after copying so the configuration takes effect.

1. JDK configuration

Note: CentOS usually ships with openjdk, but the hadoop cluster should use Oracle's official jdk, so first uninstall the CentOS jdk and then install the downloaded Oracle JDK.

First enter java -version

Check to see if JDK is installed, and if so, but the version is not suitable, uninstall it

Input

rpm -qa | grep java

View information

Then enter:

rpm -e --nodeps "the JDK package you want to uninstall"

For example: rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64

After confirming that it is gone, extract the downloaded JDK

tar -xvf jdk-8u144-linux-x64.tar.gz

Move it to the /opt/java folder (create the folder if it does not exist) and rename it to jdk1.8.

mv jdk1.8.0_144 /opt/java
mv /opt/java/jdk1.8.0_144 /opt/java/jdk1.8

Then edit the profile file and add the following configuration

Enter:

vim /etc/profile

Add:

export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH

After the addition is successful, enter

source /etc/profile
java -version

Check to see if the configuration is successful

2. Hadoop configuration

3.2.1 File preparation

Extract the downloaded Hadoop archive.

On linux, enter:

tar -xvf hadoop-2.8.2.tar.gz

Then move the unzipped folder to /opt/hadoop (create the folder if it does not exist) and rename it to hadoop2.8.

Enter the move folder command on linux:

mv hadoop-2.8.2 /opt/hadoop
mv /opt/hadoop/hadoop-2.8.2 /opt/hadoop/hadoop2.8

3.2.2 Environment configuration

Edit / etc/profile file

Enter:

vim /etc/profile

Add:

export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

3.2.3 modify the configuration file

Modify these configuration files such as core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, etc.

Enter the command to enter the directory at linux:

cd /opt/hadoop/hadoop2.8/etc/hadoop

3.2.3.1 Modify core-site.xml

The hadoop storage paths can be changed to whatever you like. At first I thought these folders had to be created by hand, but in practice they are created automatically if they do not exist, so the manual directory-creation step has been removed.

Enter:

vim core-site.xml

Add the following configuration inside the <configuration> node:

<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/root/hadoop/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

Note: fs.defaultFS is the default file system name. It used to be fs.default.name, but the latest official documentation marks that property as deprecated, so I changed it to this. Ps: in practice it makes no difference.

3.2.3.2 modify hadoop-env.sh

For this file, I do not know why the relative path is not recognized, so I use an absolute path.

Set

export JAVA_HOME=${JAVA_HOME}

Modified to:

export JAVA_HOME=/opt/java/jdk1.8

Note: change it to the path of your own JDK

3.2.3.3 Modify hdfs-site.xml

The storage path of the following hdfs can be changed according to your own machine.

Add the following configuration inside the <configuration> node:

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/root/hadoop/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/root/hadoop/data</value>
</property>

3.2.3.4 Modify mapred-site.xml

This configures the framework that runs mapreduce jobs. Ps: this configuration does not seem to matter much; maybe I just never used mr.

If there is no mapred-site.xml file, copy the mapred-site.xml.template file and rename it to mapred-site.xml.

Modify the newly created mapred-site.xml file and add the configuration inside the <configuration> node:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

3.2.3.5 Modify the yarn-site.xml file

This configures yarn resource scheduling and is required for a cluster.

Modify the /opt/hadoop/hadoop2.8/etc/hadoop/yarn-site.xml file

Add the following configuration inside the <configuration> node:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <description>Available memory per node (in MB), default 8182 MB.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8182</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Note: yarn.nodemanager.vmem-check-enabled=false means the virtual-memory check is skipped. If you are installing on virtual machines this setting is very useful and makes later operations less likely to fail. On a physical machine with enough memory you can remove it.

3.2.3.6 Modify slaves

This sets which machines are the slaves. Without it, the cluster would not know its master-slave layout. It is not needed in stand-alone mode.

Modify the /opt/hadoop/hadoop2.8/etc/hadoop/slaves file

Change to

slave1
slave2

These configurations refer to the official Hadoop documentation.

Hadoop official configuration file description: http://hadoop.apache.org/docs/r2.8.3/

After making these configurations on one machine (preferably master), we use the scp command to transfer these configurations to other machines.

Enter:

JDK environment transfer

scp -r /opt/java root@slave1:/opt
scp -r /opt/java root@slave2:/opt

Hadoop environment transfer

scp -r /opt/hadoop root@slave1:/opt
scp -r /opt/hadoop root@slave2:/opt

After the transfer, start the cluster from the master node.

Before starting hadoop it needs to be initialized; this only needs to be done on master.

3. Start Hadoop

Note: before starting hadoop, make sure the firewalls are off, the machines' clocks agree, and passwordless ssh works.

Initialize hadoop

Change to the /opt/hadoop/hadoop2.8/bin directory and enter:

./hdfs namenode -format

After successful initialization, switch to /opt/hadoop/hadoop2.8/sbin

Start hdfs and yarn for hadoop

Enter:

start-dfs.sh
start-yarn.sh

The first login will ask if you want to connect, enter yes, and then enter the password.

After the boot is successful, you can use the jps command to check whether it is successful on each machine

You can also check in a browser by visiting master's IP on port 50070 and on port 8088.

If the pages load as expected, the startup is successful.

If it fails, check whether the processes shown by jps started successfully and whether the firewalls are really off. If everything looks fine but the web interface still cannot be opened, check the logs to find the reason and then come back.
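As an optional sanity check that is not part of the original steps, a few HDFS commands run on master will confirm that the file system is actually usable (a sketch; the paths are arbitrary):

hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put /etc/hosts /tmp/test
hdfs dfs -ls /tmp/test
hdfs dfsadmin -report           # should list the two datanodes, slave1 and slave2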

IV. Environment configuration of Spark

Note: the relevant spark configuration was already described in detail in Big Data Learning Series 6 - Hadoop+Spark environment setup http://www.panchengming.com/2017/12/19/pancm63/, albeit for a stand-alone environment. For a cluster, basically only the slaves configuration is added; the rest stays the same for now. So the configuration is simply pasted here.

1. Scala configuration

Almost the same as the JDK configuration

4.1.1 File preparation

Extract the downloaded Scala file

Input

tar -xvf scala-2.12.2.tgz

Then move it to /opt/scala and rename it to scala2.12.

Input

mv scala-2.12.2 /opt/scala
mv /opt/scala/scala-2.12.2 /opt/scala/scala2.12

4.1.2 Environment configuration

Edit / etc/profile file

Enter:

export SCALA_HOME=/opt/scala/scala2.12
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

Enter scala -version to check whether the installation was successful

2. Spark configuration

4.2.1 File preparation

Extract the downloaded Spark file

Input

tar -xvf spark-1.6.3-bin-hadoop2.4-without-hive.tgz

Then move it to /opt/spark and rename it to spark1.6-hadoop2.4-hive.

Input

mv spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark
mv /opt/spark/spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark/spark1.6-hadoop2.4-hive

4.2.2 Environment configuration

Edit / etc/profile file

Enter:

export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

4.2.3 Modify the configuration files

Switch directories

Enter:

cd /opt/spark/spark1.6-hadoop2.4-hive/conf

4.2.3.1 Modify spark-env.sh

In the conf directory, modify the spark-env.sh file. If there is no spark-env.sh, copy the spark-env.sh.template file and rename it to spark-env.sh.

Modify the newly created spark-env.sh file and add the configuration:

export SCALA_HOME=/opt/scala/scala2.12
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=4G

Note: adjust the paths above to your own. SPARK_MASTER_IP is the master host and SPARK_EXECUTOR_MEMORY is the executor memory.

4.2.3.2 modify slaves

The slaves file lists the worker nodes.

In the conf directory, modify the slaves file. If there is no slaves file, copy the slaves.template file and rename it to slaves.

Modify the newly created slaves file and add the configuration:

slave1
slave2

After making these configurations on one machine (preferably master), we use the scp command to transfer these configurations to other machines.

Enter:

Scala environment transfer

scp -r /opt/scala root@slave1:/opt
scp -r /opt/scala root@slave2:/opt

Spark environment transfer

scp -r /opt/spark root@slave1:/opt
scp -r /opt/spark root@slave2:/opt

After the transfer, start the cluster from the master node.

3. Start Spark

Description: start Hadoop first

Change to the Spark directory

Enter:

cd /opt/spark/spark1.6-hadoop2.4-hive/sbin

Then start Spark

Enter:

./start-all.sh

After the boot is successful, you can use the jps command to see if it is successful on each machine.

You can check in a browser by visiting master's IP on port 8080.

If this page comes up, Spark has started successfully.
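For a check beyond the web UI (a hedged sketch, not part of the original text), spark-shell can be attached to the standalone master and run a trivial job; the master URL assumes the default standalone port 7077:

cd /opt/spark/spark1.6-hadoop2.4-hive
./bin/spark-shell --master spark://master:7077
# then, inside the shell:
# scala> sc.parallelize(1 to 1000).sum()
# scala> :quit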

V. Environment configuration of Zookeeper

Because HBase runs as a cluster, it needs zookeeper.

Zookeeper can be seen in many environments, such as kafka, storm and so on. I won't say much here.

1, file preparation

Extract the downloaded Zookeeper archive.

On linux, enter:

tar -xvf zookeeper-3.4.10.tar.gz

Then move it to /opt/zookeeper (create the folder if it does not exist) and rename it to zookeeper3.4.

Input

mv zookeeper-3.4.10 /opt/zookeeper
mv /opt/zookeeper/zookeeper-3.4.10 /opt/zookeeper/zookeeper3.4

2. Environment configuration

Edit / etc/profile file

Enter:

export ZK_HOME=/opt/zookeeper/zookeeper3.4
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZK_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

3. Modify the configuration files

5.3.1 Create files and directories

Create these directories on every server in the cluster:

mkdir /opt/zookeeper/data
mkdir /opt/zookeeper/dataLog

Then create a myid file in the /opt/zookeeper/data directory.

Enter:

touch myid

After the creation is successful, change the myid file.

For convenience, I changed the contents of the myid files of master, slave1 and slave2 to 1, 2, and 3.
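For example (a sketch consistent with the values above), each file can be written with a single command on the respective machine:

echo 1 > /opt/zookeeper/data/myid      # on master
echo 2 > /opt/zookeeper/data/myid      # on slave1
echo 3 > /opt/zookeeper/data/myid      # on slave2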

5.3.2 Create zoo.cfg

Change to the /opt/zookeeper/zookeeper3.4/conf directory.

If there is no zoo.cfg file, copy the zoo_sample.cfg file and rename it to zoo.cfg.

Modify the newly created zoo.cfg file

dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

Notes: clientPort is, as the name implies, the port clients use to connect to the zookeeper service; it is a TCP port. dataLogDir holds the sequential transaction log (WAL), while dataDir holds snapshots of the in-memory data structures, which makes fast recovery possible. To maximize performance it is generally recommended to put dataDir and dataLogDir on different disks so the log can take full advantage of sequential disk writes. dataDir and dataLogDir need to be created yourself, and the directories can be whatever you like. The 1 in server.1 must correspond to the value in the myid file under dataDir on the master machine, the 2 in server.2 to the myid value on slave1, and the 3 in server.3 to the myid value on slave2. Other values are fine as long as they correspond. The port numbers 2888 and 3888 can also be chosen freely; using the same ports on different machines is not a problem.

1. tickTime: client-server heartbeat interval

The interval at which heartbeats are exchanged between Zookeeper servers, or between clients and servers; a heartbeat is sent every tickTime. tickTime is in milliseconds.

tickTime=2000

2. initLimit: leader-follower initial connection time limit

The maximum number of heartbeats (the number of tickTime) that can be tolerated during the initial connection between the follower server (F) and the leader server (L) in the cluster.

initLimit=10

3. syncLimit: leader-follower synchronization time limit

The maximum number of heartbeats (number of tickTime) that can be tolerated between requests and responses between the follower server and the leader server in the cluster.

syncLimit=5
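Putting the pieces together, a complete zoo.cfg consistent with the settings above would look roughly like this (a sketch; clientPort 2181 is the usual default and matches the HBase configuration later on, and the paths should be adjusted to your own):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888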

Then transfer zookeeper to the other machines as well, remembering to change /opt/zookeeper/data/myid on each machine; the values must not be the same.

Enter:

scp -r /opt/zookeeper root@slave1:/opt
scp -r /opt/zookeeper root@slave2:/opt

4. Start zookeeper

Because zookeeper elects its own leader, the master-slave relationship is not specified the way it is in hadoop; see the official documentation for the details.

After successfully configuring zookeeper, start zookeeper on each machine.

Change to the zookeeper directory

cd /opt/zookeeper/zookeeper3.4/bin

Enter:

zkServer.sh start
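Since the command has to be run on all three machines, a hedged convenience one-liner can do it from master over the passwordless ssh set up earlier (it sources /etc/profile so java is found in the non-interactive shell; running the script by hand on each machine works just as well):

for h in master slave1 slave2; do ssh $h "source /etc/profile && /opt/zookeeper/zookeeper3.4/bin/zkServer.sh start"; done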

After successful startup

View status input:

zkServer.sh status

You can see which machine is the zookeeper leader and which are followers.
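For reference, exactly one machine should report itself as leader and the other two as followers; the tail of the status output looks roughly like this (wording may vary slightly by version):

Mode: follower      # on two of the machines
Mode: leader        # on exactly one machine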

VI. HBase environment configuration

1. File preparation

Extract the downloaded HBase archive.

On linux, enter:

tar -xvf hbase-1.2.6-bin.tar.gz

Then move it to the /opt/hbase folder and rename it to hbase1.2

Input

mv hbase-1.2.6 /opt/hbase
mv /opt/hbase/hbase-1.2.6 /opt/hbase/hbase1.2

2. Environment configuration

Edit / etc/profile file

Enter:

export HBASE_HOME=/opt/hbase/hbase1.2
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HBASE_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

Input

hbase version

View version

3. Modify the configuration file

Switch to /opt/hbase/hbase1.2/conf

6.3.1 modify hbase-env.sh

Edit the hbase-env.sh file to add the following configuration

export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HBASE_HOME=/opt/hbase/hbase1.2
export HBASE_CLASSPATH=/opt/hadoop/hadoop2.8/etc/hadoop
export HBASE_PID_DIR=/root/hbase/pids
export HBASE_MANAGES_ZK=false

Note: adjust the paths to your own. HBASE_MANAGES_ZK=false means HBase does not use its built-in Zookeeper but the external Zookeeper cluster set up above.

6.3.2 modify hbase-site.xml

Edit the hbase-site.xml file and add the following configuration

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
    <description>The directory shared by region servers.</description>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
</property>
<property>
    <name>hbase.master.maxclockskew</name>
    <value>150000</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
</property>
<property>
    <name>hbase.tmp.dir</name>
    <value>/root/hbase/tmp</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.master</name>
    <value>master:60000</value>
</property>

Notes: hbase.rootdir is the directory shared by the region servers, where HBase persists its data. hbase.cluster.distributed is HBase's run mode: false means stand-alone mode, true means distributed mode; if false, HBase and Zookeeper run in the same JVM.

6.3.3 modify regionservers

This specifies the slave (region server) nodes of hbase, just like hadoop's slaves file.

Modify the file to

slave1
slave2

Note: these are the hostnames of the cluster's slave machines.

After making these configurations on one machine (preferably master), we use the scp command to transfer these configurations to other machines.

Enter:

HBase environment transfer

scp -r /opt/hbase root@slave1:/opt
scp -r /opt/hbase root@slave2:/opt

After the transfer, start the cluster from the master node.

4. Start HBase

After successfully starting Hadoop and zookeeper

Change to the HBase directory

cd /opt/hbase/hbase1.2/bin

Enter:

start-hbase.sh

After the boot is successful, you can use the jps command to check whether it is successful on each machine

You can check in a browser by visiting master's IP on port 16010.

If the page comes up, the startup was successful.
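As an extra check beyond the web UI (a sketch, not in the original text), the HBase shell on master can create, write, and scan a small table; the hbase> prefix below just marks the shell prompt:

hbase shell
hbase> status
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'
hbase> disable 'test'
hbase> drop 'test'
hbase> exit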

VII. Environment installation and configuration of Hive

Hive does not need to be installed across the cluster; it only needs to be installed on one machine. Its installation was already explained in great detail in Big Data Learning Series 4 - Hadoop+Hive environment setup, with pictures and text (stand-alone) http://www.panchengming.com/2017/12/16/pancm61/, so it is not repeated in this article.

VIII. Other

Environment Building reference: http://blog.csdn.net/pucao_cug/article/details/72773564

The environment configuration also referenced the official documentation.

This is the end of this tutorial, thank you for reading!

Copyright notice:

Author: nothingness

cnblogs: http://www.cnblogs.com/xuwujing

CSDN: http://blog.csdn.net/qazwsxpcm

Personal blog: http://www.panchengming.com

Original writing is not easy; please credit the source when reprinting. Thank you!
