Introduction
In the previous posts of this big data learning series, I set up a Hadoop+Spark+HBase+Hive environment and ran some tests. In fact, when I first started learning big data I built a cluster, not a stand-alone or pseudo-distributed setup. I wrote about the stand-alone setup first because a single machine is enough for personal study. To be honest, my own computer is not powerful enough, and running virtual machines on it is too slow. The whole cluster was built on my company's test servers. I ran into all kinds of pitfalls along the way, but I also learned a lot. After successfully building the cluster I took scattered notes and later reorganized them into this blog post.
I did not actually build everything step by step as described here. I changed things several times along the way until I ended up with a cluster that currently runs without problems. When I started writing, I planned to build the whole environment on one machine and then copy it to the others. That would be quick, but not very friendly for the reader, so I split the steps up so that each component can also be built on its own. Enough talk; the tutorial follows.
I. Environment selection
1. Cluster machine installation diagram
Because this is a cluster build, I use a table to describe the environment. The cluster uses three machines, master, slave1, and slave2; the names show the master-slave relationship. The operating system is CentOS 6.8. The components installed on each machine are shown in the following table:
The configuration of each machine is shown above. Note that I do not use the official Spark build; I use a precompiled build of Spark that works with Hive. Later, when running Hive queries, I do not want to use Hive's default mr (MapReduce) engine, which is inefficient and is no longer recommended officially as of Hive 2.x. I will switch Hive's execution engine to Spark later and do not want to recompile Spark, so I use this version. If you want to compile it yourself, or use a newer version, you do not have to follow the above. The storage paths are also only examples: run df -h on each machine first to check the available disk space, then decide where to deploy.
2. Configuration description
JDK: required by Hadoop and Spark. JDK 1.7 or later is officially recommended.
Scala: required by Spark. Use a version no lower than the one Spark requires.
Hadoop: a distributed system infrastructure.
Spark: a big data processing tool that works on distributed storage.
Zookeeper: a distributed application coordination service, required by the HBase cluster.
HBase: a distributed storage system for structured data.
Hive: a data warehouse tool based on Hadoop; the metadata database currently used is MySQL.
3. Download address
Official address:
Hadoop:
http://www.apache.org/dyn/closer.cgi/hadoop/common
Spark:
http://spark.apache.org/downloads.html
Spark SQL on Hive:
http://mirror.bit.edu.cn/apache/spark
Scala:
http://www.scala-lang.org/download
JDK:
http://www.oracle.com/technetwork/java/javase/downloads
HBase:
http://mirror.bit.edu.cn/apache/hbase/
Zookeeper:
http://mirror.bit.edu.cn/apache/zookeeper/
Hive:
http://mirror.bit.edu.cn/apache/hive/
Baidu Cloud:
Link: https://pan.baidu.com/s/1kUYfDaf password: o1ov
II. Cluster related configuration
1. Hostname change and host-to-IP mapping
1. Change the hostname
Note: changing the hostname makes the cluster easier to manage; otherwise every machine would be called localhost. Do this on all machines in the cluster.
Enter:
vim /etc/sysconfig/network
Change localhost.localdomain to the name you want; each machine needs a different name.
For example:
HOSTNAME=master
Note: the new name does not take effect until the machine is restarted, for example by typing reboot.
2. Map hostnames to IP addresses
Modify the hosts file to set up the mapping.
Description: do this on every machine, using your own IP addresses and hostnames.
Enter:
vim /etc/hosts
Add:
192.169.0.23 master
192.169.0.24 slave1
192.169.0.25 slave2
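As a hedged sketch of the same step, the mapping can also be appended from a script instead of editing the file by hand. The helper name add_host_entry and the HOSTS_FILE variable are illustrative assumptions, not part of the original post. The script writes to a scratch file by default, so point HOSTS_FILE at /etc/hosts only after checking the result.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add hostname-to-IP mappings to a hosts-style file.
# add_host_entry and HOSTS_FILE are hypothetical names used for illustration.

add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Do nothing if the hostname already appears as a whole word after whitespace.
  if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Defaults to a scratch file; set HOSTS_FILE=/etc/hosts to apply for real.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host_entry "$HOSTS_FILE" 192.169.0.23 master
add_host_entry "$HOSTS_FILE" 192.169.0.24 slave1
add_host_entry "$HOSTS_FILE" 192.169.0.25 slave2

cat "$HOSTS_FILE"
```

Because existing names are skipped, running the script a second time leaves the file unchanged.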
Description: after editing the file on one machine, you can copy it to the other machines with the scp command or with ftp.
Example of the scp command:
scp -r /etc/hosts root@192.169.0.24:/etc
2. Passwordless SSH login
Passwordless SSH login is set up for ease of operation.
Generate a key pair
Execute this on every machine.
First enter:
ssh-keygen -t rsa -P ''
After the keys are generated, every machine needs a file named authorized_keys in /root/.ssh with the same content: the public keys just generated on all three machines. You can build the file on one machine and then copy it to the others.
Create a new authorized_keys file
Enter:
touch /root/.ssh/authorized_keys
Edit authorized_keys and copy in the keys from the other machines:
cat /root/.ssh/id_rsa.pub
vim /root/.ssh/authorized_keys
Copy the contents of id_rsa.pub from the other machines into the authorized_keys file.
The first machine:
The second machine:
The third machine:
The contents of the final authorized_keys file
Copy the final authorized_keys file to the /root/.ssh directory on the other machines. You can use either scp or ftp.
Example of the scp command:
scp -r /root/.ssh/authorized_keys root@192.169.0.24:/root/.ssh
Test passwordless login
Enter:
ssh slave1
ssh slave2
Enter exit to log out.
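The manual copy-and-paste of public keys can be sketched as a small merge step. This is only an illustration under assumptions: merge_pubkeys is a hypothetical helper, and the key strings are placeholders rather than real key material. It concatenates the public key files, removes blank lines and duplicates, and restricts the file permissions, which sshd normally expects for authorized_keys.

```shell
#!/usr/bin/env bash
# Sketch: build one authorized_keys file from several public key files.
# merge_pubkeys is a hypothetical helper name.

merge_pubkeys() {
  local out="$1"; shift
  # Concatenate the keys, drop blank lines and duplicates, then lock down the file.
  cat "$@" | grep -v '^[[:space:]]*$' | sort -u > "$out"
  chmod 600 "$out"
}

# Demo with placeholder keys in a scratch directory (not real key material).
demo_dir="$(mktemp -d)"
printf 'ssh-rsa AAAA-master root@master\n' > "$demo_dir/master.pub"
printf 'ssh-rsa AAAA-slave1 root@slave1\n' > "$demo_dir/slave1.pub"
printf 'ssh-rsa AAAA-slave2 root@slave2\n' > "$demo_dir/slave2.pub"

# master.pub is passed twice on purpose to show that duplicates are removed.
merge_pubkeys "$demo_dir/authorized_keys" "$demo_dir"/*.pub "$demo_dir/master.pub"
cat "$demo_dir/authorized_keys"
```

On a real cluster you would first collect each node's /root/.ssh/id_rsa.pub (for example with scp) and write the merged result to /root/.ssh/authorized_keys.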
3. Turn off the firewall
Description: you could keep the firewall on and set access rules instead, but for convenience the firewall is turned off here. Do this on every machine.
Commands for the firewall
Stop the firewall:
service iptables stop
Start the firewall:
service iptables start
Restart the firewall:
service iptables restart
Permanently turn off the firewall:
chkconfig iptables off
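If you prefer to keep iptables running, a hedged alternative is to open only the ports the cluster uses. The sketch below only prints the commands (a dry run) and does not touch the firewall. The port list is collected from the settings and web addresses that appear in this article and is an assumption: Hadoop daemons listen on more ports than these, so extend it for your own setup. The helper name print_open_port_rules is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print iptables rules that open selected TCP ports (dry run only).
# print_open_port_rules is a hypothetical helper; nothing is executed against iptables.

print_open_port_rules() {
  local port
  for port in "$@"; do
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
  done
  # On CentOS 6 the running rules are persisted with this command.
  printf 'service iptables save\n'
}

# Ports that appear in this tutorial; the list is likely incomplete.
CLUSTER_PORTS=(9000 50070 8088 8080 2181 2888 3888 16010)

print_open_port_rules "${CLUSTER_PORTS[@]}"
```

Review the printed rules before running any of them as root.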
4. Time configuration
The clocks of the machines in the cluster need to be synchronized. Mine are virtual machines, so I did not need to do this.
The NTP service can be used to set up cluster time synchronization.
For more information, please refer to http://blog.csdn.net/to_baidu/article/details/52562574
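A quick way to judge whether synchronization is needed is to compare the clocks directly. The sketch below computes the spread between epoch timestamps. In real use each timestamp would come from running date +%s on a node (for example over ssh); that step is assumed here and replaced by sample values. The helper name max_skew_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report the largest clock difference among a set of epoch timestamps.
# max_skew_seconds is a hypothetical helper name.

max_skew_seconds() {
  local min="$1" max="$1" t
  for t in "$@"; do
    (( t < min )) && min="$t"
    (( t > max )) && max="$t"
  done
  echo $(( max - min ))
}

# Sample values; on a cluster these would be gathered with e.g.
#   ssh slave1 date +%s
master_ts=1516240000
slave1_ts=1516240003
slave2_ts=1516239998

echo "max skew: $(max_skew_seconds "$master_ts" "$slave1_ts" "$slave2_ts") s"
```

A spread of more than a few seconds is a hint to set up NTP before starting the services.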
5. Alias settings (optional)
Description: because I switch between directories often, I set up aliases to save typing. Typing an alias in Linux runs the command assigned to it, which is quite convenient. For example, the commonly used ll is an alias for ls -l. You can read up on alias yourself.
Enter:
vim ~/.bashrc
Add the following:
# Some more aliases
alias chd='cd /opt/hadoop/hadoop2.8'
alias chb='cd /opt/hbase/hbase1.2'
alias chi='cd /opt/hive/hive2.1'
alias czk='cd /opt/zookeeper/zookeeper3.4'
alias csp='cd /opt/spark/spark1.6-hadoop2.4-hive'
alias fhadoop='/opt/hadoop/hadoop2.8/bin/hdfs namenode -format'
alias starthadoop='/opt/hadoop/hadoop2.8/sbin/start-all.sh'
alias stophadoop='/opt/hadoop/hadoop2.8/sbin/stop-all.sh'
alias starthbase='/opt/hbase/hbase1.2/bin/start-hbase.sh'
alias stophbase='/opt/hbase/hbase1.2/bin/stop-hbase.sh'
alias startzk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh start'
alias stopzk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh stop'
alias statuszk='/opt/zookeeper/zookeeper3.4/bin/zkServer.sh status'
alias startsp='/opt/spark/spark1.6-hadoop2.4-hive/sbin/start-all.sh'
alias stopsp='/opt/spark/spark1.6-hadoop2.4-hive/sbin/stop-all.sh'
After adding them, enter:
source ~/.bashrc
Then type an alias to run the command you assigned to it. You do not have to use the aliases above; if you have a better way, use it.
6. Overall environment variable setting
Many environment settings are added to the /etc/profile configuration file. The complete set is listed here first; adjust the paths to your own environment. You can configure the environment variables on one machine and then transfer the file to the other machines.
I transferred this configuration to the other machines first and ran source on each of them, so I did not actually repeat the per-component /etc/profile edits described below. Do whatever suits your situation.
# Java Config
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
# Scala Config
export SCALA_HOME=/opt/scala/scala2.12
# Spark Config
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
# Zookeeper Config
export ZK_HOME=/opt/zookeeper/zookeeper3.4
# HBase Config
export HBASE_HOME=/opt/hbase/hbase1.2
# Hadoop Config
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Hive Config
export HIVE_HOME=/opt/hive/hive2.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:$PATH
III. Environment Construction of Hadoop
Note in advance: these settings can be configured on one machine and then copied to the others. After copying, remember to make the configuration files take effect.
1. JDK configuration
Note: CentOS usually comes with OpenJDK, but this Hadoop cluster uses Oracle's official JDK, so uninstall the bundled OpenJDK first and then install the JDK downloaded from Oracle.
First enter:
java -version
to check whether a JDK is installed. If one is installed but the version is not suitable, uninstall it.
Enter:
rpm -qa | grep java
to list the installed packages.
Then enter:
rpm -e --nodeps <name of the JDK package to uninstall>
For example:
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64
After confirming that it is gone, extract the downloaded JDK:
tar -xvf jdk-8u144-linux-x64.tar.gz
Move it to the /opt/java folder (create the folder first if it does not exist) and rename the folder to jdk1.8:
mv jdk1.8.0_144 /opt/java
mv /opt/java/jdk1.8.0_144 /opt/java/jdk1.8
Then edit the profile file and add the following configuration.
Enter:
vim /etc/profile
Add:
export JAVA_HOME=/opt/java/jdk1.8
export JRE_HOME=/opt/java/jdk1.8/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH
After adding it, enter:
source /etc/profile
java -version
to check whether the configuration succeeded.
2. Hadoop configuration
3.2.1 File preparation
Extract the downloaded Hadoop archive.
On Linux, enter:
tar -xvf hadoop-2.8.2.tar.gz
Then move the extracted folder to the /opt/hadoop folder (create the folder first if it does not exist) and rename it to hadoop2.8.
Enter the move commands on Linux:
mv hadoop-2.8.2 /opt/hadoop
mv /opt/hadoop/hadoop-2.8.2 /opt/hadoop/hadoop2.8
3.2.2 Environment configuration
Edit the /etc/profile file.
Enter:
vim /etc/profile
Add:
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
Enter:
source /etc/profile
to make the configuration take effect.
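After sourcing /etc/profile it can help to confirm that the variables really point at the directories created earlier. The following is a minimal sketch, assuming bash; check_home_vars is a hypothetical helper name, not something from Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: verify that *_HOME variables are set and point at existing directories.
# check_home_vars is a hypothetical helper name.

check_home_vars() {
  local var status=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "${!var}" ]; then
      echo "$var points to a missing directory: ${!var}"
      status=1
    else
      echo "$var OK: ${!var}"
    fi
  done
  return "$status"
}

# Report on the variables configured so far; a non-zero status means something is off.
check_home_vars JAVA_HOME HADOOP_HOME || echo "fix /etc/profile, then run: source /etc/profile"
```

The same function can be called later with SCALA_HOME, SPARK_HOME, ZK_HOME, and HBASE_HOME as those components are installed.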
3.2.3 Modify the configuration files
Modify the configuration files core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, and so on.
On Linux, change to the directory:
cd /opt/hadoop/hadoop2.8/etc/hadoop
3.2.3.1 Modify core-site.xml
The Hadoop storage paths can be changed as you like. At first I thought these folders had to be created manually, but in practice they are created automatically if they do not exist, so I removed the manual directory creation step.
Enter:
vim core-site.xml
Add the following inside the <configuration> node:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/root/hadoop/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
Note: fs.defaultFS is the URI of the default file system. The property was originally called fs.default.name, but the latest official documentation marks that name as deprecated, so I changed it. PS: functionally it makes no difference.
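For readers who prefer generating the file over editing it in vim, the property layout can be sketched in a few lines of bash. This is an illustration under assumptions: hadoop_property and write_site_xml are hypothetical helper names, only a subset of the properties is written, the values are not XML-escaped, and the output goes to a scratch file rather than to etc/hadoop/core-site.xml.

```shell
#!/usr/bin/env bash
# Sketch: write a Hadoop *-site.xml file from name/value pairs.
# hadoop_property and write_site_xml are hypothetical helper names.

hadoop_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

write_site_xml() {
  local out="$1"; shift
  {
    printf '<?xml version="1.0" encoding="UTF-8"?>\n<configuration>\n'
    # Remaining arguments are consumed two at a time: name, then value.
    while [ "$#" -ge 2 ]; do
      hadoop_property "$1" "$2"
      shift 2
    done
    printf '</configuration>\n'
  } > "$out"
}

# Writes to a scratch file; copy it into place only after reviewing it.
CORE_SITE="$(mktemp)"
write_site_xml "$CORE_SITE" \
  fs.defaultFS 'hdfs://master:9000' \
  hadoop.proxyuser.root.hosts '*' \
  hadoop.proxyuser.root.groups '*'

cat "$CORE_SITE"
```

The same two functions would work for hdfs-site.xml and mapred-site.xml, since all of these files share the property/name/value structure.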
3.2.3.2 Modify hadoop-env.sh
For some reason the ${JAVA_HOME} reference is not recognized here, so I use an absolute path.
Change
export JAVA_HOME=${JAVA_HOME}
to:
export JAVA_HOME=/opt/java/jdk1.8
Note: use the path of your own JDK.
3.2.3.3 Modify hdfs-site.xml
The HDFS storage paths below can be changed to suit your own machine.
Add the following inside the <configuration> node:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/root/hadoop/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/root/hadoop/data</value>
</property>
3.2.3.4 Modify mapred-site.xml
This sets the framework that MapReduce runs on. PS: this setting did not seem to matter for me, perhaps because I did not use mr.
If there is no mapred-site.xml file, copy mapred-site.xml.template and rename the copy to mapred-site.xml.
Edit the newly created mapred-site.xml file and add the following inside the <configuration> node:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
3.2.3.5 Modify yarn-site.xml
This configures YARN resource scheduling, which a cluster needs.
Edit the /opt/hadoop/hadoop2.8/etc/hadoop/yarn-site.xml file.
Add the following inside the <configuration> node:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
  <description>The address of the scheduler interface.</description>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
  <description>The http address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
  <description>The https address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
  <description>The address of the RM admin interface.</description>
  <name>yarn.resourcemanager.admin.address</name>
  <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8182</value>
  <description>Available memory per node (in MB). Default 8182MB.</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Note: setting yarn.nodemanager.vmem-check-enabled to false disables the virtual memory check. This is very useful when installing on virtual machines, where later operations are otherwise prone to fail. On physical machines with enough memory you can remove this setting.
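To make the note concrete: when the virtual memory check is enabled, a container may use virtual memory up to its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, and is killed if it goes beyond that. The sketch below does that arithmetic. The 1024 MB container size is an assumed example value, not from this article, and vmem_limit_mb is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: virtual memory ceiling for a YARN container = physical MB * vmem-pmem ratio.
# vmem_limit_mb is a hypothetical helper name.

vmem_limit_mb() {
  local container_mb="$1" ratio="$2"
  awk -v m="$container_mb" -v r="$ratio" 'BEGIN { printf "%.1f\n", m * r }'
}

# Ratio 2.1 is the value from yarn-site.xml; 1024 MB is an example container size.
echo "vmem limit: $(vmem_limit_mb 1024 2.1) MB"
```

Virtual machines often report large virtual memory usage for the JVM, which is why disabling the check avoids spurious container kills there.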
3.2.3.6 Modify slaves
This sets the slave nodes. Without it, the cluster does not know which machines are slaves. It is not needed in stand-alone mode.
Edit the /opt/hadoop/hadoop2.8/etc/hadoop/slaves file.
Change it to:
slave1
slave2
These settings follow the official Hadoop documentation.
Hadoop official configuration file description: http://hadoop.apache.org/docs/r2.8.3/
After making these configurations on one machine (preferably master), use the scp command to transfer them to the other machines.
Enter:
JDK environment transfer
scp -r /opt/java root@slave1:/opt
scp -r /opt/java root@slave2:/opt
Hadoop environment transfer
scp -r /opt/hadoop root@slave1:/opt
scp -r /opt/hadoop root@slave2:/opt
After the transfer, start the cluster from the master node.
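The repeated scp commands can be sketched as a loop over the slave hosts. This is an illustration only: sync_to_slaves, SLAVES, and DRY_RUN are hypothetical names, and the script defaults to printing the commands instead of running them so that it is safe to try.

```shell
#!/usr/bin/env bash
# Sketch: copy several directories to every slave, with a dry-run default.
# sync_to_slaves, SLAVES, and DRY_RUN are hypothetical names.

SLAVES=(slave1 slave2)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run scp

sync_to_slaves() {
  local dir host
  for dir in "$@"; do
    for host in "${SLAVES[@]}"; do
      if [ "$DRY_RUN" = "1" ]; then
        echo "scp -r $dir root@$host:/opt"
      else
        scp -r "$dir" "root@$host:/opt"
      fi
    done
  done
}

sync_to_slaves /opt/java /opt/hadoop
```

Running it with DRY_RUN=0 performs the copies; the same function can be reused for the other component directories under /opt.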
Hadoop must be initialized before its first start, and this only needs to be done on master.
3. Starting Hadoop
Note: before starting Hadoop, make sure that the firewall is turned off, the clocks of the machines are synchronized, and passwordless SSH login works.
Initialize Hadoop
Change to the /opt/hadoop/hadoop2.8/bin directory and enter:
./hdfs namenode -format
After successful initialization, switch to /opt/hadoop/hadoop2.8/sbin.
Start HDFS and YARN
Enter:
start-dfs.sh
start-yarn.sh
On the first login you are asked whether to connect; enter yes, then enter the password.
After startup, run the jps command on each machine to check whether the daemons are running.
You can also open ip:50070 and ip:8088 in a browser.
If the pages are displayed as in the figure, the startup succeeded.
If it fails, check with jps whether the daemons started and whether the firewalls are turned off. If everything looks fine and the pages still cannot be opened, check the logs to find the cause.
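The jps check can be scripted so that missing daemons are listed explicitly. The sketch below is an illustration: missing_daemons is a hypothetical helper, and the sample text stands in for the real output of jps. With the layout in this tutorial, master is expected to run NameNode, SecondaryNameNode, and ResourceManager, while the slaves run DataNode and NodeManager.

```shell
#!/usr/bin/env bash
# Sketch: list expected Hadoop daemons that are absent from jps output.
# missing_daemons is a hypothetical helper name.

missing_daemons() {
  local jps_output="$1"; shift
  local name
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class column exactly so that
    # NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
         awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Canned sample standing in for "$(jps)" on the master node.
sample_jps='2401 NameNode
2602 SecondaryNameNode
3120 Jps'

missing_daemons "$sample_jps" NameNode SecondaryNameNode ResourceManager
```

On a live node, call it as missing_daemons "$(jps)" DataNode NodeManager; empty output means every expected daemon is present.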
IV. Environment configuration of Spark
Description: the Spark configuration was already covered in detail in Big Data Learning Series 6, Hadoop+Spark environment building (http://www.panchengming.com/2017/12/19/pancm63/), although that was for a stand-alone environment. For a cluster, the only addition seems to be the slaves configuration, so the configuration is simply pasted here.
1. Scala configuration
Almost the same as the JDK configuration.
4.1.1 File preparation
Extract the downloaded Scala archive.
Enter:
tar -xvf scala-2.12.2.tgz
Then move it to /opt/scala
and rename it to scala2.12.
Enter:
mv scala-2.12.2 /opt/scala
mv /opt/scala/scala-2.12.2 /opt/scala/scala2.12
4.1.2 Environment configuration
Edit the /etc/profile file.
Add:
export SCALA_HOME=/opt/scala/scala2.12
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:$PATH
Enter:
source /etc/profile
to make the configuration take effect.
Enter scala -version to check whether the installation succeeded.
2. Spark configuration
4.2.1 File preparation
Extract the downloaded Spark archive.
Enter:
tar -xvf spark-1.6.3-bin-hadoop2.4-without-hive.tgz
Then move it to /opt/spark and rename it.
Enter:
mv spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark
mv /opt/spark/spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark/spark1.6-hadoop2.4-hive
4.2.2 Environment configuration
Edit the /etc/profile file.
Add:
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
Enter:
source /etc/profile
to make the configuration take effect.
4.2.3 Modify the configuration files
Switch directories.
Enter:
cd /opt/spark/spark1.6-hadoop2.4-hive/conf
4.2.3.1 Modify spark-env.sh
In the conf directory, edit the spark-env.sh file. If it does not exist, copy spark-env.sh.template and rename the copy to spark-env.sh.
Add the following configuration to spark-env.sh:
export SCALA_HOME=/opt/scala/scala2.12
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=4G
Note: adjust the paths above to your own environment. SPARK_MASTER_IP is the master host, and SPARK_EXECUTOR_MEMORY is the memory available to each executor.
4.2.3.2 Modify slaves
The slaves file lists the worker nodes.
In the conf directory, edit the slaves file. If it does not exist, copy slaves.template and rename the copy to slaves.
Add the following to the slaves file:
slave1
slave2
After making these configurations on one machine (preferably master), use the scp command to transfer them to the other machines.
Enter:
Scala environment transfer
scp -r /opt/scala root@slave1:/opt
scp -r /opt/scala root@slave2:/opt
Spark environment transfer
scp -r /opt/spark root@slave1:/opt
scp -r /opt/spark root@slave2:/opt
After the transfer, start the cluster from the master node.
3. Starting Spark
Description: start Hadoop first.
Change to the Spark sbin directory.
Enter:
cd /opt/spark/spark1.6-hadoop2.4-hive/sbin
Then start Spark.
Enter:
start-all.sh
After startup, run the jps command on each machine to check whether it succeeded.
You can also open ip:8080 in a browser.
If the Spark web interface is displayed, Spark started successfully.
V. Environment configuration of Zookeeper
Because HBase runs as a cluster, it needs Zookeeper.
Zookeeper is used in many environments, such as Kafka and Storm, so I won't say much about it here.
1. File preparation
Extract the downloaded Zookeeper archive.
On Linux, enter:
tar -xvf zookeeper-3.4.10.tar.gz
Then move it to /opt/zookeeper (create the folder first if it does not exist) and rename the folder to zookeeper3.4.
Enter:
mv zookeeper-3.4.10 /opt/zookeeper
mv /opt/zookeeper/zookeeper-3.4.10 /opt/zookeeper/zookeeper3.4
2. Environment configuration
Edit the /etc/profile file.
Add:
export ZK_HOME=/opt/zookeeper/zookeeper3.4
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${ZK_HOME}/bin:$PATH
Enter:
source /etc/profile
to make the configuration take effect.
3. Modify the configuration files
5.3.1 Create files and directories
Create these directories on every server in the cluster:
mkdir /opt/zookeeper/data
mkdir /opt/zookeeper/dataLog
Then create a myid file in the /opt/zookeeper/data directory.
Enter:
touch myid
After creating the file, edit it.
For convenience, I set the contents of the myid files on master, slave1, and slave2 to 1, 2, and 3 respectively.
5.3.2 Create zoo.cfg
Change to the /opt/zookeeper/zookeeper3.4/conf directory.
If there is no zoo.cfg file, copy zoo_sample.cfg and rename the copy to zoo.cfg.
Edit the newly created zoo.cfg file:
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Description: clientPort, as the name implies, is the TCP port through which clients connect to the Zookeeper service. dataLogDir holds the sequential transaction log (WAL), while dataDir holds snapshots of the in-memory data structure, which allow fast recovery. For best performance it is generally recommended to put dataDir and dataLogDir on different disks, to take full advantage of sequential disk writes. dataDir and dataLogDir must be created manually, and you can choose the directories yourself. The 1 in server.1 must match the value in the myid file in the dataDir directory on master, the 2 in server.2 must match the myid on slave1, and the 3 in server.3 must match the myid on slave2. You can choose any values as long as they correspond. The port numbers 2888 and 3888 can also be chosen freely; using the same numbers on different machines is not a problem.
1. tickTime: client-server heartbeat interval
The interval at which heartbeats are exchanged between Zookeeper servers, or between clients and servers; one heartbeat is sent every tickTime. tickTime is in milliseconds.
tickTime=2000
2. initLimit: leader-follower initial connection time limit
The maximum number of heartbeats (multiples of tickTime) tolerated during the initial connection between a follower server (F) and the leader server (L) in the cluster.
initLimit=10
3. syncLimit: leader-follower synchronization time limit
The maximum number of heartbeats (multiples of tickTime) tolerated between a request and its response between a follower server and the leader server in the cluster.
syncLimit=5
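The correspondence between server.N and myid, and the meaning of the two limits in real time units, can be checked with a short script. This is a sketch under assumptions: myid_for_host and limit_ms are hypothetical helper names, and the script works on a scratch copy of the settings rather than on the real zoo.cfg.

```shell
#!/usr/bin/env bash
# Sketch: derive a host's myid from the server.N lines of zoo.cfg,
# and convert initLimit/syncLimit from ticks to milliseconds.
# myid_for_host and limit_ms are hypothetical helper names.

myid_for_host() {
  local cfg="$1" host="$2"
  # Lines look like: server.N=host:2888:3888
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}

limit_ms() {
  local cfg="$1" key="$2" tick limit
  tick="$(sed -n 's/^tickTime=//p' "$cfg")"
  limit="$(sed -n "s/^${key}=//p" "$cfg")"
  echo $(( tick * limit ))
}

# Scratch copy of the settings described in this section.
ZOO_CFG="$(mktemp)"
cat > "$ZOO_CFG" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF

echo "slave1 myid: $(myid_for_host "$ZOO_CFG" slave1)"
echo "initLimit: $(limit_ms "$ZOO_CFG" initLimit) ms"
echo "syncLimit: $(limit_ms "$ZOO_CFG" syncLimit) ms"
```

On each node, something like myid_for_host zoo.cfg "$(hostname)" > /opt/zookeeper/data/myid would write the matching id, assuming the hostname matches an entry in zoo.cfg.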
Transfer Zookeeper to the other machines as before, and remember to change the myid file under /opt/zookeeper/data on each machine, because the values must differ.
Enter:
scp -r /opt/zookeeper root@slave1:/opt
scp -r /opt/zookeeper root@slave2:/opt
4. Start Zookeeper
Because Zookeeper elects its leader, the master-slave relationship is not specified in the configuration as it is in Hadoop; see the official documentation for details.
After configuring Zookeeper, start it on each machine.
Change to the Zookeeper directory:
cd /opt/zookeeper/zookeeper3.4/bin
Enter:
zkServer.sh start
After it has started, check the status by entering:
zkServer.sh status
This shows whether Zookeeper on each machine is the leader or a follower.
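A Zookeeper ensemble keeps working only while a majority of its servers are running, which is why checking the status on every machine is worthwhile. As a small worked example (the helper names are hypothetical), the majority size and the number of tolerated failures follow from the ensemble size:

```shell
#!/usr/bin/env bash
# Sketch: majority (quorum) size and tolerated failures for a Zookeeper ensemble.
# zk_majority and zk_tolerated_failures are hypothetical helper names.

zk_majority() {
  echo $(( $1 / 2 + 1 ))
}

zk_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

servers=3   # master, slave1, slave2
echo "majority needed: $(zk_majority "$servers")"
echo "failures tolerated: $(zk_tolerated_failures "$servers")"
```

With the three servers in this tutorial, two must be running, so one machine can be down at a time.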
VI. HBase environment configuration
1. File preparation
Extract the downloaded HBase archive.
On Linux, enter:
tar -xvf hbase-1.2.6-bin.tar.gz
Then move it to the /opt/hbase folder and rename it to hbase1.2.
Enter:
mv hbase-1.2.6 /opt/hbase
mv /opt/hbase/hbase-1.2.6 /opt/hbase/hbase1.2
2. Environment configuration
Edit the /etc/profile file.
Add:
export HBASE_HOME=/opt/hbase/hbase1.2
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${HBASE_HOME}/bin:$PATH
Enter:
source /etc/profile
to make the configuration take effect.
Enter:
hbase version
to view the version.
3. Modify the configuration files
Switch to /opt/hbase/hbase1.2/conf.
6.3.1 Modify hbase-env.sh
Edit the hbase-env.sh file and add the following configuration:
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HBASE_HOME=/opt/hbase/hbase1.2
export HBASE_CLASSPATH=/opt/hadoop/hadoop2.8/etc/hadoop
export HBASE_PID_DIR=/root/hbase/pids
export HBASE_MANAGES_ZK=false
Description: adjust the paths to your own environment. HBASE_MANAGES_ZK=false means that the Zookeeper bundled with HBase is not used.
6.3.2 Modify hbase-site.xml
Edit the hbase-site.xml file and add the following configuration:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
  <description>The directory shared by region servers.</description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.master.maxclockskew</name>
  <value>150000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/root/hbase/tmp</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master</name>
  <value>master:60000</value>
</property>
Description: hbase.rootdir is the directory shared by the region servers and is used to persist HBase data. hbase.cluster.distributed is the running mode of HBase: false is stand-alone mode and true is distributed mode. If it is false, HBase and Zookeeper run in the same JVM.
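Since hbase.rootdir has to live on the HDFS instance configured earlier, its prefix should agree with fs.defaultFS in core-site.xml (hdfs://master:9000 in this tutorial). The sketch below checks that on scratch copies of the two files. It is an illustration: xml_value is a hypothetical helper and assumes each name and value tag sits on its own line, as in the listings in this article.

```shell
#!/usr/bin/env bash
# Sketch: check that hbase.rootdir lives on the file system named by fs.defaultFS.
# xml_value is a hypothetical helper that assumes one <name>/<value> tag per line.

xml_value() {
  local file="$1" name="$2"
  awk -v n="<name>$name</name>" '
    index($0, n) { want = 1; next }
    want && /<value>/ {
      sub(/^.*<value>/, "")
      sub(/<\/value>.*$/, "")
      print
      exit
    }' "$file"
}

# Scratch copies holding only the two properties being compared.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF
cat > "$demo_dir/hbase-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
</configuration>
EOF

default_fs="$(xml_value "$demo_dir/core-site.xml" fs.defaultFS)"
rootdir="$(xml_value "$demo_dir/hbase-site.xml" hbase.rootdir)"

case "$rootdir" in
  "$default_fs"/*) echo "OK: $rootdir is under $default_fs" ;;
  *)               echo "MISMATCH: $rootdir vs $default_fs" ;;
esac
```

Pointing the two xml_value calls at the real files under /opt/hadoop/hadoop2.8/etc/hadoop and /opt/hbase/hbase1.2/conf gives the same check for a live installation.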
6.3.3 Modify regionservers
This lists the region server hosts of HBase, in the same way as the slaves file of Hadoop.
Change the file to:
slave1
slave2
Note: the entries above are the hostnames of the cluster machines.
After making these configurations on one machine (preferably master), use the scp command to transfer them to the other machines.
Enter:
HBase environment transfer
scp -r /opt/hbase root@slave1:/opt
scp -r /opt/hbase root@slave2:/opt
After the transfer, start the cluster from the master node.
4. Start HBase
After Hadoop and Zookeeper have started successfully,
change to the HBase directory:
cd /opt/hbase/hbase1.2/bin
Enter:
start-hbase.sh
After startup, run the jps command on each machine to check whether it succeeded.
You can also open ip:16010 in a browser.
If the HBase web interface is displayed, the startup succeeded.
VII. Environment installation and configuration of Hive
Hive does not need to be installed as a cluster; it only needs to be installed on one machine. This was explained in great detail in my earlier post, Big Data Learning Series 4, Hadoop+Hive environment building with detailed illustrations (stand-alone), http://www.panchengming.com/2017/12/16/pancm61/, so it is not repeated here.
VIII. Other
Environment building reference: http://blog.csdn.net/pucao_cug/article/details/72773564
The environment configuration follows the official documentation.
This is the end of this tutorial, thank you for reading!
Copyright notice:
Author: nothingness
Source of blog Park: http://www.cnblogs.com/xuwujing
Source of CSDN: http://blog.csdn.net/qazwsxpcm
Source of personal blog: http://www.panchengming.com
Original writing takes effort; please credit the source when reposting. Thank you!