# create the specified directories
[root@node2 conf]# mkdir -pv /data/bigdata/hive/{tmp/{operation_logs,resources},warehouse}

5. Initialize and run

# initialize the metastore schema using schematool:
[root@node2 conf]# schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/bigdata/src/hadoop-2.7.
SLF4J: Found binding in [
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false
Metastore Connection Driver:     com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
[root@node2 conf]#

# run hive (it shows up as the RunJar process)
[root@node2 conf]# hive
hive> show databases;
OK
default
Time taken: 1.881 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.081 seconds
hive> create table kylin_test (test_count int);
OK
Time taken: 2.9 seconds
hive> show tables;
OK
kylin_test
Time taken: 0.151 seconds, Fetched: 1 row(s)
hive> quit;

The processes started so far:

                          node1  node2  node3  node4  node5
JDK                         ✔      ✔      ✔      ✔      ✔
QuorumPeerMain              ✔      ✔      ✔
JournalNode                 ✔      ✔      ✔
NameNode                    ✔      ✔
DFSZKFailoverController     ✔      ✔
DataNode                    ✔      ✔      ✔      ✔      ✔
NodeManager                 ✔      ✔      ✔      ✔      ✔
ResourceManager                           ✔      ✔
HMaster                     ✔
HRegionServer               ✔      ✔      ✔
HistoryServer               ✔
Master                      ✔
Worker                      ✔      ✔      ✔      ✔      ✔
Kafka                       ✔      ✔
RunJar                             ✔                          # started hive
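As a final check on the Hive setup above, you can confirm that the schema initialization really landed in MySQL; the connection details are the ones schematool printed:

[root@node2 conf]# mysql -h127.0.0.1 -P3306 -uhive -p -e 'SHOW TABLES FROM metastore;'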
Official document
1. Prepare before installation
Before installing kylin, make sure that hadoop 2.4+, hbase 0.98+ (the hbase 1.x build is used here), and hive 0.13+ have been installed and started.
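Each component reports its own version, so the prerequisites are easy to confirm; these are the standard version commands:

[root@node2 ~]# hadoop version
[root@node2 ~]# hive --version
[root@node2 ~]# hbase version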
Hive needs to start metastore and hiveserver2.
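One way to bring both up in the background (the log locations under /tmp are only illustrative):

[root@node2 ~]# nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
[root@node2 ~]# nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &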
Apache Kylin can also be deployed as a cluster, but a cluster deployment does not increase build speed:
the computation runs on the MapReduce engine, which has nothing to do with Kylin itself; additional nodes mainly provide load balancing for queries. A single node is used this time.
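For reference, a clustered setup is mostly a matter of kylin.properties; a sketch with a hypothetical second instance on node3, where kylin.server.mode takes all, job, or query:

kylin.server.mode=query
kylin.server.cluster-servers=node2:7070,node3:7070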
2. Extract and create the environment variables

[root@node2 ~]# tar zxvf /data/tools/apache-kylin-2.3.1-hbase1x-bin.tar.gz -C /data/bigdata/src/
[root@node2 ~]# ln -s /data/bigdata/src/apache-kylin-2.3.1-bin/ /data/bigdata/kylin

# add the environment variables
[root@node2 ~]# echo -e "\n# kylin\nexport KYLIN_HOME=/data/bigdata/kylin\nexport PATH=\$KYLIN_HOME/bin:\$PATH" >> /etc/profile.d/bigdata_path.sh
[root@node2 ~]# cat /etc/profile.d/bigdata_path.sh
# zookeeper
export ZOOKEEPER_HOME=/data/bigdata/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
# hadoop
export HADOOP_HOME=/data/bigdata/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# hbase
export HBASE_HOME=/data/bigdata/hbase
export PATH=$HBASE_HOME/bin:$PATH
# scala
export SCALA_HOME=/data/bigdata/scala
export PATH=$SCALA_HOME/bin:$PATH
# spark
export SPARK_HOME=/data/bigdata/spark
export PATH=$SPARK_HOME/bin:$PATH
# kafka
export KAFKA_HOME=/data/bigdata/kafka
export PATH=$KAFKA_HOME/bin:$PATH
# kylin
export KYLIN_HOME=/data/bigdata/kylin
export PATH=$KYLIN_HOME/bin:$PATH

# take effect
[root@node2 ~]# source /etc/profile
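After sourcing the profile, it is worth confirming that the variables resolved as expected; given the settings above, the two commands below should print:

[root@node2 ~]# echo $KYLIN_HOME
/data/bigdata/kylin
[root@node2 ~]# which kylin.sh
/data/bigdata/kylin/bin/kylin.sh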
3. Copy the relevant Hive jars to Kylin

Copy all jar packages from the lib directory of the hive installation to the lib directory under the kylin installation:
[root@node2 ~]# cp -a /data/bigdata/hive/lib/* /data/bigdata/kylin/lib/

4. Configure Kylin

[root@node2 ~]# cd /data/bigdata/kylin/conf
[root@node2 conf]# vim kylin.properties
kylin.server.cluster-servers=node2:7070    # kylin cluster setting; change to your hostname or IP and port
kylin.job.jar=$KYLIN_HOME/lib/kylin-job-2.3.1.jar    # adjust the jar version and path
kylin.coprocessor.local.jar=$KYLIN_HOME/lib/kylin-coprocessor-2.3.1.jar    # adjust the jar version and path
# List of web servers in use, this enables one web server instance to sync up with other servers
kylin.rest.servers=node2:7070
## the Hive database used by Kylin; set it to the schema used in Hive, changed here to the current user
kylin.job.hive.database.for.intermediatetable=root

5. If https is not enabled, disable it

[root@node2 ~]# cd /data/bigdata/kylin/tomcat/conf
[root@node2 conf]# cp -a server.xml{,_$(date +%F)}
[root@node2 conf]# vim server.xml
# line 85, change
maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
# to
maxThreads="150" SSLEnabled="false" scheme="https" secure="false"
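You can confirm the change took effect with a quick grep (the connector sits around line 85 of the stock file):

[root@node2 conf]# grep -n 'SSLEnabled' server.xml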
If it is not disabled, the following error is reported:
SEVERE: Failed to load keystore type JKS with path conf/.keystore due to /data/bigdata/kylin/tomcat/conf/.keystore (No such file or directory)
java.io.FileNotFoundException: /data/bigdata/kylin/tomcat/conf/.keystore (No such file or directory)

6. Modify $KYLIN_HOME/bin/kylin.sh

[root@node2 conf]# vim ../bin/kylin.sh
export KYLIN_HOME=/data/bigdata/kylin
export CATALINA_HOME=/data/bigdata/kylin/tomcat
export PATH=$CATALINA_HOME/bin:$PATH
export HCAT_HOME=$HIVE_HOME/hcatalog
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/hive-hcatalog-core-2.3.3.jar
export HBASE_CLASSPATH_PREFIX=$CATALINA_HOME/bin/bootstrap.jar:$CATALINA_HOME/bin/tomcat-juli.jar:$CATALINA_HOME/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX

# use the HDFS superuser to create a working directory for Kylin on HDFS and grant it to the login user
[root@node2 conf]# hdfs dfs -mkdir /kylin
[root@node2 conf]# hdfs dfs -chown -R root:root /kylin
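Before moving on, it does not hurt to verify that the directory exists with the intended owner:

[root@node2 conf]# hdfs dfs -ls / | grep kylin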
7. Check kylin dependencies

Go to the bin directory and execute:
[root@node2 bin]# cd $KYLIN_HOME/bin
[root@node2 bin]# ./check-env.sh
Retrieving hadoop conf dir...
KYLIN_HOME is set to /data/bigdata/kylin
[root@node2 bin]# ./find-hive-dependency.sh
Retrieving hive dependency...
[root@node2 bin]# ./find-hbase-dependency.sh
Retrieving hbase dependency...

8. Start the kylin service
Execute it from the bin directory of the kylin installation:
[root@node2 bin]# ./kylin.sh start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /data/bigdata/kylin
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
Start to check whether we need to migrate acl tables
... omit some ...
2018-06-05 17:12:10,111 INFO [Thread-6] zookeeper.ZooKeeper:684: Session: 0x300346e6b9e000d closed
2018-06-05 17:12:10,111 INFO [main-EventThread] zookeeper.ClientCnxn:512: EventThread shut down
2018-06-05 17:12:10,210 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:2068: Closing master protocol: MasterService
2018-06-05 17:12:10,211 INFO [close-hbase-conn] client.ConnectionManager$HConnectionImplementation:1676: Closing zookeeper sessionid=0x20034776a7c0004
2018-06-05 17:12:10,214 INFO [close-hbase-conn] zookeeper.ZooKeeper:684: Session: 0x20034776a7c0004 closed
2018-06-05 17:12:10,214 INFO [main-EventThread] zookeeper.ClientCnxn:512: EventThread shut down
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /data/bigdata/kylin/logs/kylin.log
Web UI is at http://:7070/kylin
[root@node2 bin]#

The processes started so far:
                          node1  node2  node3  node4  node5
JDK                         ✔      ✔      ✔      ✔      ✔
QuorumPeerMain              ✔      ✔      ✔
JournalNode                 ✔      ✔      ✔
NameNode                    ✔      ✔
DFSZKFailoverController     ✔      ✔
DataNode                    ✔      ✔      ✔      ✔      ✔
NodeManager                 ✔      ✔      ✔      ✔      ✔
ResourceManager                           ✔      ✔
HMaster                     ✔
HRegionServer               ✔      ✔      ✔
HistoryServer               ✔
Master                      ✔
Worker                      ✔      ✔      ✔      ✔      ✔
Kafka                       ✔      ✔
RunJar                             ✔✔                         # hive and kylin each run as a RunJar process
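A quick sanity check that the new instance is really up: kylin shows as a second RunJar in jps, and the web port should be listening (ss comes from iproute2; netstat -lntp works as well):

[root@node2 bin]# jps | grep RunJar
[root@node2 bin]# ss -lntp | grep 7070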
After the service starts, access it from a browser at: http://IP:7070/kylin/
User name: ADMIN
Password: KYLIN
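The login can also be verified from the command line through Kylin's REST API; QURNSU46S1lMSU4= is simply the base64 encoding of ADMIN:KYLIN:

[root@node2 ~]# curl -i -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" http://node2:7070/kylin/api/user/authentication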
9. Configure the hive data source
1. Configure the data source
(1) Select Model-> Data Source-> Load Hive Table
(2) Enter the table name in the format: database_name.table_name
For example: db_hiveTest.student, then click Sync.
After the addition succeeds, the table appears under the data source.
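If the table fails to appear, first confirm that Hive itself can see it (db_hiveTest is just the example database name above):

[root@node2 ~]# hive -e 'show tables in db_hiveTest;'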
10. Common mistakes
1. The web interface cannot synchronize hive metadata.
The fix is made under the kylin installation directory:
Execute vim ./bin/kylin.sh and make the following modification to the script:
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX    # add $hive_dependency to the classpath

Appendix: start the services in the following order.

1. Zookeeper

[root@node1 ~]# cd /data/bigdata/
[root@node1 bigdata]# ./zookeeper_all_op.sh start
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
node1 zookeeper start done
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
node2 zookeeper start done
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
node3 zookeeper start done
[root@node1 bigdata]# ./zookeeper_all_op.sh status
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower
node1 zookeeper status done
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: leader
node2 zookeeper status done
ZooKeeper JMX enabled by default
Using config: /data/bigdata/src/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower
node3 zookeeper status done
You can see one leader; the others are followers.
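zookeeper_all_op.sh is a custom wrapper, not part of the ZooKeeper distribution. A minimal sketch of what such a script might look like, assuming passwordless SSH from node1 to every node and the paths used above:

#!/bin/bash
# run zkServer.sh with the given action (start|stop|status) on each ZooKeeper node
ACTION=${1:?usage: $0 start|stop|status}
for host in node1 node2 node3; do
    ssh "$host" "/data/bigdata/zookeeper/bin/zkServer.sh $ACTION"
    echo "$host zookeeper $ACTION done"
done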
2. Hadoop

[root@node1 bigdata]# cd /data/bigdata/src/hadoop-2.7.6/sbin/
[root@node1 sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [node1 node2]
node1: starting namenode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-namenode-node1.out
node2: starting namenode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-namenode-node2.out
node1: starting datanode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-datanode-node1.out
node2: starting datanode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-datanode-node2.out
node5: starting datanode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-datanode-node5.out
node3: starting datanode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-datanode-node3.out
node4: starting datanode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-datanode-node4.out
Starting journal nodes [node1 node2 node3]
node1: starting journalnode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-journalnode-node1.out
node2: starting journalnode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-journalnode-node2.out
node3: starting journalnode, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-journalnode-node3.out
Starting ZK Failover Controllers on NN hosts [node1 node2]
node2: starting zkfc, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-zkfc-node2.out
node1: starting zkfc, logging to /data/bigdata/src/hadoop-2.7.6/logs/hadoop-root-zkfc-node1.out
starting yarn daemons
starting resourcemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-resourcemanager-node1.out
node1: starting nodemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-nodemanager-node1.out
node3: starting nodemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-nodemanager-node3.out
node5: starting nodemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-nodemanager-node5.out
node2: starting nodemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-nodemanager-node2.out
node4: starting nodemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-nodemanager-node4.out
[root@node1 sbin]# jps
16418 QuorumPeerMain
18196 Jps
17047 NameNode
17194 DataNode
17709 DFSZKFailoverController
17469 JournalNode
17999 NodeManager
[root@node1 sbin]#
If a process is not started, start it separately on the corresponding server.
# ResourceManager needs to be launched separately:
[root@node3 sbin]# ./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-resourcemanager-node3.out
[root@node3 sbin]# jps
15968 Jps
14264 QuorumPeerMain
14872 NodeManager
14634 DataNode
15723 ResourceManager
14749 JournalNode
[root@node3 sbin]#
[root@node4 sbin]# ./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /data/bigdata/src/hadoop-2.7.6/logs/yarn-root-resourcemanager-node4.out
[root@node4 sbin]# jps
2995 NodeManager
4004 ResourceManager
4091 Jps
2813 DataNode
[root@node4 sbin]#

3. Spark

[root@node1 sbin]# cd /data/bigdata/src/spark-2.1.2-bin-hadoop2.7/sbin/
[root@node1 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-node1.out
node5: starting org.apache.spark.deploy.worker.Worker, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node5.out
node1: starting org.apache.spark.deploy.worker.Worker, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node1.out
node4: starting org.apache.spark.deploy.worker.Worker, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node4.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node2.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node3.out
[root@node1 sbin]# ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /data/bigdata/spark/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-node1.out
[root@node1 sbin]#

4. HBase

[root@node1 ~]# cd /data/bigdata/src/hbase-1.2.6/bin/
[root@node1 bin]# ./start-hbase.sh
starting master, logging to /data/bigdata/hbase/logs/hbase-root-master-node1.out
node3: starting regionserver, logging to /data/bigdata/hbase/logs/hbase-root-regionserver-node3.out
node2: starting regionserver, logging to /data/bigdata/hbase/logs/hbase-root-regionserver-node2.out
node1: starting regionserver, logging to /data/bigdata/hbase/logs/hbase-root-regionserver-node1.out
[root@node1 bin]#

END