Since the company started using commercial big data products, I rarely touch anything open source, and cluster problems go straight to R&D. A few days ago a friend asked me how to switch hive's underlying engine to spark. My first thought was that it was just a matter of sharing hive's database with spark and then using spark-shell, but after looking things up it turned out to be nothing of the sort; quite a few steps are involved. It is true that when you use someone else's product, development is convenient but you understand less of the principles: the editor has been living a comfortable life on a product that translates SQL into spark programs under the hood and runs tasks in parallel. So, taking advantage of the weekend, alone with the company's WiFi and air conditioning, I replaced the open source hive engine with spark and am sharing the process here; call it the editor's transition from couch potato to tech geek.
Due to limited funds, the demonstration has to run on virtual machines. Here the editor introduces the hadoop platform he built himself, and first reviews which processes need to be running in hadoop HA mode (hadoop is version 2.7.x):
→ NameNode (active/standby): HDFS master node, responsible for metadata management and for managing the slave nodes
→ DataNode: HDFS slave node, used to store data
→ ResourceManager: yarn master node, responsible for resource scheduling
→ NodeManager: yarn slave node, used to execute the actual tasks
→ Zookeeper: service coordination (process name QuorumPeerMain)
→ JournalNode: shares metadata between the active and standby namenodes
→ DFSZKFailoverController: monitors namenode liveness and stands ready to switch between active and standby
That is about all: a very ordinary hadoop platform. The editor uses three virtual machines here.
Services on each node (hadoop01, hadoop02, hadoop03): the per-node process lists were shown in the original screenshots; a quick way to check them yourself is shown below.
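To see at a glance which of these daemons is running where, a loop like the following can be used (a minimal sketch, assuming passwordless ssh between the nodes and that jps is on the PATH of the hadoop user on each host):

for host in hadoop01 hadoop02 hadoop03; do
  echo "==== $host ===="
  ssh "$host" 'jps | grep -v Jps'   # list the running java daemons, minus the jps process itself
done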
Before you complain about the uneven distribution of services across nodes, put down your keyboard: this is only a demo, thrown together in a hurry.
1. Test whether hive is working properly:
Here I have distributed the hive installation package on all three machines:
Execute the command to start hive (using the hive CLI directly here to keep things quick, rather than beeline):
[hadoop@hadoop01 applications]$ hive
Try running a few commands:
hive> use test;                      # enter the database
hive> show tables;                   # see which tables exist
hive> create external table `user` (id string, name string)
      row format delimited fields terminated by ','
      location '/zy/test/user';      # create the table
# Import some data:
[hadoop@hadoop01 ~]$ for i in `seq 100`; do echo "10$i,zy$i" >> user.txt; done
[hadoop@hadoop01 ~]$ hadoop fs -put user.txt /zy/test/user
hive> select * from `user`;
OK, hive works fine!
2. Replace the hive engine with spark
(1) Versions
First check the compatibility of the hive and spark versions:
Here the editor's spark is 2.0.0 and Hive is 2.3.2.
Spark download address: https://archive.apache.org/dist/spark/spark-2.0.0/
Download address of Hive: http://hive.apache.org/downloads.html
Note that the spark used here must be compiled without hive support (a "without-hive" build). The editor provides a pre-compiled spark here:
Link: https://pan.baidu.com/s/1tPu2a34JZgcjKAtJcAh-pQ extraction code: kqvs
As for hive, the binary from the official website is fine.
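If you want to double-check which spark version a given hive release was built against, one way is to look at the spark.version property in the hive source tree (a sketch, assuming the standard apache archive layout and that the property keeps this name in hive 2.3.2's root pom.xml):

wget https://archive.apache.org/dist/hive/hive-2.3.2/apache-hive-2.3.2-src.tar.gz
tar -xzf apache-hive-2.3.2-src.tar.gz
grep -m1 '<spark.version>' apache-hive-2.3.2-src/pom.xml   # should print 2.0.0 for the hive 2.3.x line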
(2) Modify the configuration files

# hive configuration (hive-site.xml), relevant properties inside <configuration>:
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://hadoop03:3306/hivedb?createDatabaseIfNotExist=true</value><description>JDBC connect string for a JDBC metastore</description></property>
<property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value><description>Driver class name for a JDBC metastore</description></property>
<property><name>javax.jdo.option.ConnectionUserName</name><value>root</value><description>username to use against metastore database</description></property>
<property><name>javax.jdo.option.ConnectionPassword</name><value>123456</value><description>password to use against metastore database</description></property>
<property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
<property><name>hive.execution.engine</name><value>spark</value></property>
<property><name>hive.enable.spark.execution.engine</name><value>true</value></property>
<property><name>spark.home</name><value>/applications/spark-2.0.0-bin-hadoop2-without-hive</value></property>
<property><name>spark.master</name><value>yarn</value></property>
<property><name>spark.eventLog.enabled</name><value>true</value></property>
<property><name>spark.eventLog.dir</name><value>hdfs://zy-hadoop:8020/spark-log</value><description>this directory must exist</description></property>
<property><name>spark.executor.memory</name><value>512m</value></property>
<property><name>spark.driver.memory</name><value>512m</value></property>
<property><name>spark.serializer</name><value>org.apache.spark.serializer.KryoSerializer</value></property>
<property><name>spark.yarn.jars</name><value>hdfs://zy-hadoop:8020/spark-jars/*</value></property>
<property><name>hive.spark.client.server.connect.timeout</name><value>300000</value></property>
<property><name>spark.yarn.queue</name><value>default</value></property>
<property><name>spark.app.name</name><value>zyInceptor</value></property>

One thing to note here: hadoop runs in HA mode, so hdfs paths should be written as hdfs://cluster_name:8020/path.

# spark configuration (spark-env.sh):
#!/usr/bin/env bash
export JAVA_HOME=/applications/jdk1.8.0_73
export SCALA_HOME=/applications/scala-2.11.8
export HADOOP_HOME=/applications/hadoop-2.8.4
export HADOOP_CONF_DIR=/applications/hadoop-2.8.4/etc/hadoop
export HADOOP_YARN_CONF_DIR=/applications/hadoop-2.8.4/etc/hadoop
export SPARK_HOME=/applications/spark-2.0.0-bin-hadoop2-without-hive
export SPARK_WORKER_MEMORY=512m
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_DRIVER_MEMORY=512m
export SPARK_DIST_CLASSPATH=$(/applications/hadoop-2.8.4/bin/hadoop classpath)
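Before going further, the engine setting in hive-site.xml can be sanity-checked from the command line (a quick sketch, assuming the hive CLI is on the PATH; this only prints the configured value and does not launch a spark job yet):

hive -e "set hive.execution.engine;"   # expected output: hive.execution.engine=spark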
(3) Configure the jars
① Find the following jar packages in hive's lib directory and copy them to spark's jars directory:
hive-beeline-2.3.2.jar
hive-cli-2.3.2.jar
hive-exec-2.3.2.jar
hive-jdbc-2.3.2.jar
hive-metastore-2.3.2.jar
[hadoop@hadoop01 lib]$ cp hive-beeline-2.3.2.jar hive-cli-2.3.2.jar hive-exec-2.3.2.jar hive-jdbc-2.3.2.jar hive-metastore-2.3.2.jar /applications/spark-2.0.0-bin-hadoop2-without-hive/jars/
② Find the following jar packages in spark's jars directory and copy them to hive's lib directory:
spark-network-common_2.11-2.0.0.jar
spark-core_2.11-2.0.0.jar
scala-library-2.11.8.jar
chill-java
chill
jackson-module-paranamer
jackson-module-scala
jersey-container-servlet-core
jersey-server
json4s-ast
kryo-shaded
minlog
scala-xml
spark-launcher
spark-network-shuffle
spark-unsafe
xbean-asm5-shaded
[hadoop@hadoop01 jars]$ cp spark-network-common_2.11-2.0.0.jar spark-core_2.11-2.0.0.jar scala-library-2.11.8.jar chill-java-0.8.0.jar chill_2.11-0.8.0.jar jackson-module-paranamer-2.6.5.jar jackson-module-scala_2.11-2.6.5.jar jersey-container-servlet-core-2.22.2.jar jersey-server-2.22.2.jar json4s-ast_2.11-3.2.11.jar kryo-shaded-3.0.3.jar minlog-1.3.0.jar scala-xml_2.11-1.0.2.jar spark-launcher_2.11-2.0.0.jar spark-network-shuffle_2.11-2.0.0.jar spark-unsafe_2.11-2.0.0.jar xbean-asm5-shaded-4.4.jar /applications/hive-2.3.2-bin/lib/
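The same copy can be written as a loop over base names, which avoids typing out every exact version (a sketch, assuming the spark and hive install paths used in this post):

SPARK_JARS=/applications/spark-2.0.0-bin-hadoop2-without-hive/jars
HIVE_LIB=/applications/hive-2.3.2-bin/lib
for j in spark-network-common_2.11 spark-core_2.11 scala-library chill-java chill_2.11 \
         jackson-module-paranamer jackson-module-scala_2.11 jersey-container-servlet-core \
         jersey-server json4s-ast_2.11 kryo-shaded minlog scala-xml_2.11 spark-launcher_2.11 \
         spark-network-shuffle_2.11 spark-unsafe_2.11 xbean-asm5-shaded; do
  cp "$SPARK_JARS"/$j-*.jar "$HIVE_LIB"/   # the glob picks up whatever version suffix is present
done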
③ Distribute the configuration files
Put hadoop's yarn-site.xml and hdfs-site.xml into spark's conf directory.
Also put hive-site.xml into spark's conf directory, as sketched below.
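On this setup that boils down to something like the following (a sketch, using the paths from spark-env.sh and the hive install above):

SPARK_CONF=/applications/spark-2.0.0-bin-hadoop2-without-hive/conf
cp /applications/hadoop-2.8.4/etc/hadoop/yarn-site.xml /applications/hadoop-2.8.4/etc/hadoop/hdfs-site.xml "$SPARK_CONF"/
cp /applications/hive-2.3.2-bin/conf/hive-site.xml "$SPARK_CONF"/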
④ Upload the jar packages to hdfs
spark.yarn.jars was configured in hive-site.xml above.
Here we first create this directory in hdfs:
[hadoop@hadoop01 conf]$ hadoop fs -mkdir /spark-jars
Put all the jar packages in spark's jars into this directory:
[hadoop@hadoop01 jars]$ hadoop fs -put ./*.jar /spark-jars
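A quick way to confirm the upload matches what spark.yarn.jars expects (a sketch; the hdfs path is the one configured in hive-site.xml):

hadoop fs -ls /spark-jars | tail -5           # spot-check a few of the uploaded jars
hadoop fs -ls /spark-jars | grep -c '.jar$'   # compare with: ls /applications/spark-2.0.0-bin-hadoop2-without-hive/jars | wc -l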
⑤ Start spark
[hadoop@hadoop01 jars]$ /applications/spark-2.0.0-bin-hadoop2-without-hive/sbin/start-all.sh
At this point, the corresponding spark processes appear on this node, which can be checked as below:
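A quick check that the standalone daemons actually came up (a sketch, assuming jps is available on this node):

jps | egrep 'Master|Worker'   # start-all.sh should have started a Master and at least one Worker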
(4) Test after completing the above steps
Run a SQL statement in hive to test:
A query like select count(1) from some table; is generally used for this check, because a count forces an actual job to run.
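With the test table created earlier, the check can be run non-interactively like this (a sketch reusing the `user` table from step 1):

hive -e 'use test; select count(1) from `user`;'   # the count forces a real job, now executed by spark on yarn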
The Spark interface appears:
The Yarn web interface will show the running application:
The above interface indicates that hive on spark is installed successfully!
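The same information can also be pulled from the command line instead of the web UI (a sketch, assuming the yarn client is configured on this node):

yarn application -list   # the hive-on-spark session should appear under the name set by spark.app.name (zyInceptor above)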
3. Problem encountered: version incompatibility
Reason: the spark build must not contain hive dependencies; spark has to be compiled with the -Phive profile removed.
Solution: recompile spark.
Here are the tutorials from the hive website:
# Prior to Spark 2.0.0:
# (although labeled "prior to Spark 2.0.0", this is in fact how the spark 1.6 line is compiled)
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
# Since Spark 2.0.0:
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
# Since Spark 2.3.0:
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided"
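For reference, a full build of the spark 2.0.0 used here would look roughly like this (a sketch, assuming maven and a JDK 8 are installed and that the source tarball is taken from the apache archive):

wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0.tgz
tar -xzf spark-2.0.0.tgz && cd spark-2.0.0
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
# the result is spark-2.0.0-bin-hadoop2-without-hive.tgz in the current directory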
After the compilation is successful, you can execute the previous content.
Here the editor also has the compiled spark:
Link: https://pan.baidu.com/s/1tPu2a34JZgcjKAtJcAh-pQ extraction code: kqvs