This article describes how to use Spark in Hadoop. The editor thinks it is very practical and shares it with you as a reference; follow along to take a look.
I. What is Spark
Spark is a general-purpose distributed parallel computing framework, like Hadoop MapReduce, open-sourced by UC Berkeley's AMP Lab. Spark has the advantages of Hadoop MapReduce, but the biggest difference between them is that Spark performs iterative computation in memory: the intermediate output of a Spark job can be kept in memory, so there is no need to read and write HDFS between steps. In addition, a MapReduce job has only two stages, map and reduce, and processing ends when they finish, whereas Spark's computing model can be divided into n stages. Because it iterates in memory, after one stage finishes we can continue with many more stages, not just two.
Therefore, Spark is better suited to algorithms that need to iterate over the data, such as data mining and machine learning. It implements MapReduce's map and reduce operators and its computing model, and it also provides a richer set of operators, such as filter, join and groupByKey.
Spark is a platform for fast, general-purpose cluster computing. It extends the widely used MapReduce computing model and efficiently supports more kinds of computation, including interactive queries and stream processing. Speed matters when dealing with large data sets, and an important feature of Spark is that it can compute in memory, which makes it faster; even for complex computations on disk, Spark is still more efficient than MapReduce.
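To make the in-memory iteration point concrete, here is a minimal spark-shell (Scala) sketch, added for illustration and not part of the original walkthrough; it assumes the shell's predefined SparkContext sc. After the first action, the cached RDD is served from memory, so later passes do not recompute it or re-read it from storage.
// Illustrative only: run inside spark-shell, where sc (SparkContext) is predefined.
val data = sc.parallelize(1 to 1000000).cache()   // mark the RDD to be kept in memory
var total = 0L
for (i <- 1 to 5) {
  // Each pass launches a new job; after the first action the data is served from
  // memory instead of being recomputed or re-read from storage.
  total += data.map(_.toLong * i).reduce(_ + _)
}
println(total)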
II. Installation of Scala (all nodes)
Download the installation package
wget https://downloads.lightbend.com/scala/2.11.7/scala-2.11.7.tgz
Extract the installation package
tar xf scala-2.11.7.tgz
mv scala-2.11.7 /usr/local/scala
Configure the scala environment variable /etc/profile.d/scala.sh
# Scala ENV
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
Make the scala environment variable effective
source /etc/profile.d/scala.sh
III. Spark installation (all nodes)
1. Download and install
# download the installation package
wget https://mirrors.aliyun.com/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
# extract the installation package
tar xf spark-2.3.1-bin-hadoop2.7.tgz
mv spark-2.3.1-bin-hadoop2.7 /usr/local/spark
2. Configure Spark environment variables
Edit the file /etc/profile.d/spark.sh and modify it as follows:
# Spark ENV
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Make the environment variable effective
source /etc/profile.d/spark.sh
IV. Spark configuration (namenode01)
1. Configure spark-env.sh
Edit the file /usr/local/spark/conf/spark-env.sh to read as follows:
export JAVA_HOME=/usr/java/default
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=namenode01
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
2. Configure slaves
Edit the file /usr/local/spark/conf/slaves to read as follows:
datanode01
datanode02
datanode03
3. Synchronize the configuration files to the other nodes
scp /usr/local/spark/conf/* datanode01:/usr/local/spark/conf/
scp /usr/local/spark/conf/* datanode02:/usr/local/spark/conf/
scp /usr/local/spark/conf/* datanode03:/usr/local/spark/conf/
4. Start the Spark cluster
The Spark service only uses Hadoop's HDFS cluster; Spark itself runs here as its own standalone cluster.
/usr/local/spark/sbin/start-all.sh
V. Check
1. jps
[root@namenode01 ~]# jps
14512 NameNode
23057 RunJar
14786 ResourceManager
30355 Jps
15894 HMaster
30234 Master
[root@datanode01 ~]# jps
3509 DataNode
3621 NodeManager
1097 QuorumPeerMain
9930 RunJar
15514 Worker
15581 Jps
3935 HRegionServer
[root@datanode02 ~]# jps
3747 HRegionServer
14153 Worker
3322 DataNode
3434 NodeManager
1101 QuorumPeerMain
14221 Jps
[root@datanode03 ~]# jps
3922 DataNode
4034 NodeManager
19186 Worker
19255 Jps
1102 QuorumPeerMain
4302 HRegionServer
2. Spark web interface
Visit http://192.168.1.200:8080/
3. spark-shell
While the spark-shell is running, we can also visit the web UI at http://192.168.1.200:4040 to see the currently executing tasks.
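For reference, a minimal interactive session might look like the sketch below. This is an illustrative addition, not part of the original article: the master URL follows the standalone convention for the SPARK_MASTER_IP set above, and the input path is a made-up example that, with HADOOP_CONF_DIR configured, would be resolved against HDFS.
// Launch the shell against the standalone master configured above, for example:
//   /usr/local/spark/bin/spark-shell --master spark://namenode01:7077
// Inside the shell, sc (SparkContext) is predefined; the input path below is a
// hypothetical example, resolved against HDFS because HADOOP_CONF_DIR is set.
val lines = sc.textFile("/tmp/input.txt")
val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
counts.take(10).foreach(println)   // this job also shows up on the 4040 web UI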
Thank you for reading! This concludes the article on how to use Spark in Hadoop. I hope the content above is of some help to you; if you found it useful, feel free to share it so more people can see it.