This article gives a concise, easy-to-understand introduction to what Spark is. I hope you gain something from the detailed introduction that follows.
Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by UC Berkeley's AMP Lab. Distributed computing in Spark is based on the map-reduce model and keeps the advantages of Hadoop MapReduce, but unlike MapReduce, the intermediate output and results of a job can be kept in memory, so repeated reads and writes to HDFS are no longer necessary. This makes Spark better suited to map-reduce algorithms that require iteration, such as data mining and machine learning.
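As a rough illustration of why keeping intermediate results in memory suits iterative algorithms, here is a minimal Scala sketch for spark-shell (where sc is predefined) that caches a parsed dataset and reuses it across iterations instead of re-reading HDFS on every pass; the input path and the loop logic are hypothetical, not taken from this article.
// Hypothetical sketch: reuse a cached RDD across iterations.
val points = sc.textFile("hdfs:///data/points.txt")   // hypothetical input path
  .map(_.split(",").map(_.toDouble))
  .cache()                                             // keep the parsed rows in memory
var result = 0L
for (i <- 1 to 10) {
  // Each pass scans the cached in-memory RDD rather than re-reading HDFS.
  result = points.filter(_.sum > i).count()
}
println("final result: " + result)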
I. What is Spark
Spark is a memory-based big data analysis engine that improves the real-time performance of data processing in a big data environment. Spark only handles the computation of data, not its storage.
II. Spark HA: high-availability deployment
* Two solutions for Spark HA to address the Master single point of failure:
1. Single-point recovery based on the file system (mainly for development or test environments)
2. ZooKeeper-based Standby Masters (for production mode)
* Spark HA high-availability cluster deployment based on ZooKeeper
(1) vim spark-env.sh
Comment out export SPARK_MASTER_HOST=hdp-node-01
(2) add SPARK_DAEMON_JAVA_OPTS to spark-env.sh as follows:
spark.deploy.recoveryMode:
There are three recovery modes (used when the Master restarts):
(1) ZooKeeper
(2) FileSystem
(3) NONE
spark.deploy.zookeeper.url: the server address of ZooKeeper
spark.deploy.zookeeper.dir: the directory that holds cluster metadata information, including Worker, Driver and Application state.
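For reference, a minimal sketch of what the resulting SPARK_DAEMON_JAVA_OPTS entry in spark-env.sh could look like; the ZooKeeper quorum addresses and the znode directory below are placeholders, not values taken from this article.
# Hypothetical example; substitute your own ZooKeeper quorum and directory.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"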
Note:
To start a Spark cluster in normal mode, you only need to execute start-all.sh on the master host. To start a Spark cluster in high-availability mode, first run start-all.sh on any one node, and then start a standby Master separately on another node with the start-master.sh command.
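A sketch of that startup sequence, assuming the two Master hosts are named hdp-node-01 and hdp-node-02 (the second host name is an assumption for illustration):
# On hdp-node-01: start the whole cluster (active Master plus the Workers)
$SPARK_HOME/sbin/start-all.sh
# On hdp-node-02: start an additional standby Master
$SPARK_HOME/sbin/start-master.sh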
III. Spark-Shell
Read local files
1. Run spark-shell --master local[N] (N stands for the number of threads)
2. Write the Scala code
sc.textFile("file:///root/words.txt")
  .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect
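To make the pipeline concrete, here is a hypothetical input and the result it would produce (the file contents below are assumed for illustration, not taken from this article):
// Hypothetical contents of /root/words.txt:
//   hello spark
//   hello hadoop
// flatMap splits each line into words, map pairs each word with 1,
// and reduceByKey sums the counts per word, so collect returns (in some order):
//   Array((hello,2), (spark,1), (hadoop,1))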
Read data on HDFS
1. Integrate Spark with HDFS by modifying the configuration file spark-env.sh:
export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.6.4/etc/hadoop
2. Start HDFS and restart the Spark cluster
3. Upload a file to HDFS
4. Write a Spark program in Scala in spark-shell, specifying a specific master address (see the sketch after the startup command below)
1. Execute the startup command:
spark-shell \
--master spark://hdp-node-01:7077 \
--executor-memory 1g \
--total-executor-cores 2
If no master address is specified, local mode is used by default.
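Once connected to the cluster, the same word-count code can read from HDFS. A minimal sketch, assuming words.txt has already been uploaded to the HDFS root directory (for example with hdfs dfs -put /root/words.txt /, a hypothetical location):
// With HADOOP_CONF_DIR set as above, a bare path resolves against HDFS.
// The path /words.txt is a hypothetical example of where the file was uploaded.
sc.textFile("/words.txt")
  .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect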
That covers what Spark refers to. Have you learned any new knowledge or skills? If you want to learn more skills or enrich your knowledge, you are welcome to follow the industry information channel.