

What does Spark mean?


This article explains what Spark refers to. The introduction is concise and easy to follow, and I hope you will gain something from it.

Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by UC Berkeley's AMP Lab. Distributed computing in Spark is based on the MapReduce model and keeps the advantages of Hadoop MapReduce; but unlike MapReduce, a job's intermediate output and results can be kept in memory, so there is no need to read and write HDFS between steps. Spark is therefore better suited to MapReduce-style algorithms that need iteration, such as those used in data mining and machine learning.
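Where that gain comes from can be shown with a minimal spark-shell sketch (the input path and the toy iteration are assumptions for illustration, not the article's example):

// cache() materializes the RDD in memory on the first action, so every
// later iteration reuses it instead of re-reading the file from HDFS.
val points = sc.textFile("hdfs:///data/points.txt") // assumed input path
  .map(_.split(",").map(_.toDouble))
  .cache()

for (i <- 1 to 10) { // stands in for an iterative algorithm
  println(points.map(_.sum).reduce(_ + _)) // reuses the cached data each pass
}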

I. What is Spark

Spark is a memory-based big data analytics engine that improves the real-time performance of data processing in big data environments. Spark handles only the computation of data, not its storage.

II. Spark HA high-availability deployment

* Spark HA offers two solutions to the Master single point of failure:

1. Single-point recovery based on the file system (mainly for development or test environments; see the sketch after this list)

2. ZooKeeper-based Standby Masters (for production)
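A minimal spark-env.sh sketch of the file-system variant (the recovery directory is an assumed example path):

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/opt/spark-recovery"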

* ZooKeeper-based Spark HA cluster deployment

(1) vim spark-env.sh

Comment out export SPARK_MASTER_HOST=hdp-node-01

(2) add SPARK_DAEMON_JAVA_OPTS to spark-env.sh as follows:

spark.deploy.recoveryMode: the recovery mode used when the Master restarts. There are three options:

(1) ZooKeeper

(2) FileSystem

(3) NONE

spark.deploy.zookeeper.url: the ZooKeeper server addresses.

spark.deploy.zookeeper.dir: the ZooKeeper directory that holds cluster metadata, including Worker, Driver, and Application information.
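Putting steps (1) and (2) together, a minimal spark-env.sh sketch (the ZooKeeper host list follows the article's hdp-node-* naming and is an assumption):

# step (1): comment out the fixed master host
# export SPARK_MASTER_HOST=hdp-node-01
# step (2): enable ZooKeeper-based recovery
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hdp-node-01:2181,hdp-node-02:2181,hdp-node-03:2181 -Dspark.deploy.zookeeper.dir=/spark"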

Note:

To start a Spark cluster in normal mode, you only need to execute start-all.sh on the master host. To start a Spark cluster in high-availability mode, run start-all.sh on any one node, and then start a standby Master separately on another node with the command start-master.sh.
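For example (treating hdp-node-02 as the standby node is an assumption):

# on hdp-node-01: start the active Master and all Workers
$SPARK_HOME/sbin/start-all.sh
# on hdp-node-02: start a standby Master
$SPARK_HOME/sbin/start-master.sh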

III. Spark-Shell

Read local files

1. Run spark-shell --master local[N] (N is the number of threads)

2. Write the Scala code:

sc.textFile("file:///root/words.txt")
  .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect
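Here collect returns the result to the driver as an Array[(String, Int)] of (word, count) pairs.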

Read data on HDFS

1. Integrate Spark with HDFS by modifying the configuration file spark-env.sh:

export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.6.4/etc/hadoop

2. Start HDFS, then restart the Spark cluster

3. Upload a file to HDFS

4. Write the Spark program in Scala in spark-shell, as sketched below
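A minimal sketch of step 4 (the NameNode address and file path are assumptions; match them to the file uploaded in step 3):

sc.textFile("hdfs://hdp-node-01:9000/words.txt")
  .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect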

Specify a specific master address

1. Execute the startup command:

spark-shell \
--master spark://hdp-node-01:7077 \
--executor-memory 1g \
--total-executor-cores 2

If no master address is specified, spark-shell runs in local mode by default.

The above is what Spark refers to. Have you learned any new knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.
