This article introduces how to integrate Spark with Hadoop. It should be a useful reference for interested readers, and I hope you learn a lot from it. Let's get started.
A Spark application consists of two parts: a Driver and Executors.
Basic concepts of Spark
Application: a user program built on Spark, consisting of a Driver and multiple Executors on the cluster.
Driver program: runs the Application's main function and creates a SparkContext; the SparkContext is usually taken to represent the Driver program.
Executor: a process started for an Application on a worker node; it runs tasks and keeps data in memory or on disk. Each Application has its own independent Executors.
Cluster manager: the external service that acquires resources on the cluster, for example Standalone, Mesos, or YARN.
Worker node: any node in the cluster that can run application code.
Task: a unit of work sent to an Executor.
Job: a parallel computation made up of multiple tasks, spawned by a Spark action; the term appears frequently in logs.
RDD: Spark's basic unit of computation, manipulated through a series of operators, mainly transformations and actions. The sketch below ties these concepts together.
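A minimal, self-contained sketch (the object and application names are hypothetical): the driver runs main and creates the SparkContext, a transformation builds an RDD lazily, and an action spawns a job whose tasks run on the executors.

import org.apache.spark.{SparkConf, SparkContext}

object ConceptsDemo {
  def main(args: Array[String]): Unit = {
    // The driver runs main() and creates the SparkContext
    val sc = new SparkContext(new SparkConf().setAppName("ConceptsDemo").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 100, 4) // an RDD with 4 partitions
    val doubled = rdd.map(_ * 2)          // a transformation, recorded lazily
    val total = doubled.reduce(_ + _)     // an action: spawns a job of 4 tasks on the executors
    println(total)                        // 10100
    sc.stop()
  }
}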
Parallelizing Scala collections
Spark's parallelize method converts a Scala collection into an RDD:
val rdd1 = sc.parallelize(Array(1, 2, 3, 4, 5))
val rdd2 = sc.parallelize(0 to 10, 5)
The second parameter is the number of slices (partitions) the dataset is cut into; Spark starts one task per slice.
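You can confirm how the dataset was sliced by inspecting the RDD's partitions (a small sketch using the rdd2 defined above):

rdd2.partitions.length // 5: one task per partition when an action runs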
Spark's textFile method reads text files: it supports local files, whole directories, compressed files (e.g. gzip), and wildcard patterns; the optional second parameter sets the number of partitions.
Use wholeTextFiles to read the small files in a directory; it returns (filename, content) pairs.
Use sequenceFile to convert a SequenceFile into an RDD.
Use the hadoopRDD method to convert any other Hadoop input format into an RDD.
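A hedged sketch of these input methods (the HDFS paths are hypothetical; sequenceFile relies on Spark's implicit Writable converters, which on older versions require import org.apache.spark.SparkContext._):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

val lines = sc.textFile("hdfs://127.0.0.1:9000/data/*.gz", 4)         // wildcards and gzip handled transparently
val files = sc.wholeTextFiles("hdfs://127.0.0.1:9000/smallfiles")     // RDD of (path, content) pairs
val pairs = sc.sequenceFile[String, Int]("hdfs://127.0.0.1:9000/seq") // keys/values converted from Writables

// Any other Hadoop InputFormat via hadoopRDD:
val jobConf = new JobConf()
FileInputFormat.setInputPaths(jobConf, "hdfs://127.0.0.1:9000/dataguru/data")
val raw = sc.hadoopRDD(jobConf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])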
Broadcast variables
A broadcast variable is cached in memory on every node, rather than being shipped with each task.
Once created, a broadcast variable can be referenced in any function that runs on the cluster.
Broadcast variables are read-only: they cannot be modified after being broadcast.
For broadcasts of large datasets, Spark uses efficient broadcast algorithms to reduce communication cost.
Usage:
val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar.value
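A common use, sketched below with a hypothetical lookup table: the table is shipped once per node instead of being serialized into every task closure.

val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))
val named = sc.parallelize(Array(1, 2, 3)).map(x => lookup.value.getOrElse(x, "?"))
named.collect() // Array("one", "two", "three")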
Accumulators
Accumulators support only an add operation, so they can be implemented efficiently in parallel; they are used for counters and sums.
Spark natively supports accumulators of numeric types and standard mutable collections, and users can add support for new types.
Only the driver program can read an accumulator's value.
Usage:
val accnum = sc.accumulator(0)
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accnum += x)
accnum.value
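On Spark 2.0 and later, sc.accumulator is deprecated in favour of named accumulators; a sketch of the equivalent (the accumulator name is arbitrary):

val acc = sc.longAccumulator("sum")
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => acc.add(x))
acc.value // 10, readable only on the driver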
To run a job, first start the Spark cluster (e.g. with sbin/start-all.sh), then submit it with spark-submit:
[root@localhost bin]# ./spark-submit --master spark://127.0.0.1:7077 --class week2.SougoQA --executor-memory 3G scala.jar hdfs://127.0.0.1:9000/dataguru/data/SogouQ1.txt hdfs://127.0.0.1:9000/dataguru/week2/output
Thank you for reading this article carefully. I hope this walkthrough of integrating Spark with Hadoop has been helpful.