2025-01-17 update from SLTechnology News&Howtos. Shulou (Shulou.com) 06/01 report.
This article gives a concise, easy-to-follow introduction to the basics of Spark. I hope you find it useful.
Before talking about Spark, here is a suggestion for anyone who wants to learn and use it: Spark's official website is a good resource and covers most of what you will need. I also recommend learning Scala, for two reasons. 1. Spark is written in Scala, and learning Spark well means studying and analyzing its source code (the same is true of any other technology). 2. Writing Spark programs in Scala is more convenient, concise, and efficient than writing them in Java. Back to the topic: what follows is an overall introduction to the Spark ecosystem.
Apache Spark is a fast, general-purpose, extensible, fault-tolerant, in-memory big data analysis engine. Note first that Spark is a computing engine: it processes data but does not store it. Let's start with the core components of the current Spark ecosystem.
This article briefly introduces the use cases of each component. Later articles will cover the core components in detail. Everything here is based on Spark 2.x.
Spark RDD and Spark SQL
Spark RDD and Spark SQL are mostly used in offline (batch) scenarios. Spark RDD can handle both structured and unstructured data. Spark SQL handles structured data and internally processes distributed datasets through the Dataset API.
Spark Streaming and Structured Streaming
Both are used for stream processing, but note that Spark Streaming is based on micro-batch processing. Structured Streaming has made some optimizations for latency. Even so, compared with Flink and Storm, Spark's stream processing is still near-real-time rather than true real-time processing.
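To make "micro-batch" concrete, here is a plain-Python sketch (not Spark code) that groups a time-ordered stream of events into fixed-length intervals. Each interval is then processed as a small bounded job, and the per-batch results are merged into running state. The `micro_batches` and `run_stream` helpers and the 1-second interval are hypothetical. In Structured Streaming the interval would be set with a trigger such as `trigger(processingTime="1 second")`.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group time-ordered events into consecutive batches of `interval` seconds.

    Hypothetical stand-in for a streaming engine's trigger; empty intervals
    are simply skipped in this sketch.
    """
    batch: List[str] = []
    batch_end: Optional[float] = None
    for ts, payload in events:
        if batch_end is None:
            batch_end = (ts // interval + 1) * interval
        # Close every interval that ended before this event arrived.
        while ts >= batch_end:
            if batch:
                yield batch
                batch = []
            batch_end += interval
        batch.append(payload)
    if batch:
        yield batch


def run_stream(events: Iterable[Event], interval: float) -> Counter:
    """Run a word count per micro-batch and merge it into running state."""
    running_counts: Counter = Counter()
    for batch in micro_batches(events, interval):
        running_counts.update(batch)
    return running_counts


if __name__ == "__main__":
    stream = [(0.1, "a"), (0.7, "b"), (1.2, "a"), (3.5, "c")]
    print(list(micro_batches(stream, 1.0)))  # [['a', 'b'], ['a'], ['c']]
    print(dict(run_stream(stream, 1.0)))     # {'a': 2, 'b': 1, 'c': 1}
```

An event's result becomes visible only when its interval closes. That waiting time is where the extra latency comes from, compared with engines such as Flink or Storm that process records one at a time.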
MLlib
For machine learning. PySpark also lets you process data in Python.
GraphX
For graph computation.
SparkR
Data processing and statistical analysis in the R language.
Now let's look at the features of Spark.
Fast
Spark implements a DAG execution engine and processes data with in-memory iterative computation. It can keep the intermediate results of an analysis in memory, so data does not have to be read from and written to external storage over and over. Compared with MapReduce, this makes Spark better suited to machine learning, data mining, and other scenarios that require iterative operations.
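Keeping intermediate data in memory pays off most in iterative jobs. The plain-Python sketch below (not Spark's API) counts how many times a simulated storage read happens when an iterative algorithm re-reads its input on every pass. It compares that with loading the input once and keeping it in memory, which is roughly what calling `cache()` on an RDD before a loop achieves. The `Source` class and the gradient-descent loop are hypothetical illustrations.

```python
from typing import Callable, List


class Source:
    """Hypothetical stand-in for external storage (e.g. files on HDFS)."""

    def __init__(self, data: List[float]):
        self._data = data
        self.reads = 0  # number of times the data was read from "storage"

    def load(self) -> List[float]:
        self.reads += 1
        return list(self._data)


def fit_mean(load: Callable[[], List[float]], iterations: int, lr: float = 0.5) -> float:
    """Gradient descent on (1/2n) * sum((m - x)^2); m converges to the mean."""
    m = 0.0
    for _ in range(iterations):
        points = load()  # one pass over the data per iteration
        grad = sum(m - x for x in points) / len(points)
        m -= lr * grad
    return m


if __name__ == "__main__":
    uncached = Source([1.0, 2.0, 3.0, 4.0])
    fit_mean(uncached.load, iterations=10)        # re-reads storage every pass
    print(uncached.reads)                         # 10

    cached_src = Source([1.0, 2.0, 3.0, 4.0])
    in_memory = cached_src.load()                 # read once, keep in memory
    fit_mean(lambda: in_memory, iterations=10)
    print(cached_src.reads)                       # 1
```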
Easy to use
Spark supports Scala, Java, Python, and R. It provides a rich set of high-level operators (currently more than 80) so users can quickly build different applications. It also offers interactive shells for Scala, Python, and other languages.
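As an illustration of how a few high-level operators combine into an application, here is the classic word count. It is written with plain-Python equivalents of the `flatMap`, `map`, and `reduceByKey` operators and only mimics their semantics locally. In PySpark the same pipeline would be `sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The helper names below are hypothetical.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")
V = TypeVar("V")


def flat_map(f: Callable[[T], Iterable[U]], xs: Iterable[T]) -> List[U]:
    """Apply f to every element and flatten the results (like flatMap)."""
    return list(chain.from_iterable(f(x) for x in xs))


def reduce_by_key(f: Callable[[V, V], V], pairs: Iterable[Tuple[K, V]]) -> Dict[K, V]:
    """Combine all values that share a key with f (like reduceByKey)."""
    out: Dict[K, V] = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    words = flat_map(str.split, lines)               # flatMap: line -> words
    pairs = [(w, 1) for w in words]                  # map: word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)  # reduceByKey: sum per word


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```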
General-purpose
Spark aims to be a one-stop solution. It integrates batch processing, stream processing, interactive queries, machine learning, and graph computing, which avoids the wasted resources of deploying a separate cluster for each kind of workload.
Good fault tolerance
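RDDs record the lineage of transformations that produced them, so lost data can be recomputed. Checkpoints add further fault tolerance to distributed dataset computations: they save intermediate results (usually to HDFS), so when an operation fails, the computation does not have to restart from scratch.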
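A toy model shows why checkpointing helps. Each derived dataset below remembers its parent and the function that produced it (its lineage), so a lost result can be rebuilt by replaying that chain. A checkpoint stores a materialized copy, so replay can start there instead of at the original input. This is a hypothetical plain-Python illustration, not Spark's implementation. In PySpark the corresponding calls are `sc.setCheckpointDir(...)` and `rdd.checkpoint()`.

```python
from typing import Callable, List, Optional


class Dataset:
    """A toy dataset that records its lineage (parent + transformation)."""

    def __init__(self, parent: Optional["Dataset"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None,
                 source: Optional[List[int]] = None):
        self.parent = parent
        self.fn = fn
        self.source = source  # only set on the root dataset
        self.checkpointed: Optional[List[int]] = None

    def map(self, f: Callable[[int], int]) -> "Dataset":
        """Derive a new dataset; only the lineage is recorded here."""
        return Dataset(parent=self, fn=lambda xs: [f(x) for x in xs])

    def checkpoint(self) -> None:
        """Materialize this dataset so recovery need not replay its ancestors."""
        self.checkpointed = self.compute([])

    def compute(self, steps: List[str]) -> List[int]:
        """Rebuild the data, appending one entry to `steps` per stage replayed."""
        if self.checkpointed is not None:
            steps.append("read checkpoint")
            return list(self.checkpointed)
        if self.parent is None:
            steps.append("read source")
            return list(self.source)
        data = self.parent.compute(steps)
        steps.append("apply transformation")
        return self.fn(data)


if __name__ == "__main__":
    root = Dataset(source=[1, 2, 3])
    b = root.map(lambda x: x + 1).map(lambda x: x * 10)
    c = b.map(lambda x: x - 5)

    steps: List[str] = []
    print(c.compute(steps), len(steps))  # full replay from the source: [15, 25, 35] 4

    b.checkpoint()
    steps = []
    print(c.compute(steps), len(steps))  # replay starts at the checkpoint: [15, 25, 35] 2
```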
Strong compatibility
Spark can run on YARN, Kubernetes, Mesos, and other resource managers. It also ships with its own built-in resource manager and scheduler in Standalone mode, and it supports many data sources.
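The resource manager a job runs on is selected by the master URL passed to `spark-submit --master` (or to `SparkSession.builder.master(...)`). The hypothetical helper below only classifies the common master-URL forms, such as `local[*]`, `yarn`, `spark://host:7077`, `k8s://https://host:port`, and `mesos://host:5050`, to make the options concrete. It does not connect to Spark.

```python
def cluster_manager(master: str) -> str:
    """Map a Spark master URL to the resource manager it selects (illustrative only)."""
    if master == "local" or master.startswith("local["):
        return "local"        # everything in one process, no cluster manager
    if master == "yarn":
        return "yarn"         # cluster location comes from the Hadoop configuration
    if master.startswith("spark://"):
        return "standalone"   # Spark's built-in resource manager and scheduler
    if master.startswith("k8s://"):
        return "kubernetes"
    if master.startswith("mesos://"):
        return "mesos"
    raise ValueError(f"unrecognized master URL: {master!r}")


if __name__ == "__main__":
    for url in ["local[*]", "yarn", "spark://host:7077", "k8s://https://host:443"]:
        print(url, "->", cluster_manager(url))
```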