Shulou (Shulou.com) 06/03 report. Updated 2025-01-17, from SLTechnology News&Howtos.
The official Spark website describes Spark in one concise sentence: "Apache Spark is a fast and general engine for large-scale data processing."
From this we can extract the following points:
Spark is an engine.
It is fast.
It is general-purpose.
It is used to process data.
The data it processes is large-scale.
Spark itself does not provide data storage. It is purely a computing framework.
How fast is it?
When the data is in memory, Spark runs programs up to 100 times faster than Hadoop MapReduce. When the data is on disk, it is still up to 10 times faster.
Why is it fast? Spark has an advanced execution engine that models each job as a DAG (directed acyclic graph) of operations, and it computes in memory.
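The following minimal Python sketch makes the DAG idea concrete. It is not Spark's actual scheduler. A job is described as a hypothetical dictionary that maps each step to the steps it depends on. The steps then run in topological order (Kahn's algorithm), so each step runs only after all of its inputs are ready. The step names are invented for illustration.

```python
from collections import deque

def topological_order(deps):
    """Return step names so that every step comes after all of its inputs.

    deps maps each step to the list of steps it depends on (must be a DAG).
    Raises ValueError if the graph contains a cycle.
    """
    # Number of inputs each step is still waiting for.
    indegree = {step: len(inputs) for step, inputs in deps.items()}
    # Reverse edges: which steps consume the output of each step.
    consumers = {step: [] for step in deps}
    for step, inputs in deps.items():
        for src in inputs:
            consumers[src].append(step)
    ready = deque(step for step, n in indegree.items() if n == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in consumers[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

# Hypothetical job: two inputs are read, one is cleaned, they are joined, then aggregated.
job = {
    "read_logs": [],
    "read_users": [],
    "clean_logs": ["read_logs"],
    "join": ["clean_logs", "read_users"],
    "aggregate": ["join"],
}
print(topological_order(job))
# ['read_logs', 'read_users', 'clean_logs', 'join', 'aggregate']
```

Because the whole graph is known before anything runs, an engine can plan the job as one unit rather than as a chain of independent jobs.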
Ease of use:
Applications can be written quickly in Scala, Java, Python, and other languages. Spark offers more than 80 high-level operators that make it easy to build parallel applications, and a word count takes only a few lines of code.
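In Spark, word count is usually written as a chain of flatMap, map, and reduceByKey. The sketch below reproduces that chain in plain Python on an in-memory list of lines, so it runs without a Spark installation. The helper name `word_count` is hypothetical. The comment shows the equivalent PySpark chain, which assumes an existing SparkContext `sc` and a placeholder input path.

```python
from collections import defaultdict
from itertools import chain

def word_count(lines):
    # flatMap: split every line into words and flatten into one stream.
    words = chain.from_iterable(line.split() for line in lines)
    # map: turn each word into a (word, 1) pair.
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the 1s for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Equivalent PySpark chain (assumes a SparkContext `sc`; "input.txt" is a placeholder):
#   (sc.textFile("input.txt")
#      .flatMap(lambda line: line.split())
#      .map(lambda word: (word, 1))
#      .reduceByKey(lambda a, b: a + b))

print(word_count(["spark is fast", "spark is general"]))
# {'spark': 2, 'is': 2, 'fast': 1, 'general': 1}
```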
Versatility:
Spark provides a one-stack solution for big data that covers stream processing, graph computation, machine learning, SQL, and more.
This greatly reduces development, maintenance, and learning costs.
Run anywhere:
Spark can run on Hadoop YARN, on Mesos, in standalone mode, or in the cloud.
The data Spark processes can be stored in HDFS, Cassandra, HBase, S3, and other systems.
Spark has developed very quickly. Its timeline is as follows:
Since Spark became an Apache project, its development has accelerated further, and new versions are released frequently.
Spark ecosystem (BDAS, the Berkeley Data Analytics Stack)
MapReduce is part of the Hadoop ecosystem, while Spark is part of the BDAS ecosystem.
Hadoop includes MapReduce, HDFS, HBase, Hive, ZooKeeper, Pig, Sqoop, and more.
BDAS includes Spark, Shark (the counterpart of Hive), BlinkDB, Spark Streaming (a real-time stream processing framework similar to Storm), and more.
(Figure: BDAS ecosystem map)
Comparison between MapReduce and Spark
Similarities and differences:
1. Basic principle
MapReduce performs disk-based batch processing of big data.
Spark processes data as RDDs (Resilient Distributed Datasets), which can be kept in memory or on disk.
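A toy class can illustrate how an RDD works: it is a recipe for producing a dataset, and it can optionally be kept in memory. `ToyRDD` below is hypothetical and written only for illustration. It is not Spark's RDD API. Transformations only record what to do, nothing runs until `collect()` is called, and `cache()` keeps the computed result in memory so that later actions reuse it instead of recomputing.

```python
class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (illustration only)."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the data when called
        self._cached = None       # in-memory copy, filled on first collect() after cache()
        self._use_cache = False
        self.computations = 0     # how many times this dataset was actually computed

    @staticmethod
    def from_list(items):
        return ToyRDD(lambda: list(items))

    def map(self, f):
        # Lazy: only records a new compute step that applies f to the parent's data.
        return ToyRDD(lambda: [f(x) for x in self.collect()])

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        # Marks the dataset for in-memory storage; nothing is computed yet.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return list(self._cached)   # served from memory
        self.computations += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return list(data)

squares = ToyRDD.from_list(range(5)).map(lambda x: x * x).cache()
print(squares.collect())     # [0, 1, 4, 9, 16]
print(squares.collect())     # same result, reused from memory
print(squares.computations)  # 1
```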
2. Model
MapReduce suits batch processing of very large datasets, especially long-running jobs with few iterations.
Spark suits data mining and other workloads with many iterations, such as machine learning algorithms.
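A simple counter shows why many iterations favor Spark. In this hypothetical sketch, the function returned by `make_loader` stands in for reading the input from disk. The MapReduce-style run reloads the data on every iteration, while the Spark-style run loads it once and keeps it in memory. The algorithm itself is an arbitrary stand-in for an iterative workload: gradient descent that moves an estimate toward the mean of the data.

```python
def make_loader(data):
    """Return a hypothetical 'read from disk' function and a counter of its calls."""
    calls = {"n": 0}

    def load_dataset():
        calls["n"] += 1
        return list(data)

    return load_dataset, calls

def estimate_mean(load, iterations, keep_in_memory):
    # Gradient descent on 0.5 * mean((x - w)^2); its gradient is mean(w - x),
    # so w converges to the mean of the data.
    cached = load() if keep_in_memory else None
    w = 0.0
    for _ in range(iterations):
        data = cached if keep_in_memory else load()  # reload every pass unless cached
        grad = sum(w - x for x in data) / len(data)
        w -= 0.5 * grad
    return w

data = [2.0, 4.0, 6.0]
load_mr, mr_calls = make_loader(data)   # MapReduce-style: re-read each iteration
load_sp, sp_calls = make_loader(data)   # Spark-style: read once, keep in memory
w1 = estimate_mean(load_mr, 20, keep_in_memory=False)
w2 = estimate_mean(load_sp, 20, keep_in_memory=True)
print(round(w1, 3), mr_calls["n"])  # 4.0 20
print(round(w2, 3), sp_calls["n"])  # 4.0 1
```

Both runs produce the same estimate. The difference lies only in how many times the input has to be read, and that gap grows with the number of iterations.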
3. Fault tolerance
In MapReduce, the result of every iteration must be written to disk, and the next iteration must read it back from disk before it can compute. If one step fails, the whole job fails.
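Spark uses a DAG to split a job into many stages and keeps intermediate data in memory between iterations. Spark is also fault tolerant: each RDD records the lineage of transformations that produced it, so a lost partition can be recomputed from its source data instead of rerunning the whole job.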
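Spark's fault tolerance rests on lineage: a derived dataset remembers how it was built, so lost pieces can be rebuilt. The following model is hypothetical and is not Spark code. Each partition stores its source data and the ordered list of per-element functions applied to it. When its in-memory result is lost, only that partition replays its lineage; the others are untouched.

```python
class LineagePartition:
    """Hypothetical partition that can rebuild itself from its lineage."""

    def __init__(self, source, steps):
        self.source = source  # original input data for this partition
        self.steps = steps    # ordered per-element functions applied to the source
        self.data = None      # materialized in-memory result, if present
        self.recomputed = 0   # how many times the lineage had to be replayed

    def materialize(self):
        result = list(self.source)
        for step in self.steps:
            result = [step(x) for x in result]
        self.data = result
        return result

    def get(self):
        if self.data is None:  # lost (or never computed): replay the lineage
            self.recomputed += 1
            return self.materialize()
        return self.data

steps = [lambda x: x + 1, lambda x: x * 10]
partitions = [LineagePartition(src, steps) for src in ([1, 2], [3, 4], [5, 6])]
for p in partitions:
    p.materialize()

partitions[1].data = None  # simulate losing one partition, e.g. after a worker failure
print([p.get() for p in partitions])    # [[20, 30], [40, 50], [60, 70]]
print([p.recomputed for p in partitions])  # [0, 1, 0]
```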