2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains why Spark is so popular in the data science community. It goes into some detail, and I hope interested readers find it useful.
It is 2019, and I don't believe anyone who claims ten years of big data work experience. How long has Spark actually been in real use? As the rest of this article shows, Apache Spark only took off in 2012, so even a Spark committer has only about 7 years of experience.
Someone from the Hadoop generation that started in 2006 may well have 10 years of big data experience, but I would call that person only a half-baked big data engineer, because the era of real-time big data platforms should be counted from the launch of Apache Spark in 2012.
Spark is a top-level Apache project, and its every move is watched by the whole community. Projects backed by Apache tend to be more successful. It is a pity that Google did not bring Bigtable, MapReduce and GFS to Apache in time, and so lost first place to the latecomer Hadoop. For more on how Google missed this opportunity, see my article, "after Ant Financial Services Group OceanBase, Tencent has also made a big kill."
Spark was originally incubated at AMPLab, the big data laboratory at the University of California, Berkeley. The same lab produced two other heavyweight projects, Apache Mesos and Alluxio. Readers may not know much about these two. That is fine, and I will not cover them here; I will discuss them in more detail later.
Hadoop, which appeared in 2006, was built on Google's "troika" (GFS, MapReduce and Bigtable) and was well known before GCP. Besides distributed storage, which extended capacity beyond what commercial relational databases could offer, MapReduce was a major innovation that moved distributed computing forward. But the design of MapReduce carries a fatal flaw: the intermediate data set has to be saved to storage between stages, which gives away the performance advantage. Spark seized on this gap with its in-memory Resilient Distributed Dataset (RDD). So Spark was born in 2009, made up for Hadoop's performance shortcomings, and took a share of the market.
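The contrast between saving every intermediate data set and keeping a lineage of in-memory transformations can be sketched in plain Python. This is only a toy, single-machine illustration and not Spark's or Hadoop's actual API: `run_stages_via_disk`, `ToyRDD` and `drop_cache` are hypothetical names invented for this sketch, and real systems are more nuanced (Spark, for example, can still spill data to disk).

```python
import json
import os
import tempfile


def run_stages_via_disk(data, stages, workdir):
    """MapReduce-style chaining (toy version): every stage writes its full
    output to a file, and the next stage reads that file back in."""
    files_written = 0
    current = list(data)
    for i, stage in enumerate(stages):
        result = [stage(x) for x in current]
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(result, f)      # the intermediate data set is saved
        files_written += 1
        with open(path) as f:
            current = json.load(f)    # the next stage re-reads it
    return current, files_written


class ToyRDD:
    """Hypothetical stand-in for an RDD: it keeps the source data and the
    lineage of transformations instead of materialised intermediates."""

    def __init__(self, source, lineage=()):
        self._source = list(source)
        self._lineage = tuple(lineage)
        self._cache = None

    def map(self, fn):
        # Lazy: only extend the lineage, nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (fn,))

    def cache(self):
        self._cache = self.collect()  # keep the final result in memory
        return self

    def drop_cache(self):
        self._cache = None            # simulate losing the in-memory copy

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        out = list(self._source)
        for fn in self._lineage:      # stages are chained in memory
            out = [fn(x) for x in out]
        return out


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2]
    with tempfile.TemporaryDirectory() as workdir:
        disk_result, n_files = run_stages_via_disk([1, 2, 3], stages, workdir)
    rdd = ToyRDD([1, 2, 3]).map(stages[0]).map(stages[1]).cache()
    rdd.drop_cache()                  # the result can be rebuilt from lineage
    print(disk_result, n_files, rdd.collect())
```

Both paths produce the same result, but the first one touches storage once per stage, while the second only records what to compute and can recompute a lost result from its lineage, which is the "resilient" part of the name.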
Expectations for Hadoop were very high, extending to machine learning and artificial intelligence. Researchers tried to build machine learning libraries on Hadoop, but because intermediate data had to be stored, many real-time computing projects were abandoned. Meanwhile, the same researchers had made great progress on another project, Mesos (distributed cluster management), so they simply built Spark (distributed computing) on Mesos to replace Hadoop.
So Hadoop lost ground to Spark mainly because of the emerging demands of the market (machine learning and artificial intelligence). Spark was born to solve the difficulties of machine learning.
Of course, it is imprecise to say that Spark defeated Hadoop, just as it is to say that Apple's iOS beat Google's Android. The two are complementary and meet different market needs; Spark and Hadoop complement each other across application scenarios. After all, the hardware requirements for running Spark are much higher than for Hadoop, so the costs differ. Vendors will not tell you this directly.
Hadoop was born 3 years before Spark, so how did Spark win its own market from Hadoop so quickly? It could have built its own distributed management from scratch, or it could take advantage of Hadoop's existing market by staying compatible with Hadoop and offering only its own distributed computing engine. Smart people obviously choose the latter; there is no need to reinvent the wheel. As a result, the community accepted Spark quickly and easily, and community promotion has contributed greatly to the delivery of Spark applications.
That covers the basic reasons for Spark's popularity, so let's turn to some more advanced applications. Throughout the history of software, the winner has not been the tool that solves one problem but the one that provides a complete application chain. Openness therefore determines how far a software system can go.
Programming languages are an example. Visual FoxPro, Visual Basic and Delphi were once the most effective tools for building MIS systems, but as web and mobile requirements emerged, these tools could no longer keep up with demand and were gradually abandoned by the market.
Look at today's mainstream programming languages, Java and Python: both are all-purpose. You can use them for traditional C/S (client/server) development, for B/S (browser/server) development, and even for mobile applications. Spark is the same. Beyond data CRUD (Create, Retrieve, Update, Delete), it also fits the current trends in data science, such as batch and real-time ETL, and the integration of a variety of data analysis and data mining algorithms to carry out machine learning efficiently.
Built around in-memory distributed computing, Spark also includes Spark Streaming, Spark MLlib (machine learning), Spark SQL and Spark GraphX. Together these components cover most of what today's Internet ecosystem needs. Spark offers a complete solution for the entire data application chain, so it is no surprise that it is popular.
That is all on why Spark is so popular in the data science community. I hope it has been of some help. If you found the article useful, feel free to share it so more people can see it.