
What are the applications of Spark

2026-04-24 Update. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31 report

This article looks at where Spark is being applied and why, in the author's view, it is displacing Hadoop's MapReduce.

First, MapReduce is dying, Spark is becoming dominant, and Hadoop lingers on.

Because of the high latency of Hadoop's MapReduce, Hadoop cannot handle many time-sensitive scenarios, and criticism of it keeps growing. Hadoop has been unable to change this situation and is in decline. As in any field, death is a process, and Hadoop is going through that process. Its decline began in 2012, and its weakness in iterative computation and algorithms is a critical flaw.

Take five minutes to see what has been going on in the industry:

1. The four major commercial organizations that originally supported Hadoop have all announced their support for Spark.

2. Mahout announced some time ago that it will no longer accept any algorithm implemented in MapReduce, and that its new algorithms will be based on Spark.

3. Oryx, Cloudera's machine learning framework for Hadoop, will replace its execution engine, Hadoop's MapReduce, with Spark.

4. Google has begun to shift workloads from MapReduce to Pregel and Dremel. In fact, Google had reportedly already moved on from MapReduce by the time it made the paper public.

5. Facebook is shifting workloads to Presto.

Many companies that used to rely heavily on Hadoop are now turning to Spark. In China, Taobao is a typical case.

Here we take Yahoo!, the company most typically associated with Hadoop worldwide, as an example. Consider the architecture of its data processing:

It comes down to two separate paths, one real-time and one non-real-time. After the introduction of Spark

Over time, given Spark's strengths in stream processing, graph processing, machine learning and NoSQL query, Spark may completely replace Hadoop's computing layer. This reflects the direction taken by companies working in cloud computing and big data.

Hadoop is increasingly being reduced to an abstract container on top of a baseline file system.

Some readers may ask: why doesn't Hadoop improve itself?

In fact, the Hadoop community has been improving Hadoop all along. But that is how the world works: established things are hard to change, and reform is never as forceful as revolution.

1. Hadoop's improvements, including Hadoop 2.0, have basically stayed at the code level, that is, patching. As a result, Hadoop now carries deep "technical debt" and a heavy burden. Hadoop's own expansion follows a familiar pattern: once you can no longer control something, you widen your own scope in the hope of containing it completely.

2. Hadoop's computing model dictates that all work on Hadoop must be translated into core phases such as Map, Shuffle and Reduce. Because every computation reads its input from disk and writes its output back to disk, and data must also be moved over the network between phases, the latency becomes more and more unbearable. In addition, a task cannot run until the previous task has finished, which directly prevents Hadoop from supporting interactive applications.
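To make the Map, Shuffle and Reduce phases concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`, `word_count_on_disk`) and the word-count task are illustrative assumptions, and temporary files stand in for the distributed file system. It only shows the shape of the model: a job reads its input from disk, runs the three phases, and writes its result back to disk before anything else can use it.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(input_path, output_path):
    """One job: read input from disk, run the three phases, write output to disk."""
    lines = Path(input_path).read_text().splitlines()
    counts = reduce_phase(shuffle_phase(map_phase(lines)))
    Path(output_path).write_text(json.dumps(counts))
    return counts


def word_count_on_disk(text):
    """Run a single word-count job against temporary files (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        dst = Path(tmp) / "output.json"
        src.write_text(text)
        run_job(src, dst)
        # A follow-up job would have to begin by reading this file back from disk.
        return json.loads(dst.read_text())
```

For example, `word_count_on_disk("spark hadoop\nSpark")` returns `{'spark': 2, 'hadoop': 1}`. An iterative algorithm built this way would repeat the full read, shuffle and write cycle on every pass, which is the source of the latency described above.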

So why not simply rewrite a better Hadoop from scratch? The answer is that the arrival of Spark has made that unnecessary.

Spark is the next-generation core technology for cloud computing and big data, positioned to succeed Hadoop. Spark has already built its own complete big data processing ecosystem, with its own technologies for stream processing, graph processing, machine learning, NoSQL query and other areas, and it is a top-level Apache project. Explosive growth in its community and commercial adoption can be expected from the second half of 2014 through 2015.

Several large Internet companies outside China have already deployed Spark.

Even Yahoo, an early major contributor to Hadoop, now deploys Spark in multiple projects.

In China, Taobao, Youku Tudou, NetEase, Baidu, Tencent and others have used Spark in their commercial production systems, and adoption both in China and abroad keeps widening.
