What are the new features of Spark 1.4

Updated 2025-02-24. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article gives a concise, easy-to-follow overview of the new features in Spark 1.4.

Overview of new features in Spark 1.4

After four release candidates, Spark 1.4 was finally released ahead of Spark Summit. This article briefly covers the most important new features and improvements in this version.

SparkR is not covered in detail here. Data scientists have been looking forward to it for a long time, and it has finally arrived after much anticipation. It clearly deserves a separate article :)

Spark Core:

What do people care about most right now? Performance and operations. What affects performance the most? Shuffle. What is the first priority in operations? Monitoring (leave alerting aside for now). Spark 1.4 did a lot of work on both.

In 1.4, Spark provides a REST API through which applications can retrieve all kinds of information (jobs / stages / tasks / storage info). With this API, building your own monitoring takes only minutes. In addition, the DAG can now be visualized, so anyone unfamiliar with how Spark's DAGScheduler works can easily inspect the details of a DAG.

Now for shuffle. As is well known, sort-based shuffle has been the default shuffle strategy since 1.2. Sort-based shuffle does not need to keep many files open at the same time and also reduces the number of intermediate files, but it leaves a large number of Java objects on the JVM heap. Starting with 1.4, the output of the map phase of a shuffle is serialized, which brings two benefits: (1) the files spilled to disk become smaller, and (2) GC efficiency improves considerably. Some will object that serialization and deserialization add CPU overhead, but shuffle is usually I/O-intensive, so this CPU cost is acceptable.
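As a rough illustration of the REST API mentioned in this section, the sketch below builds monitoring URLs and summarizes a job list using only the Python standard library. It assumes the driver UI is reachable at localhost:4040 and that the endpoints live under /api/v1/applications; the helper names (`endpoint`, `fetch`, `summarize_jobs`) are made up for this example, and the JSON field names should be checked against the documentation of your Spark version.

```python
import json
from urllib.request import urlopen

# Assumption: the driver UI runs on localhost:4040 (Spark's default UI port).
DEFAULT_BASE = "http://localhost:4040/api/v1"


def endpoint(base, app_id=None, resource=None):
    """Build a monitoring URL such as .../applications/<app-id>/jobs."""
    parts = [base.rstrip("/"), "applications"]
    if app_id is not None:
        parts.append(app_id)
        if resource is not None:
            parts.append(resource.strip("/"))
    return "/".join(parts)


def fetch(url, timeout=5):
    """GET a URL and decode its JSON body (needs a running Spark application)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_jobs(jobs):
    """Count jobs per status in the list returned by the jobs endpoint."""
    counts = {}
    for job in jobs:
        # Assumption: each job entry carries a "status" field.
        status = job.get("status", "UNKNOWN")
        counts[status] = counts.get(status, 0) + 1
    return counts
```

Against a live application, something like `summarize_jobs(fetch(endpoint(DEFAULT_BASE, app_id, "jobs")))` would give a per-status job count that a home-grown dashboard could poll; the same pattern should apply to the stages and storage resources.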

The much-anticipated Project Tungsten also makes its debut in 1.4. It introduces a new shuffle manager, "UnsafeShuffleManager", which provides a cache-friendly sorting algorithm along with other improvements aimed at reducing memory usage during shuffle and speeding up sorting. Project Tungsten is bound to be the focus of the next two releases (1.5 and 1.6).
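For readers who want to try the new shuffle manager, here is a minimal sketch that assembles a spark-submit command line selecting it. It assumes the manager is chosen through the `spark.shuffle.manager` setting with the short name `tungsten-sort`, which is my understanding for 1.4 and should be verified against the configuration docs of your release; the jar name and main class are hypothetical.

```python
import shlex


def spark_submit_command(app_jar, main_class, shuffle_manager="tungsten-sort"):
    """Return a spark-submit argv that selects the given shuffle manager."""
    return [
        "spark-submit",
        "--class", main_class,
        # Assumption: "tungsten-sort" is the short name of UnsafeShuffleManager.
        "--conf", "spark.shuffle.manager=" + shuffle_manager,
        app_jar,
    ]


# Hypothetical application jar and main class, for illustration only.
cmd = spark_submit_command("my-app.jar", "com.example.MyJob")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing `shuffle_manager="sort"` instead would keep the default sort-based shuffle, which makes it easy to compare the two on the same job.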

Spark Streaming:

Streaming gains a new UI in this release, a boon for Streaming users, with all kinds of details on display. At the Spark China Summit, TD, who was sitting next to me reviewing this part of the code, whispered "this is awesome" to me. By the way, this part was mainly done by Zhu Shixiong. Although Shixiong stood me up at the summit, we must thank him for bringing us such a great feature! In addition, this release also supports Kafka 0.8.2.x.

Spark SQL (DataFrame)

Spark SQL now supports ORCFile, an established format that, although younger than Parquet, has fewer bugs :) 1.4 also provides window functions similar to those in Hive, which are quite practical. Join optimization is substantial this time, especially for larger joins; try it and see. JDBC Server users should be very happy, because there is finally a UI to look at.
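To make the window function idea concrete, the following plain-Python sketch emulates what `RANK() OVER (PARTITION BY ... ORDER BY ...)` computes. It is not Spark code and does not use the Spark API; the function name `rank_over` and the sample rows are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_key, order_key, descending=True):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    out = []
    by_partition = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(by_partition, key=itemgetter(partition_key)):
        ordered = sorted(group, key=itemgetter(order_key), reverse=descending)
        rank, prev = 0, object()  # sentinel that equals no real value
        for position, row in enumerate(ordered, start=1):
            if row[order_key] != prev:
                # New value: rank jumps to the row position, leaving gaps after ties.
                rank, prev = position, row[order_key]
            out.append(dict(row, rank=rank))
    return out


# Made-up sample data.
rows = [
    {"dept": "a", "name": "x", "salary": 100},
    {"dept": "a", "name": "y", "salary": 100},
    {"dept": "a", "name": "z", "salary": 90},
    {"dept": "b", "name": "w", "salary": 50},
]
ranked = rank_over(rows, "dept", "salary")
```

Unlike a GROUP BY aggregation, every input row survives and simply gains a `rank` column computed within its partition, which is what makes window functions convenient for top-N-per-group queries.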

Spark ML/MLlib

ML pipelines have graduated from alpha, and enthusiasm for ML pipelines is really high. I am quite interested in Personalized PageRank in GraphX and, related to it, recommendAll in the matrix factorization model. In practice, most companies still implement their own algorithms on top of Spark.
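For readers unfamiliar with Personalized PageRank, here is a minimal single-machine sketch of the idea: ordinary PageRank power iteration, except that every random restart returns to one chosen source node. This is not GraphX's implementation; the function name, the handling of dangling nodes, and the default parameters are assumptions made for this example.

```python
def personalized_pagerank(edges, source, reset_prob=0.15, iterations=50):
    """Power iteration in which every random restart returns to `source`."""
    nodes = sorted({n for edge in edges for n in edge} | {source})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    # Start with all probability mass on the source node.
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - reset_prob) * rank[n]
            if out_links[n]:
                share = walk_mass / len(out_links[n])
                for dst in out_links[n]:
                    nxt[dst] += share
            else:
                # Assumption: a dangling node sends its mass back to the source.
                nxt[source] += walk_mass
        # Total mass is 1, so the restart contributes exactly reset_prob.
        nxt[source] += reset_prob
        rank = nxt
    return rank


# Made-up toy graph; "d" links in but is never reached from "a".
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"), ("d", "a")]
scores = personalized_pagerank(edges, "a")
```

The resulting scores measure closeness to the source rather than global importance, so nodes the source cannot reach end up with a score of zero. That per-user flavour is the connection to recommendation-style features such as recommendAll.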
