
How to deal with the HA of Spark

2025-03-29 Update From: SLTechnology News&Howtos


This article covers the basics of "how to deal with the HA of Spark". Many people run into these questions in real-world work, so let the editor walk you through how to handle each situation. I hope you read it carefully and get something out of it!

Q1: Are Master and Driver the same thing?

No, the two are not the same thing. In Standalone mode, the Master is responsible for cluster resource management and scheduling, while the Driver is responsible for directing the Executors on the Workers to process tasks in a multi-threaded manner.

The Master runs on a management node of the cluster, usually the same node as the NameNode.

The Driver generally runs on the client. The client usually does not belong to the cluster, but it sits in the same network environment as the cluster, because the Driver on the client interacts frequently with the Executors in the cluster.
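
To make the division of labor concrete, here is a minimal sketch of a driver program in Scala (the application name, the Standalone Master URL spark://master-host:7077 and the toy job are all placeholders, not values from this article): the process that runs main() is the Driver, and it ships tasks to the Executors that the Master has arranged on the Workers.

import org.apache.spark.{SparkConf, SparkContext}

object DriverSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Standalone Master URL; the Master only grants Executors,
    // it never runs the tasks itself.
    val conf = new SparkConf()
      .setAppName("DriverSketch")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)

    // The Driver splits this job into tasks, sends them to the Executors
    // on the Workers, and collects the result back on the client.
    val sum = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()
  }
}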

Q2: How to choose between Standalone and Yarn?

Both Standalone and Yarn are resource management systems. Standalone is a lightweight resource management and allocation scheme built specifically for Spark, while Yarn is a general-purpose resource management framework for big data: it can manage resource allocation not only for Spark but also for other computing platforms that run on Yarn.

If the production system runs multiple computing frameworks, such as Spark, MapReduce and Mahout, it is recommended to use Yarn or Mesos for unified resource management and scheduling. If only Spark is used, Standalone is recommended, since Yarn consumes more resources.
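
As a rough illustration of how little the application itself changes between the two, the same jar can be handed to either resource manager just by switching the --master option of spark-submit (the class name, jar name and Master host below are placeholders):

# Standalone: the Spark Master allocates Executors directly
./bin/spark-submit --class com.example.MyApp \
  --master spark://master-host:7077 \
  my-app.jar

# Yarn: the ResourceManager allocates containers for the Driver and Executors
./bin/spark-submit --class com.example.MyApp \
  --master yarn --deploy-mode cluster \
  my-app.jar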

Q3: What about Spark's HA?

As for HA: in Standalone mode, failover of the Worker nodes is handled automatically, while HA of the Master generally relies on ZooKeeper.

Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected "leader" and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master's state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications; applications that were already running during Master failover are unaffected.
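
As a sketch of how this is usually switched on (the ZooKeeper hosts and the /spark directory below are placeholders, not values from this article), ZooKeeper-based recovery is configured in conf/spark-env.sh on every candidate Master node:

# conf/spark-env.sh on each Master candidate
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

Workers and applications are then pointed at the full list of Masters, for example spark://host1:7077,host2:7077, so that they can re-register with whichever Master becomes the new leader.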

In Yarn mode, the ResourceManager likewise generally relies on ZooKeeper for HA; in Mesos mode, the Mesos master is also made highly available through ZooKeeper-based election.
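
For reference, a minimal sketch of ResourceManager HA in yarn-site.xml might look like the following (the cluster id, rm ids, hostnames and ZooKeeper quorum are placeholders, and the exact property names should be checked against your Hadoop version):

<!-- yarn-site.xml: two ResourceManagers backed by a ZooKeeper quorum -->
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>rm1-host</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>rm2-host</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>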

That is the end of "how to deal with the HA of Spark". Thank you for reading. If you want to learn more about the industry, you can follow the site; the editor will keep producing more high-quality practical articles for you!
