

What are the modes of Spark 3.0 on Kubernetes?


This article mainly introduces the modes of running Spark 3.0 on Kubernetes. Many people have questions about this topic in day-to-day work, so the editor has consulted various materials and organized them into a simple, easy-to-follow walkthrough. I hope it helps resolve your doubts about the modes of Spark 3.0 on Kubernetes. Please follow along!

With the release of Spark 3.0, native support for Kubernetes has been greatly enhanced, which makes it much easier to deploy Spark quickly in cloud-native environments and to manage its running instances.

1. Standalone mode

The first feasible way to run Spark on a Kubernetes cluster was Standalone mode, but the community soon introduced a mode based on the native Kubernetes scheduler, that is, Native mode.

2. Kubernetes Native mode

In short, in Native mode the Driver and Executors run as Pods. Instead of submitting Spark jobs to YARN as before, users submit them to the Kubernetes apiserver, with the following submit command:

$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar

Here master is the address of the Kubernetes apiserver. After submission, the whole job runs as follows: the Driver is first started as a Pod, and the Driver then starts the Executor Pods. This flow should already be familiar to many people, so I will not repeat it; for more details, see https://spark.apache.org/docs/latest/running-on-kubernetes.html.
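
As a quick sanity check, the Pods created for the job can be inspected directly with kubectl. This is a minimal sketch: the spark-role labels are the ones Spark on Kubernetes attaches to its Pods, while the exact driver Pod name depends on your application name and is shown as a placeholder.

# List the driver and executor Pods created for the job
$ kubectl get pods -l spark-role=driver
$ kubectl get pods -l spark-role=executor

# Follow the driver log; replace <driver-pod-name> with the name from the listing above
$ kubectl logs -f <driver-pod-name>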

3. Spark Operator

In addition to submitting jobs directly to the Kubernetes scheduler, you can also submit them through the Spark Operator. Operator is a milestone concept in Kubernetes. In the early days of Kubernetes, how to deploy stateful applications was a topic the community preferred not to talk about, until StatefulSet appeared. StatefulSet provides an abstraction for deploying stateful applications, essentially guaranteeing stable network identity and storage. However, stateful applications vary widely, and not all of them can be modeled as a StatefulSet; forcing them to fit only increases developers' mental burden.

Then Operator showed up. We know that Kubernetes offers developers a very open ecosystem: you can define custom CRDs, Controllers, and even Schedulers. An Operator is the combination of a CRD and a Controller. Developers can define their own CRDs. For example, I can define a CRD called EtcdCluster as follows:

ApiVersion: "etcd.database.coreos.com/v1beta2" kind: "EtcdCluster" metadata: name: "example-etcd-cluster" spec: size: 3 version: "3.1.10" repository: "quay.io/coreos/etcd"

Once this yaml is submitted to Kubernetes, the etcd Operator processes its fields and eventually deploys an etcd cluster with 3 nodes. You can browse the distributed applications that currently ship an Operator in this GitHub repo: https://github.com/operator-framework/awesome-operators.
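
A minimal usage sketch, assuming the manifest above is saved as example-etcd-cluster.yaml and that the etcd Operator labels its member Pods with etcd_cluster=<cluster-name> (the exact labels may differ between Operator versions):

# Create the custom resource; the etcd Operator reconciles it into a 3-member cluster
$ kubectl apply -f example-etcd-cluster.yaml

# Assumed label selector; check the labels used by your Operator version
$ kubectl get pods -l etcd_cluster=example-etcd-cluster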

Google Cloud Platform (GCP) has open-sourced a Spark Operator; the GitHub repo is GoogleCloudPlatform/spark-on-k8s-operator. Deploying the Operator is also very convenient, using a Helm Chart as follows; you can simply think of it as deploying a Kubernetes API object (a Deployment).

$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
$ helm install incubator/sparkoperator --namespace spark-operator
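
To verify the installation, a quick check like the following can be used; this is a sketch assuming the chart was installed into the spark-operator namespace as above:

# The Operator registers its CRDs under the sparkoperator.k8s.io group
$ kubectl get crd | grep sparkoperator.k8s.io

# The Operator's controller runs as an ordinary Deployment/Pod in its namespace
$ kubectl get pods -n spark-operator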

The CRDs involved in this Operator are structured as follows:

ScheduledSparkApplication
|__ ScheduledSparkApplicationSpec
    |__ SparkApplication
|__ ScheduledSparkApplicationStatus

SparkApplication
|__ SparkApplicationSpec
    |__ DriverSpec
        |__ SparkPodSpec
    |__ ExecutorSpec
        |__ SparkPodSpec
    |__ Dependencies
    |__ MonitoringSpec
        |__ PrometheusSpec
|__ SparkApplicationStatus
    |__ DriverInfo

If I want to submit a job, I can define a SparkApplication yaml like the one below. For the meaning of the fields, please refer to the CRD documentation above.

apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  ...
spec:
  deps: {}
  driver:
    coreLimit: 200m
    cores: 0.1
    labels:
      version: 2.3.0
    memory: 512m
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    labels:
      version: 2.3.0
    memory: 512m
  image: gcr.io/ynli-k8s/spark:v2.4.0
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
  mainClass: org.apache.spark.examples.SparkPi
  mode: cluster
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  type: Scala
status:
  sparkApplicationId: spark-5f4ba921c85ff3f1cb04bef324f9154c9
  applicationState:
    state: COMPLETED
  completionTime: 2018-02-20T23:33:55Z
  driverInfo:
    podName: spark-pi-83ba921c85ff3f1cb04bef324f9154c9-driver
    webUIAddress: 35.192.234.248
    webUIPort: 31064
    webUIServiceName: spark-pi-2402118027-ui-svc
    webUIIngressName: spark-pi-ui-ingress
    webUIIngressAddress: spark-pi.ingress.cluster.com
  executorState:
    spark-pi-83ba921c85ff3f1cb04bef324f9154c9-exec-1: COMPLETED
  lastSubmissionAttemptTime: 2018-02-20T23:32:27Z

Submit the job:

$ kubectl apply -f spark-pi.yaml
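
After the SparkApplication is created, its lifecycle can be followed through the custom resource itself. A minimal sketch, assuming the job was named spark-pi as in the yaml above:

# List SparkApplication objects and their current state
$ kubectl get sparkapplications

# Inspect the full status block (driver info, executor state, retries)
$ kubectl describe sparkapplication spark-pi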

By comparison, the Operator's way of submitting jobs looks more verbose and complex, but it is also more in line with the Kubernetes API style, namely the declarative API.

4. Challenges

Basically, most companies on the market use one of the two approaches above for Spark on Kubernetes. But we also know that the native Kubernetes support in Spark Core is not yet particularly mature, and there are many areas that can be improved. Here are a few simple examples:

1. Scheduler differences

Resource schedulers can be roughly divided into centralized resource schedulers and two-level resource schedulers. In a two-level design, a central scheduler handles macro-level resource scheduling, while the scheduling of an individual application is done by a lower-level, per-partition scheduler. Two-level schedulers usually handle the management and scheduling of large-scale workloads well, for example in terms of performance, but their drawback is also obvious: the implementation is complex. In fact, this design idea appears in many places, such as the tcmalloc allocation algorithm in memory management and the memory management implementation of the Go runtime. The big data resource schedulers Mesos and YARN can, to some extent, be classified as two-level resource schedulers.

A centralized resource scheduler responds to and decides on every resource request, which inevitably becomes a single-point bottleneck once the cluster grows large. However, the Kubernetes scheduler is different: it is an upgraded, shared-state centralized scheduler. Kubernetes caches the resource state of the whole cluster locally in the scheduler and, during scheduling, makes an "optimistic" allocation (assume + commit) based on that cached state, which is how the scheduler achieves high performance.

To some extent, the default Kubernetes scheduler cannot meet the job-scheduling requirements of Spark. A feasible technical solution is to provide a custom scheduler or to rewrite the default one directly. For example, Palantir, a big data company and one of the participants in the Spark on Kubernetes Native mode work, has open-sourced their custom scheduler; the GitHub repo is https://github.com/palantir/k8s-spark-scheduler.
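
For reference, a custom scheduler is usually wired in through the Pod spec's schedulerName field. Below is a minimal sketch using the pod template support added in Spark 3.0; the scheduler name my-spark-scheduler and the file paths are placeholders, not values from this article.

# driver-pod-template.yaml -- assign the driver Pod to a custom scheduler
apiVersion: v1
kind: Pod
spec:
  schedulerName: my-spark-scheduler   # placeholder: name of the custom scheduler

# Passed at submit time (the executor template works the same way):
#   --conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml
#   --conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml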

2. Shuffle processing

Because an Executor Pod on Kubernetes stores its Shuffle data on a PV, if the job fails you have to mount a new PV and recompute from scratch. To solve this problem, Facebook proposed a Remote Shuffle Service solution, which in short writes the Shuffle data to a remote service. Intuitively, how can writing remotely be faster than writing locally? One benefit of writing remotely is that there is no need to recompute on failover, which is valuable when the job's data volume is unusually large.

3. Cluster scale

It is basically certain that Kubernetes hits a bottleneck when a cluster reaches about 5,000 nodes, whereas in its early papers Spark claimed that Standalone mode could support clusters of 10,000 nodes. The Kubernetes bottleneck mainly lies in the master components, such as etcd (the metadata store based on the Raft consensus protocol) and the apiserver. At the 2019 Shanghai KubeCon, Alibaba gave a session on improving master performance, "Understand the scalability and performance of Kubernetes Master"; those who are interested can look it up.

4. Pod eviction problem

In Kubernetes, resources are divided into compressible resources (such as CPU) and incompressible resources (such as memory). When incompressible resources run short, some Pods are evicted from the current node. A large domestic company running Spark on Kubernetes hit a case where insufficient disk IO caused Spark jobs to fail, which in turn caused an entire test suite to fail. So how do we make sure that a Spark job's Pods (Driver/Executor) are not evicted? This involves priorities, which Kubernetes has supported since 1.10. But once priorities come into play, an unavoidable question is how to set the priority of our applications. Generally speaking, online or long-running applications take precedence over batch jobs, but that is clearly not a good rule for Spark jobs.
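
As an illustration of the mechanism (not a recommendation for specific values), a PriorityClass can be defined and then referenced from the Pod spec, for example via the pod templates mentioned earlier; the class name and value below are made-up placeholders.

# A PriorityClass for Spark job Pods; name and value are placeholders
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: spark-job-priority
value: 100000
globalDefault: false
description: "Priority for Spark driver/executor Pods"
---
# Referenced from the driver/executor pod template
apiVersion: v1
kind: Pod
spec:
  priorityClassName: spark-job-priority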

5. Job log

In Spark on YARN mode we can aggregate the logs and view them afterwards, but on Kubernetes we can only view them through the Pod logs. If you want to integrate with the Kubernetes ecosystem, consider using fluentd or filebeat to collect the logs of the Driver and Executor Pods into ELK for viewing.
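
A minimal sketch of the filebeat approach, assuming filebeat runs as a DaemonSet with Kubernetes autodiscover and that the Pods carry the spark-role label that Spark on Kubernetes sets on drivers and executors; the Elasticsearch address is a placeholder.

# filebeat.yml (fragment) -- collect logs from Spark driver/executor containers
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            has_fields: ["kubernetes.labels.spark-role"]   # matches driver and executor Pods
          config:
            - type: container
              paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]   # placeholder ELK endpoint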

6. Prometheus ecosystem

Prometheus, the second project to graduate from the CNCF, is basically standard equipment for Kubernetes monitoring. At present, Spark does not provide a Prometheus sink. Moreover, Prometheus collects data in pull mode, which is not well suited to Spark batch jobs; it may be necessary to introduce Prometheus's pushgateway.
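
For illustration, the pushgateway accepts metrics pushed over HTTP in the Prometheus text format, and Prometheus then scrapes the gateway instead of the short-lived job. A minimal sketch, where the gateway address, job label, and metric are placeholders.

# Push a metric for a finished batch job to the pushgateway (placeholders throughout)
$ cat <<EOF | curl --data-binary @- http://pushgateway.monitoring:9091/metrics/job/spark_pi
# TYPE spark_job_duration_seconds gauge
spark_job_duration_seconds 42
EOF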

That concludes this study of the modes of Spark 3.0 on Kubernetes. I hope it has resolved your doubts. Combining theory with practice helps you learn better, so go and try it! If you want to keep learning more related knowledge, please continue to follow this site; the editor will keep working hard to bring you more practical articles!
