How Kubernetes helps Spark analyze big data

2025-04-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article looks at how Kubernetes helps Spark analyze big data, walking through a deployment from both the console and the command line.

Overview

This article introduces Spark + OSS on ACK, a containerized data service that lets Spark distributed computing nodes access Alibaba Cloud OSS object storage directly. Because the Alibaba Cloud Kubernetes container service is deeply integrated with OSS storage, Spark distributed in-memory computing and machine learning clusters can analyze big data in the cloud and save the results there directly.

Prerequisites

You have created a Kubernetes cluster through Alibaba Cloud Container Service. For details, see "Create a Kubernetes cluster".

Create a Spark OSS instance from the Container Service console

Three clicks create a Spark OSS instance with 1 master and 3 workers.

1 Log in to https://cs.console.aliyun.com/

2 Click "Application Directory"

3 Select "spark-oss" and click "parameters"

Give your app a name, e.g. spark-oss-online2

(Required) Fill in your oss_access_key_id and oss_access_key_secret

Worker:
  # set OSS access key ID and secret
  oss_access_key_id:
  oss_access_key_secret:

(Optional) Change the number of worker nodes with Worker.Replicas:

4 Click "deploy"

5 Click "Kubernetes console" to view the deployed instance

6 Click Services, find the external endpoints, and click the URL to access the Spark cluster

7 Test the Spark cluster

Open a spark-shell

kubectl get pod | grep worker

spark-oss-online2-worker-57894f65d8-fmzjs 1/1 Running 0 44m
spark-oss-online2-worker-57894f65d8-mbsc4 1/1 Running 0 44m
spark-oss-online2-worker-57894f65d8-zhwr4 1/1 Running 0 44m

kubectl exec -it spark-oss-online2-worker-57894f65d8-fmzjs -- /opt/spark/bin/spark-shell --master spark://spark-oss-online2-master:7077

Paste the following code to test reading from and writing to OSS with Spark

// Save RDD to OSS bucket
val stringRdd = sc.parallelize(Seq("Test Strings\nTest String2"))
stringRdd.saveAsTextFile("oss://eric-new/testwrite12")

// Read data from OSS bucket
val lines = sc.textFile("oss://eric-new/testwrite12")
lines.take(10).foreach(println)

Test Strings

Test String2
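
The pod lookup in step 7 can also be scripted. The following is a minimal shell sketch, not part of the original walkthrough: first_worker_pod and spark_shell_cmd are hypothetical helper names, and the sketch assumes the default kubectl get pod column layout (NAME, READY, STATUS, ...) and the /opt/spark path used in step 7.

```shell
#!/bin/sh
# Sketch: choose a running worker pod and print the spark-shell command for it.
# first_worker_pod and spark_shell_cmd are hypothetical helper names.

# Reads `kubectl get pod` output on stdin and prints the name of the first
# pod whose name contains "-worker-" and whose STATUS column is Running.
first_worker_pod() {
  awk '$1 ~ /-worker-/ && $3 == "Running" { print $1; exit }'
}

# Prints (does not run) the spark-shell command.
#   $1 = worker pod name, $2 = Spark master service name
spark_shell_cmd() {
  printf 'kubectl exec -it %s -- /opt/spark/bin/spark-shell --master spark://%s:7077\n' "$1" "$2"
}
```

For example, pod=$(kubectl get pod | first_worker_pod) followed by spark_shell_cmd "$pod" spark-oss-online2-master prints the command for the first running worker. The sketch only prints the command so that it can be reviewed before it is run.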

CLI command line operation

Set up the keys and deploy the Spark cluster in one command (Helm 2 syntax, where -n sets the release name):

export OSS_ID=
export OSS_SECRET=
helm install -n myspark-oss --set "Worker.oss_access_key_id=$OSS_ID,Worker.oss_access_key_secret=$OSS_SECRET" incubator/spark-oss

kubectl get svc | grep oss

myspark-oss-master ClusterIP 172.19.9.111 <none> 7077/TCP 2m
myspark-oss-webui LoadBalancer 172.19.13.1 120.55.104.27 8080:30477/TCP 2m
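
The external endpoint of the Spark web UI can be read from the kubectl get svc output instead of the console. The following is a minimal shell sketch under stated assumptions: webui_url is a hypothetical helper name, and the columns are assumed to be NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE.

```shell
#!/bin/sh
# Sketch: derive the Spark web UI URL from `kubectl get svc` output.
# webui_url is a hypothetical helper name; the column layout
# (NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE) is an assumption.

# Reads service rows on stdin and prints http://EXTERNAL-IP:PORT for the
# first LoadBalancer service that already has an external IP assigned.
webui_url() {
  awk '$2 == "LoadBalancer" && $4 !~ /^</ {
    port = $5
    sub("[:/].*", "", port)   # "8080:30477/TCP" -> "8080"
    print "http://" $4 ":" port
    exit
  }'
}
```

For example, kubectl get svc | grep oss | webui_url would print http://120.55.104.27:8080 for the service listing in this section. Rows whose external IP is still a placeholder such as <pending> are skipped, so an empty result means the load balancer is not ready yet.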
