This article covers how to solve the problems encountered when deploying KubeFlow 1.2.0. Many people run into these situations in practice, so let's walk through how to handle them step by step. I hope you read it carefully and get something out of it!
KubeFlow is a big data and machine learning platform based on Kubernetes. Deployment references:
KubeFlow 1.2.0 image cache (continuously updated)
KubeFlow 1.2.0 deployment (Ubuntu 20.04 + k8s 1.21.0)
Deploy the Istio service mesh in advance
Reference https://istio.io/latest/docs/setup/getting-started/#download
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.9.4
# Set the path (you can also add this line to ~/.profile):
export PATH=$PWD/bin:$PATH
istioctl install --set profile=demo -y
# Add a namespace label to instruct Istio to automatically inject Envoy sidecar proxies when you deploy your application later:
kubectl label namespace default istio-injection=enabled
# Expected output: namespace/default labeled
Deploy the MetalLB local load balancer service
Reference https://my.oschina.net/u/2306127?q=metallb
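For reference, here is a minimal installation sketch assuming MetalLB v0.9.x in layer-2 mode; the address range 192.168.1.240-192.168.1.250 is a placeholder that must be changed to free addresses on your own LAN:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.6/manifests/metallb.yaml
# On first install only: create the secret the speakers use for encrypted communication
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
# Layer-2 address pool (placeholder range, adjust to your network)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
EOF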
Deploy the Rancher local-path storage provisioner
Reference https://github.com/rancher/local-path-provisioner#deployment
Quick installation:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
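Optionally, you can mark local-path as the default StorageClass so that PVCs created without an explicit class still bind (a sketch; skip this if the cluster already has a default):
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'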
Deploy the sample hostPath-backed PersistentVolumeClaim and a pod that uses it:
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pvc/pvc.yaml
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pod/pod.yaml
Problems left over from the KubeFlow 1.2 deployment
There were some problems after deployment and several services would not start. Inspection turned up a few major issues:
The PVCs were not created properly, so the related service pods could not run.
The pull policy for some images is set to Always, but the images are hosted on gcr.io, so pulling them fails.
The kfam and kfserving images are missing.
Let's solve these problems one by one.
1. Delete and recreate the PVCs
Since there is no network storage service, local-path storage is used.
Delete all PVCs under the kubeflow namespace, mainly the following (a delete command sketch follows the list):
katib-mysql
metadata-mysql
minio-pvc
mysql-pv-claim
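A sketch of the delete step (assuming nothing else still mounts these claims):
kubectl -n kubeflow delete pvc katib-mysql metadata-mysql minio-pvc mysql-pv-claim
If a claim hangs in Terminating state, a pod is still using it and has to be deleted first.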
Then add the StorageClass configuration parameter (local-path is used here) and recreate each PVC, as sketched below.
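A minimal sketch for one of the claims; the 10Gi size and ReadWriteOnce mode are assumptions, so check each original PVC spec for the real values:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  namespace: kubeflow
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF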
Then delete the related pods from the pod list and let the system recreate them automatically; after a while everything returns to normal.
2. Modify the image pull policy
As noted above, the pull policy for some images is set to Always, but the images are hosted on gcr.io, so pulling them fails.
Modify it manually first to verify that the fix works.
Find the pods in "ImagePullBackOff" status.
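A quick way to list the affected pods (sketch):
kubectl -n kubeflow get pods | grep -E 'ImagePullBackOff|ErrImagePull'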
Then find the corresponding Deployment or StatefulSet.
Select "Edit" in the K8s dashboard, or use the kubectl edit command, to modify the parameter.
Delete the corresponding pods, and the system automatically rebuilds them according to the new policy.
After a while, the affected pods return to normal running state.
⚠️ Note: modify the parameters on the Deployment or StatefulSet. If you only modify the Pod or its ReplicaSet, the change is overwritten and lost when the pod is rebuilt.
Later this can be handled through configuration parameters at deployment time.
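For the manual fix, a hedged patch sketch; DEPLOYMENT_NAME and the container index 0 are placeholders to replace with the values from your own cluster:
kubectl -n kubeflow patch deployment DEPLOYMENT_NAME --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "IfNotPresent"}]'
Since this patches the pod template, the Deployment also rolls the pod automatically, so the change survives rebuilds.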
3. Supply the missing images
After the fixes above, two pods still fail to start:
kfserving-controller-manager, whose image is:
gcr.io/kfserving/kfserving-controller:v0.4.1
profiles-deployment, whose image is:
gcr.io/kubeflow-images-public/kfam:vmaster-g9f3bfd00
It turns out the earlier auto-generated script missed them (that pod contains two images and only one of them had been extracted). Download each image separately, docker save it to a tar file, copy the file back, and docker load it on every node.
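A sketch of that workflow for one of the two images (node1 is a placeholder hostname; run the pull on a machine that can reach gcr.io):
docker pull gcr.io/kfserving/kfserving-controller:v0.4.1
docker save gcr.io/kfserving/kfserving-controller:v0.4.1 -o kfserving-controller.tar
# Copy the tar to each cluster node and load it into the local image store
scp kfserving-controller.tar node1:/tmp/
ssh node1 "docker load -i /tmp/kfserving-controller.tar"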
This has been corrected in the Aliyun image registry and in the script:
Just use the method and script from "KubeFlow 1.2.0 image cache (continuously updated)".
This concludes "how to solve the problems encountered in the KubeFlow 1.2.0 deployment". Thank you for reading!