
Kubeflow: Machine Learning on Kubernetes

Updated: 2025-04-23. Source: SLTechnology News&Howtos (Shulou.com), Servers.



Kubeflow is a machine learning toolkit for Kubernetes environments, introduced by Google. With Kubeflow you can define resource types such as TFJob and run distributed model training much as you would deploy an application. This article briefly introduces Kubeflow and its installation process.
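To make the TFJob idea more concrete, here is a minimal Python sketch that builds a TFJob manifest as a plain dictionary and prints it as JSON (kubectl accepts JSON as well as YAML). The API version kubeflow.org/v1alpha2, the job name, the image and the replica counts are assumptions chosen for illustration, not values from this article; check which TFJob CRD version your cluster actually serves.

```python
import json


def build_tfjob(name, image, workers=2, ps=1, gpus_per_worker=1):
    """Build a TFJob manifest as a dict (all names and sizes are hypothetical)."""

    def replica(count, gpus=0):
        container = {"name": "tensorflow", "image": image}
        if gpus:
            # GPUs are requested through the extended resource name
            # exposed by the NVIDIA device plugin.
            container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [container]}},
        }

    return {
        # Assumption: the v1alpha2 API; verify with `kubectl get crd`.
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica(ps),
                "Worker": replica(workers, gpus_per_worker),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    manifest = build_tfjob("mnist-train", "example.com/mnist:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with kubectl create -f, in the same way as any other Kubernetes resource.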

1. Background

Before introducing Kubeflow, let's briefly look at the stages a real machine learning model service goes through on its way online, as shown in the following figure:

Each color in the image above represents one stage. Bringing a machine learning model online involves the following stages: data cleaning and validation, dataset splitting, training, building and validating the model, large-scale training, model export, model serving, and log monitoring. Computing frameworks such as TensorFlow solve some of the core problems, but there is still a long way to go before machine learning projects reach productized, engineered, enterprise-level development. What remains includes data collection, data cleaning, feature extraction, compute resource management, model serving, configuration management, storage, monitoring, and logging.

2. Kubeflow core components

Jupyter: multi-tenant notebook service

TensorFlow / PyTorch: the main machine learning engines currently supported

Seldon: deploys machine learning models on Kubernetes

TF-Serving: serves TensorFlow models online, with model versioning and model switching without stopping the online service

Argo: workflow engine based on Kubernetes

Ambassador: API gateway that provides a unified entry point to the services

Istio: microservice management and telemetry collection

Ksonnet: Kubeflow uses ksonnet to deploy the required Kubernetes resources to the cluster

Kubeflow takes advantage of the following Kubernetes strengths:

Native resource isolation

Automated cluster management

Automatic scheduling of compute resources (CPU/GPU)

Support for a variety of distributed storage systems

Mature, integrated monitoring and alerting

The components involved in each stage of machine learning are combined as microservices and deployed in containers, which gives every system in the pipeline high availability and easy scaling.

3. Deploying and installing Kubeflow

Server configuration

GPU card model: Nvidia-Tesla-K80

Network card: Gigabit (note: when training on large datasets, a Gigabit network card becomes the bottleneck)

CephFS service configuration

Network card: 10 Gigabit (note: when data is stored in Ceph, the Ceph cluster should be in the same data center as the Kubernetes cluster; otherwise latency severely slows down dataset loading)
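A rough back-of-the-envelope calculation shows why the network card matters. The sketch below estimates the best-case time to read a dataset once over a link of a given speed. The 100 GB dataset size is a hypothetical figure, not a measurement from this setup, and real throughput will be lower because of protocol overhead and storage latency.

```python
def transfer_seconds(dataset_gb, link_gbit_per_s, efficiency=1.0):
    """Best-case seconds to move dataset_gb (decimal gigabytes) over a link.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    gigabits = dataset_gb * 8  # 1 byte = 8 bits
    return gigabits / (link_gbit_per_s * efficiency)


if __name__ == "__main__":
    # Hypothetical 100 GB dataset read once per training pass.
    for label, gbit in (("1 Gbit/s", 1), ("10 Gbit/s", 10)):
        seconds = transfer_seconds(100, gbit)
        print(f"{label}: {seconds:.0f} s per pass over 100 GB")
```

At line rate this gives 800 seconds per pass on a Gigabit link versus 80 seconds on a 10 Gigabit link, which is why the training nodes' Gigabit card is called out as the bottleneck.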

Software environment

Kubernetes version: v1.12.2 (Note: kube-dns needs to be installed)

Kubeflow version: v0.3.2

Jsonnet version: v0.11.2

Install ksonnet

Install Kubeflow

After all the installation steps above have completed successfully, check the status of the Kubeflow Deployment resources in the Kubernetes cluster:
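One way to automate this check is to parse the JSON that kubectl get deployments -o json returns and list every Deployment with fewer ready replicas than desired. The sketch below is only an illustration: the kubeflow namespace and the Deployment names in the sample data are assumptions, and fetch_deployments needs kubectl with access to a cluster, so the demo runs on sample data instead.

```python
import json
import subprocess


def unready_deployments(deploy_list):
    """Return names of Deployments whose ready replicas are below the desired count."""
    names = []
    for item in deploy_list.get("items", []):
        desired = item.get("spec", {}).get("replicas", 1)
        # readyReplicas is omitted from the status when it is zero.
        ready = item.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            names.append(item["metadata"]["name"])
    return names


def fetch_deployments(namespace="kubeflow"):
    """Run kubectl and parse its JSON output (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "deployments", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample standing in for fetch_deployments() output.
    sample = {
        "items": [
            {"metadata": {"name": "ambassador"},
             "spec": {"replicas": 3}, "status": {"readyReplicas": 3}},
            {"metadata": {"name": "tf-job-operator"},
             "spec": {"replicas": 1}, "status": {}},
        ]
    }
    print(unready_deployments(sample))  # prints ['tf-job-operator']
```

An empty list means every Deployment has reached its desired replica count.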

The status shows that the services have started normally. Next, check the status of the pods under each Deployment:
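The pod-level check can be sketched the same way. The function below takes the parsed output of kubectl get pods -o json and reports every pod that is not in the Running phase with all containers ready. The pod names in the sample are hypothetical, and the rule is deliberately simple: a pod that finished successfully (phase Succeeded) is also reported, which may or may not be what you want.

```python
def problem_pods(pod_list):
    """Map pod name -> phase for pods that are not Running with all containers ready."""
    problems = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        containers = status.get("containerStatuses", [])
        # A pod with no reported container statuses is treated as not ready.
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            problems[pod["metadata"]["name"]] = phase
    return problems


if __name__ == "__main__":
    # Hypothetical sample standing in for `kubectl get pods -n kubeflow -o json`.
    sample = {
        "items": [
            {"metadata": {"name": "ambassador-abc"},
             "status": {"phase": "Running",
                        "containerStatuses": [{"ready": True}]}},
            {"metadata": {"name": "tf-hub-0"},
             "status": {"phase": "Pending"}},
        ]
    }
    print(problem_pods(sample))  # prints {'tf-hub-0': 'Pending'}
```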

Now that the services are healthy, let's use Ambassador to access the components that Kubeflow deployed to the Kubernetes cluster.

Accessing the Kubeflow UIs

Kubeflow uses Ambassador as its unified external gateway, and the other internal services are exposed through it, as shown in the following figure:

Next, use kubectl port-forward to forward a local port to the Ambassador Service and access Kubeflow locally:
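As a hedged illustration of this step, the sketch below assembles a kubectl port-forward command in Python. The namespace kubeflow, the Service name ambassador and the remote port 80 are assumptions about a typical installation, not values confirmed by this article; adjust them to match your cluster. The script only prints the command. To actually start forwarding, pass the list to subprocess.run on a machine with cluster access.

```python
import shlex


def port_forward_command(namespace="kubeflow", service="ambassador",
                         local_port=8080, remote_port=80):
    """Build the kubectl argument list that forwards local_port to the Service."""
    return [
        "kubectl", "port-forward",
        "-n", namespace,
        f"svc/{service}",
        f"{local_port}:{remote_port}",
    ]


if __name__ == "__main__":
    cmd = port_forward_command()
    # Print the command instead of running it, so the sketch works offline.
    print(shlex.join(cmd))
    # prints: kubectl port-forward -n kubeflow svc/ambassador 8080:80
```

While the command runs, requests to port 8080 on the local machine are tunneled to the Ambassador Service inside the cluster.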

Then open localhost:8080 in a local browser:

The Kubeflow UIs serve different purposes. For example, Jupyter Notebook covers the whole workflow of a computational application: developing, documenting, running code, and displaying results. You can also use TF-operator to run distributed multi-machine, multi-GPU training for TensorFlow-based models.

4. Summary

Google, Microsoft, Amazon, and Intel abroad, along with Alibaba Cloud, Huawei Cloud, and other companies in China, are now investing in Kubeflow. Combined with Kubernetes, it supports multi-machine, multi-GPU large-scale training for a variety of machine learning engines, which pools GPU resources and improves both GPU utilization and the efficiency of model training. It also delivers a one-stop service in which the whole machine learning workflow runs on the Kubernetes platform. This reduces what algorithm engineers have to learn beyond machine learning itself, so they can concentrate on the algorithms. It is also bound to raise the bar for DevOps engineers.
