Application of Kubernetes in SAIC Cloud platform and AI

Updated 2025-02-22. From: SLTechnology News&Howtos (shulou), Servers


Founded in 2015, Fanyi Shangxing is a wholly owned subsidiary of SAIC. It operates three data centers in Shanghai, Nanjing, and Zhengzhou (under construction), covering nearly 9,000 square meters in total, with more than 4,000 physical servers and 10 PB of data storage.

Fanyi Shangxing mainly provides cloud products and services such as elastic computing, storage, networking, big data, artificial intelligence, and security, as well as industry solutions such as the Internet of Vehicles, the Internet of Things, and whole-vehicle solutions. To date it has served more than 40 automotive enterprises, including SAIC Group headquarters, SAIC Passenger Cars, SAIC Maxus, Anji Logistics, and Racer Travel.

On November 13, 2018, the Cloud Native Service Mesh (Istio) Enterprise Summit, a co-located event of KubeCon + CloudNativeCon co-sponsored by Rancher Labs, Huawei, and CNCF, was held in Shanghai. Gong Hanshen, general manager of the Business Development Department at SAIC Group's Fanyi Shangxing, gave a keynote speech at the summit. He shared how SAIC harnesses the capabilities of Kubernetes while lowering the barrier to using it, offers Kubernetes in different forms to suit users of different technical levels, and uses Kubernetes to support emerging services such as artificial intelligence.

Development background

Gong Hanshen, general manager of the Business Development Department, Fanyi Shangxing, SAIC Group

According to Gong Hanshen, the automotive industry's demand for Internet-driven transformation comes from two directions. On one hand, the industry needs Internet-native cloud services such as connected cars and shared mobility. These Internet businesses are driving the transformation of the whole industry, and most of this ecosystem is built on cloud architectures, which makes it cloud native. On the other hand, the demand comes from companies' internal operations: the traditional operating model cannot respond quickly enough to the rapidly changing needs of the Internet, and duplicated infrastructure, complex system architectures, and closed business systems cause enormous waste of resources and high operating costs.

Moving business to the cloud improves this situation. Besides reducing overall IT investment, the interconnection that cloud computing provides also increases data exchange between businesses. "At the level of SAIC's strategic planning, we need to develop a large number of products with industry characteristics," Gong Hanshen said. "In the process of moving to the cloud, the cloud platform is not just a resource provider. Most importantly, it produces general-purpose technical and business capabilities. This is a long-term development direction for the SAIC Cloud platform."

Overall framework of the Fanyi Shangxing cloud platform

With the direction of the SAIC Cloud platform settled, the team developed an overall platform framework that concentrates the main work on two platforms. The first is the basic service platform, which takes virtualization and the data center as its technical core, offers standardized hardware to users as virtual resources, and lets users consume computing on demand from a resource pool. The second is the platform service layer, whose runtime foundation is containers plus a scheduling system. Where platform services sit closer to the business layer they are abstracted into a business platform, and where they sit closer to the technical layer they are abstracted into a technical platform. Either way, the runtime foundation is provided by virtualization and containers. Containers have therefore become a core technology in Fanyi Shangxing's cloud data centers: they serve both as a lightweight PaaS and as a finer-grained form of virtualization for the IaaS platform, providing the runtime foundation for the entire platform.

Practice history

Viewed along a timeline, SAIC's exploration and practice of container technology has closely tracked the overall development of container technology.

Adoption history of container technology at Fanyi Shangxing

In 2015, the Fanyi Shangxing development team used Docker to run some simple website applications. Developers wrote simple code on their laptops, packaged it in containers, pushed it to Fanyi Shangxing's virtual environment, and started the packaged applications within seconds. "We compared Docker with OpenStack and discussed internally whether containers would replace OpenStack," Gong Hanshen recalled. "Given the maturity of containers and of users at the time, we concluded that containers could not replace OpenStack in the short term. But this experiment showed us the advantages of containers in resource utilization and environment consistency."

In 2016, Fanyi Shangxing invested more effort in Docker and orchestration systems. The development team evaluated Rancher, Mesos + Marathon, Kubernetes, and Docker + Swarm, which differed in maturity and in how difficult they were to deploy. The team finally chose Docker + Swarm to build its enterprise clusters and began using small-scale clusters in practice to support marketing campaigns.

By 2017, Kubernetes was growing in popularity and the product was becoming more mature. The Fanyi Shangxing development team built a small Kubernetes cluster internally and used it to schedule the entire GPU resource platform. After one to two years of experimentation, Fanyi Shangxing formally made Kubernetes an important part of its product line, supporting the operation of the whole container platform.

"In the early days of building the Kubernetes platform, we set goals for the container platform along several dimensions," Gong Hanshen said. "For deployment, it must support multiple data centers as well as mainstream public and private cloud platforms. For resource scheduling, it must support mainstream CPU scheduling and provide storage and network interfaces based on open standards. For tenant management, it must provide multi-tenant resource quotas, so that tenants can schedule resources and use image repositories within their own quotas. For overall operation and management, it must provide a platform that manages Kubernetes clusters in a unified way, can flexibly add or remove Kubernetes clusters, and offers basic monitoring."

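The multi-tenant quota requirement above maps naturally onto the standard Kubernetes `ResourceQuota` object. The following is a minimal sketch, not Fanyi Shangxing's actual configuration: the tenant name, the sizes, and the helper `tenant_quota` are hypothetical, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed in the cluster.

```python
import json


def tenant_quota(namespace: str, cpu: str, memory: str, gpus: int) -> dict:
    """Build a Kubernetes ResourceQuota manifest for one tenant namespace.

    Hypothetical helper for illustration; one namespace per tenant is an
    assumption, not something the article states.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Caps on the sum of resource requests in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Extended resources are capped via the "requests." prefix.
                # nvidia.com/gpu assumes the NVIDIA device plugin.
                "requests.nvidia.com/gpu": str(gpus),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical tenant and sizes; kubectl accepts JSON manifests.
    print(json.dumps(tenant_quota("tenant-a", "64", "256Gi", 4), indent=2))
```

The printed JSON could be applied with `kubectl apply -f`, after which the cluster rejects pods whose requests would push the tenant past these limits.
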
Fanyi Shangxing's technology selection

After this series of exploration and practice, SAIC settled on the container technology stack that best met its needs. The infrastructure is built on physical servers, and the whole Kubernetes cluster is deployed automatically with self-developed, Ansible-based tooling. Rancher provides unified management and control of the platform, integrates with the unified authentication system, and delivers advanced functions such as application deployment management, multi-tenancy, and quota management. At the network level, a Calico BGP network combined with external L4/L7 load balancers supports several ways of publishing applications. For storage, Nexenta and Portworx in Swarm are used to build a distributed storage solution. Finally, SAIC uses Prometheus for platform monitoring and for unified external monitoring and alerting.

Logical framework of Kubernetes platform

The Kubernetes cluster is integrated with Fanyi Shangxing's user interface. Users can log on to the Saicmotor portal and use Fanyi Shangxing's Kubernetes cluster directly, or manage the cluster through the surrounding cloud platform products, such as application development logs and log management. Fanyi Shangxing's operations staff, for their part, manage the underlying Kubernetes cluster through the Rancher management interface.

"We have also thought through, scenario by scenario, how Kubernetes should be offered to users as a product. Some people see Kubernetes as the management and control system of a data center, some see it as a task scheduling system, and others see it as a governance framework for microservices. The definition of Kubernetes differs from scenario to scenario," Gong Hanshen said. "All of this stems from Kubernetes' open, multi-dimensional framework design and its ease of use, so we understand it as an extensible and composable scheduling framework."

Given the characteristics of Kubernetes and users' varying familiarity with it, Fanyi Shangxing designed two product forms. The first targets entry-level users: it wraps Kubernetes and delivers it as a different kind of service, so that what users mainly experience is container-based application deployment and release. The second is open to advanced users, who can get their own Kubernetes cluster through one-click deployment and use the full feature set of Kubernetes.

AI application

After the internal projects went into production, SAIC placed higher demands on the container platform in order to develop production-grade L4 autonomous driving software and build autonomous driving capabilities for complex scenarios.

"The platform must provide a complete management system for the AI software development process, including data management, model management, simulation testing, model compression, and other functions, forming a closed loop of AI software development from training to inference," Gong Hanshen said. He grouped this goal into two major requirements. The first is AI training services, which focus on data labeling, data storage, CPU training, and distributed training. The second is AI model services, which include training services, hosted publishing, and model version management.

The platform is also defined as a group-level public training service. It serves not only SAIC's intelligent driving department but also provides AI training services to the group's affiliated enterprises in areas such as whole vehicles, logistics, and parts. For such a platform, task scheduling and tenant isolation are indispensable.

During technology selection, the Fanyi Shangxing development team found that Kubernetes could handle both resource-layer scheduling and service-layer task scheduling, and could also isolate tenants' GPU, network, and storage resources. SAIC therefore also chose Kubernetes as the basis for putting the AI platform into production.
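
To make the combination of task scheduling and tenant isolation concrete, here is a toy model of the admission check a shared GPU pool has to perform: a request is granted only if it fits both the tenant's quota and the pool's free capacity. This is an illustrative sketch, not how the Kubernetes scheduler is implemented, and the class `GpuPool` and the tenant names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GpuPool:
    """Toy model of a shared GPU pool with per-tenant quotas."""

    total: int                      # GPUs physically available in the pool
    quotas: Dict[str, int]          # maximum GPUs each tenant may hold
    in_use: Dict[str, int] = field(default_factory=dict)

    def free(self) -> int:
        """GPUs not currently allocated to any tenant."""
        return self.total - sum(self.in_use.values())

    def try_allocate(self, tenant: str, gpus: int) -> bool:
        """Grant the request only if quota and free capacity both allow it."""
        used = self.in_use.get(tenant, 0)
        if used + gpus > self.quotas.get(tenant, 0):
            return False            # tenant would exceed its own quota
        if gpus > self.free():
            return False            # pool has too few free GPUs
        self.in_use[tenant] = used + gpus
        return True

    def release(self, tenant: str, gpus: int) -> None:
        """Return GPUs to the pool when a training task finishes."""
        self.in_use[tenant] = max(0, self.in_use.get(tenant, 0) - gpus)


if __name__ == "__main__":
    # Hypothetical tenants sharing an 8-GPU pool.
    pool = GpuPool(total=8, quotas={"driving": 6, "logistics": 4})
    print(pool.try_allocate("driving", 6))    # within quota and capacity
    print(pool.try_allocate("logistics", 4))  # within quota, but pool is short
```

In a real cluster the quota side of this check is enforced per namespace by the API server, while the capacity side is handled by the scheduler when it places pods on nodes.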

AI platform business architecture

So how does the AI platform carry out its business? From a horizontal point of view, SAIC's AI platform is organized into three levels: people, computing, and data. At the data level, large volumes of data are generated, such as traffic signs, the many videos and images captured on rainy days, and various signal data. This data is collected and sent to the computing platform, where Fanyi Shangxing's teams divide up the work of processing it and training on it, and an algorithm is finally produced. From a vertical point of view, the platform's staff fall into three teams: the labeling team, which is mainly responsible for model services; the algorithm team; and the operations team, which plays a coordinating role.

Logical framework of the AI training platform

Working with the business teams, the Fanyi Shangxing development team abstracted the logical framework of the AI platform. The first layer is the foundation, made up of Kubernetes and storage. The second is the Service layer, which includes scheduling algorithms and data processing. The third is the front-end service layer, which includes data management, process control, task submission, and so on. When Kubernetes receives a scheduling request, it schedules the corresponding Service module in the Service layer and allocates GPU and storage resources to it. The Service module then runs the training and computation and returns the results to the front-end user.

"All of these scheduled tasks in the Service layer are stored as images in the image repository that Kubernetes uses. Kubernetes handles multi-task control and resource scheduling for the whole system, so the core is really implemented by the underlying Kubernetes," Gong Hanshen added. "Our team packages the computing workloads the algorithm team needs, stores them in the image repository, and lets the algorithm team start training tasks on their own through the front end. This is how Kubernetes is applied in SAIC's AI platform."

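The flow described above, where a packaged image is pulled from the repository and given GPU and storage resources for one training run, corresponds roughly to a Kubernetes `Job`. The sketch below builds such a manifest under stated assumptions: the helper `training_job`, the image name, the namespace, and the claim name are hypothetical, and `nvidia.com/gpu` again assumes the NVIDIA device plugin. The article does not say which Kubernetes object SAIC actually uses.

```python
import json


def training_job(name: str, namespace: str, image: str,
                 gpus: int, data_claim: str) -> dict:
    """Build a batch/v1 Job manifest for a single training task.

    Hypothetical helper: the front end would fill in these values when a
    user submits a task.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "train",
                        # Image packaged by the platform team (assumed name).
                        "image": image,
                        # GPUs are requested as an extended resource limit.
                        "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/data"},
                        ],
                    }],
                    # Training data comes from an existing storage claim.
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": data_claim},
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    job = training_job(
        name="lane-detect-train",
        namespace="tenant-a",
        image="registry.example.com/algo-team/lane-detect:v1",
        gpus=2,
        data_claim="training-data",
    )
    print(json.dumps(job, indent=2))
```

Submitting the manifest to the tenant's namespace means the quota and isolation rules of that namespace apply to the training run automatically.
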