JD Cloud launches vGPU pooling solution, sharply cutting the cost of large-model inference

2025-01-15 Update From: SLTechnology News&Howtos


Shulou (Shulou.com), 11/24 report

Thanks to CTOnews.com reader Mr. Air for the tip. According to CTOnews.com on August 15, JD Cloud has launched a vGPU pooling solution which, it says, "provides one-stop GPU compute pooling, can raise GPU utilization by up to 70%, and greatly reduces the cost of large-model inference."

▲ Image source: JD Cloud official account

JD Cloud said that, building on its self-developed hybrid multi-cloud operating system "Cloud Ship" and its existing support for hybrid multi-cloud CPU compute pooling and large-model training, it has further strengthened the scheduling and management capabilities required by AI applications, including card management, node management, and heterogeneous resource scheduling. The result is a one-stop compute-pooling solution for a range of AI workloads, large-model training included, aimed at improving resource utilization across the board.

JD Cloud also said its pooling solution has four major advantages, which CTOnews.com transcribed as follows:

Flexible compute splitting: supports splitting GPU compute and GPU memory at arbitrary ratios, with fine granularity and dynamic adjustment, so that a single physical card can serve multiple containers. Compared with whole-card compute, the performance loss is under 2%.

Fine-grained quota management: supports flexible quotas by GPU model or label, ensuring resources are allocated on demand and improving inference stability and training performance.

Multi-scenario adaptation: compatible with mainstream CUDA versions and different GPU chips, and supports mainstream AI training frameworks such as TensorFlow and PyTorch.

Multi-node management: supports virtual node grouping and pinning applications to designated node groups, improving large-model training efficiency overall.
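The splitting and quota behavior described in the first two points can be sketched in plain Python. This is a minimal illustration only; the names (`GpuPool`, `PhysicalGpu`, the quota fields) are hypothetical and do not reflect JD Cloud's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalGpu:
    """One physical card whose compute and memory can be split fractionally."""
    model: str
    free_compute: float = 1.0   # fraction of the card's compute still unallocated
    free_mem_gb: float = 24.0   # free GPU memory in GB

@dataclass
class GpuPool:
    """A pool of cards with per-model quotas, carved up on demand."""
    cards: list
    quota: dict = field(default_factory=dict)  # model -> max fraction usable
    used: dict = field(default_factory=dict)   # model -> fraction already allocated

    def allocate(self, model: str, compute: float, mem_gb: float):
        """Carve a compute/memory slice off any card of the requested model."""
        cap = self.quota.get(model, float("inf"))
        if self.used.get(model, 0.0) + compute > cap:
            return None  # quota exceeded: the request is refused
        for card in self.cards:
            if (card.model == model and card.free_compute >= compute
                    and card.free_mem_gb >= mem_gb):
                card.free_compute -= compute
                card.free_mem_gb -= mem_gb
                self.used[model] = self.used.get(model, 0.0) + compute
                return (card, compute, mem_gb)
        return None  # no card has enough free compute/memory

    def release(self, slice_):
        """Return a slice to the pool when its task ends."""
        card, compute, mem_gb = slice_
        card.free_compute += compute
        card.free_mem_gb += mem_gb
        self.used[card.model] -= compute

# One physical card shared by containers, capped at 80% by quota.
pool = GpuPool(cards=[PhysicalGpu("A30")], quota={"A30": 0.8})
s1 = pool.allocate("A30", compute=0.5, mem_gb=8)   # first container gets half the card
s2 = pool.allocate("A30", compute=0.5, mem_gb=8)   # refused: would exceed the 0.8 quota
pool.release(s1)                                   # slice returns to the pool
```

The key design point mirrored here is that a quota check happens before any card is touched, so over-quota requests fail fast instead of fragmenting the pool.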

JD Cloud said that in use, developers can request resources by card model, split them by compute and memory, and have a controller adjust them according to a user-specified scheduling policy. Slices are allocated dynamically only when a training, fine-tuning, or inference task starts, and can be released when the task ends, with support for compute isolation between tasks and task cold start.
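That allocate-on-start, release-on-end lifecycle can be sketched as a context manager. Again, this is an illustrative assumption of how such a scheduler could behave, not the vendor's implementation; `Scheduler` and its fields are invented names.

```python
from contextlib import contextmanager

class Scheduler:
    """Hypothetical sketch: compute is held only while a task is running."""

    def __init__(self, total_compute: float = 1.0):
        self.free = total_compute
        self.log = []

    @contextmanager
    def task(self, name: str, compute: float):
        if compute > self.free:
            raise RuntimeError(f"{name}: not enough free compute")
        self.free -= compute           # dynamic allocation at task start
        self.log.append((name, "start"))
        try:
            yield
        finally:
            self.free += compute       # released as soon as the task ends
            self.log.append((name, "end"))

sched = Scheduler()
with sched.task("finetune", 0.6):
    pass  # fine-tuning work would run here, isolated from other tasks
with sched.task("inference", 0.6):
    pass  # the same 0.6 is reusable because the previous task released it
```

Because release happens in a `finally` block, the slice returns to the pool even if the task raises, which is what lets two tasks that each need 60% of a card run back to back on the same hardware.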

In practice, JD Cloud says, pooling heterogeneous GPU resources has markedly improved AI operational efficiency, raising overall GPU utilization by 70%. Combined with arbitrary splitting and on-demand allocation, the same number of GPUs can carry several times the traffic through resource sharing, reducing hardware procurement costs and supporting more training and inference tasks with fewer AI chips.
