2025-03-26 Update From: SLTechnology News & Howtos
Shulou(Shulou.com)11/24 Report--
On August 24, Inspur Information officially released OGAI (Open GenAI Infra), a full-stack software suite for large-model intelligent computing. OGAI provides AI computing system environment deployment, compute scheduling and reliability guarantees, and model development and management capabilities for large-model businesses. Developed from Inspur Information's own large-model practice and its experience serving customers, OGAI aims to deliver efficient productivity for large-model R&D and application innovation and to accelerate the pace of innovation in the generative AI industry.
Large models are the core technology driving innovation in today's general artificial intelligence industry. More than 100 generative AI models have already been released in China, and a pattern of "a hundred models competing" is taking shape. However, large models still face many challenges at every stage from R&D to application, including full-stack system issues around compute, compatibility and adaptation, and performance optimization.
OGAI was built by Inspur Information around the practical requirements of large-model compute construction, model development, and application deployment, following the design principles of covering the full stack and full workflow, fully releasing compute performance, and refining through real-world verification. OGAI consists of five layers: L0 through L4 correspond, respectively, to the intelligent computing center OS at the infrastructure layer, PODsys at the system environment layer, AIStation at the scheduling platform layer, YLink at the model tooling layer, and MModel at the multi-model management layer.
L0 intelligent computing center OS: an intelligent computing management platform for large-model compute services that meets multi-tenant needs for flexible, bare-metal-based AI compute management. Its efficient bare-metal service supports minute-level deployment of thousands of bare-metal nodes and elastic on-demand scaling, provides one-click provisioning of heterogeneous compute chips, IB and RoCE high-speed networks, high-performance storage, and other environments, and isolates compute, network, and data to keep workloads secure.
L1 PODsys: an open-source, efficient, compatible, and easy-to-use deployment solution for intelligent computing clusters. Focused on cluster deployment scenarios, it covers every element of the cluster system environment, from the OS and drivers to monitoring visualization and resource scheduling. It selects the most stable and widely compatible software versions, simplifies the deployment process with a set of scripted tools to shorten the time needed to bring compute online, and offers enterprise users expert services for installation and cluster performance calibration.
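The staged deployment workflow described above (OS, drivers, monitoring, scheduling, each validated before the next) can be illustrated with a small sketch. This is not PODsys's actual interface; the `Stage` abstraction and the stage names are hypothetical, showing only the general pattern of running ordered deployment stages and stopping at the first failure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    """One deployment stage with a check that reports success or failure."""
    name: str
    run: Callable[[], bool]  # returns True on success

def deploy(stages: List[Stage]) -> List[str]:
    """Run deployment stages in order; abort at the first failing stage."""
    completed = []
    for stage in stages:
        if not stage.run():
            raise RuntimeError(f"stage failed: {stage.name}")
        completed.append(stage.name)
    return completed

# Hypothetical stage names mirroring the elements listed in the article.
pipeline = [
    Stage("os_image", lambda: True),
    Stage("gpu_driver", lambda: True),
    Stage("monitoring", lambda: True),
    Stage("scheduler", lambda: True),
]
```

A real deployment tool would replace each `lambda` with scripts that install and verify the corresponding component on every node.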
L2 AIStation: a commercial AI scheduling platform for large-model development. To address the training interruptions common in large-model training, it can quickly localize training anomalies and automatically resume from checkpoints: by rapidly pinpointing failures in chips, network cards, or communication equipment, it pauses training globally for maintenance, elastically swaps in hot-standby compute, has healthy nodes quickly read the latest checkpoint, and resumes training automatically from the breakpoint.
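The checkpoint-based auto-resume mechanism described here follows a well-known pattern: on (re)start, find the newest checkpoint and continue from the step it recorded. The sketch below is a generic, minimal illustration of that pattern, not AIStation's API; the file naming scheme and callback signatures are assumptions.

```python
import glob
import os
import re

def latest_checkpoint(ckpt_dir):
    """Return the path of the highest-numbered checkpoint file, or None."""
    paths = glob.glob(os.path.join(ckpt_dir, "ckpt_*.pt"))
    if not paths:
        return None
    return max(paths, key=lambda p: int(re.search(r"ckpt_(\d+)\.pt", p).group(1)))

def train(ckpt_dir, total_steps, save_every, save_fn, load_fn, step_fn):
    """Training loop that resumes from the newest checkpoint after a restart.

    save_fn(path, step) persists state; load_fn(path) returns the saved step;
    step_fn(step) performs one training step (placeholders for real logic).
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    ckpt = latest_checkpoint(ckpt_dir)
    start = load_fn(ckpt) if ckpt else 0  # resume step, 0 on a fresh run
    for step in range(start, total_steps):
        step_fn(step)
        if (step + 1) % save_every == 0:
            save_fn(os.path.join(ckpt_dir, f"ckpt_{step + 1}.pt"), step + 1)
```

In a production system the "pause, swap in hot-standby nodes, resume" cycle would wrap this loop, with every replacement node calling `latest_checkpoint` before rejoining training.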
L3 YLink: an efficient toolchain for large-model data governance, pre-training, and fine-tuning. Focused on these stages of the development process, it integrates Inspur Information's self-developed and open-source large-model development tools, such as Y-DataKit, Y-TrainKit, and Y-FTKit, and uses this rich set of engineering and automation tools to accelerate large-model training and development.
L4 MModel: a management platform providing multi-model access, serving, evaluation, and related functions. Its core components include dataset management and model management and evaluation, helping developers and researchers better manage multi-version, multi-type foundation and task models and comprehensively evaluate multiple models across diverse evaluation datasets and tasks.
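A multi-model management layer of this kind boils down to tracking which model versions scored what on which evaluation sets. The sketch below is a hypothetical minimal registry illustrating that idea; the class name, method names, and example model names are invented for illustration and do not reflect MModel's real design.

```python
from collections import defaultdict

class ModelRegistry:
    """Track scores for multiple (model, version) pairs across evaluation datasets."""

    def __init__(self):
        # (model, version) -> {dataset: score}
        self.scores = defaultdict(dict)

    def record(self, model, version, dataset, score):
        """Store one evaluation result."""
        self.scores[(model, version)][dataset] = score

    def best(self, dataset):
        """Return the (model, version) with the highest score on a dataset, or None."""
        candidates = {k: v[dataset] for k, v in self.scores.items() if dataset in v}
        return max(candidates, key=candidates.get) if candidates else None
```

A real platform would add persistence, access control, and model artifact storage on top of this bookkeeping.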
Liu Jun, senior vice president of Inspur Information and general manager of AI & HPC, said: "OGAI provides a complete stack of engineering and automation tools that will help more enterprises cross the threshold of large-model R&D and application and fully release the innovative productivity of large models. Through the deep co-design of software and hardware in its intelligent computing systems, Inspur Information will continue to innovate, cultivate a thriving Yuanbrain ecosystem, advance the goal of 'empowering a hundred models and bringing intelligence to a thousand industries,' and accelerate innovation in the generative AI industry."
© 2024 shulou.com SLNews company. All rights reserved.