Microsoft launches its first self-developed large-model AI chip! TSMC 5nm, 105 billion transistors, and OpenAI is first to trial it

Microsoft's self-developed chips have finally landed, and Jensen Huang wants to build the "TSMC of AI".

Author | ZeR0

Editor | Moying

According to a November 16 report, Microsoft unveiled two self-developed chips in the early hours of that morning at Ignite, its annual conference for IT professionals and developers: the cloud AI chip Microsoft Azure Maia and the server CPU Microsoft Azure Cobalt 100.

Maia 100 is the first artificial intelligence (AI) chip Microsoft has designed for cloud-based large language model training and inference. Built on TSMC's 5nm process, it carries 105 billion transistors, is optimized for AI and generative AI, and is the first to support the sub-8-bit data types (MX data types) implemented by Microsoft. Microsoft is already testing the chip with its Bing search engine and Office AI products.
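For context on what "sub-8-bit (MX) data types" means in practice: MX-style formats let a block of values share one scale factor, so each element can be stored in very few bits. Below is a minimal, illustrative Python sketch of block-wise shared-scale quantization; the block size, bit width, and rounding scheme are assumptions for illustration, not Maia's actual implementation.

```python
import numpy as np

def mx_style_quantize(x, block_size=32, bits=8):
    """Illustrative block quantization: each block of `block_size` values
    shares one power-of-two scale, and elements are stored as low-bit
    integers. This sketches the idea behind MX-style shared-scale
    formats; it is not Maia's actual implementation."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit signed
    x = x.reshape(-1, block_size)
    # One shared power-of-two scale per block, chosen from the block maximum.
    scale = 2.0 ** np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) / qmax + 1e-30))
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def mx_style_dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 32).astype(np.float32)
q, s = mx_style_quantize(x)
err = np.abs(x - mx_style_dequantize(q, s).reshape(4, 32)).mean()
print(f"mean absolute quantization error: {err:.4f}")
```

Storing one scale per block instead of per element is what lets the per-element payload shrink below 8 bits while keeping dynamic range.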

Cobalt 100 is the first CPU Microsoft has customized for the Microsoft Cloud, and the first complete liquid-cooled server CPU Microsoft has built. It is a 128-core design based on Arm Neoverse CSS.

Microsoft has also customized an end-to-end AI rack with a "sidekick" liquid cooler that works much like a car radiator.

▲ Microsoft demonstrates the end-to-end AI rack on stage

The two chips will arrive in Microsoft's data centers early next year, initially powering services such as Microsoft Copilot and the Azure OpenAI Service. Microsoft is already designing second-generation versions of the Azure Maia AI chip and the Cobalt CPU series.

These chips are the final piece of the puzzle that lets Microsoft deliver complete infrastructure systems: everything from chips, software, and servers to racks and cooling systems, designed from top to bottom and optimized for internal and customer workloads.

Notably, the generative AI super-unicorn OpenAI was the first to trial the Maia 100 chip, which is being tested with GPT-3.5 Turbo.

"when Microsoft first shared their Maia chip design, we were excited and we worked together to improve and test it on our model," said Sam Altman, CEO of OpenAI. "Azure's end-to-end AI architecture is now optimized to the chip along with Maia, paving the way for training more capable models and making them cheaper for our customers."

In addition to releasing its own chips, Microsoft announced that it will expand its partnerships with Nvidia and AMD in AI-accelerated computing to give customers more price and performance choices.

Microsoft released a preview of the new NC H100 v5 virtual machine series built for Nvidia H100 GPUs, and will add the latest Nvidia H200 GPUs next year to support larger-model inference. It also announced AMD MI300X accelerated virtual machines for Azure, designed to speed up AI workloads for model training and generative inference.

Jensen Huang, founder and CEO of Nvidia, made a special trip to the event to announce an AI foundry service that helps enterprises and startups deploying on Microsoft Azure build their own customized large language models.

▲ Nadella shakes hands with Jensen Huang

When Microsoft CEO Satya Nadella asked what the future direction of AI innovation would be, Huang replied: "Generative AI is the most important platform transition in the history of computing. Nothing this big has happened in the past 40 years. It is bigger than the personal computer, bigger than the mobile phone, and it will be bigger than the internet."

01. Inside Microsoft's chip lab: maximizing hardware utilization

Microsoft's Redmond campus hides a laboratory full of silicon, the essential material of the digital age. Over the years, Microsoft engineers have quietly refined their methods there, carefully testing silicon through a multi-step process.

▲ In Microsoft's Redmond lab, a system-level tester simulates how a chip will operate in Microsoft's data centers, rigorously evaluating each chip under real-world conditions to ensure it meets performance and reliability standards. (Image source: Microsoft)

Microsoft sees adding self-developed chips as a way to ensure every element is tailored to Microsoft cloud and AI workloads. The chips will be mounted on custom server motherboards and placed in custom racks installed in existing Microsoft data centers.

The AI chip Microsoft Azure Maia 100 is designed to drive hardware utilization to its absolute maximum, and it will power some of the largest internal AI workloads running on Microsoft Azure.

Brian Harry, a Microsoft technical fellow who leads the Azure Maia team, said Maia 100 was designed specifically for the Azure hardware stack, and that this vertical integration, aligning the chip design with a larger AI infrastructure built around Microsoft's workloads, can yield huge gains in performance and efficiency.

The Cobalt 100 CPU is a 128-core server processor designed and built on Arm Neoverse CSS. According to Wes McCullough, vice president of hardware product development at Microsoft, it is an optimized low-power chip design that delivers greater efficiency and performance in cloud-based offerings.

The choice of Arm technology is key to Microsoft's sustainability goal of optimizing performance per watt across its entire data center footprint, which essentially means getting more computing power for each unit of energy consumed.

"preliminary tests show that our performance is 40 per cent better than the data center performance of existing commercial Arm servers." Rani Borkar, vice president of hardware systems and infrastructure at Microsoft Azure.

▲ The first servers powered by the Microsoft Azure Cobalt 100 CPU, in a data center in Quincy, Washington. (Image source: Microsoft)

"We are making the most efficient use of the transistors on the silicon. Multiply the efficiency gains of the servers across all our data centers, and it adds up to a pretty big number," McCullough said.

02. Microsoft has been developing chips for its Xbox and HoloLens devices for more than a decade, but its effort to create custom chips for Azure began only in 2020.

Pat Stemen, partner program manager on the Azure hardware systems and infrastructure team, said that before 2016 most layers of the Microsoft cloud were bought off the shelf. Microsoft then began customizing its servers and racks, cutting costs and giving customers a more consistent experience. Over time, silicon became the main missing piece.

The testing process for these self-developed custom chips includes determining the peak performance of each chip under different frequency, temperature, and power conditions, and, more importantly, testing every chip under the same conditions and configurations it will encounter in a real Microsoft data center.
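A characterization sweep of this kind can be pictured as a loop over operating points. The sketch below is hypothetical: `run_benchmark` is a stand-in for a real hardware test harness, and the operating points are invented, not Microsoft's test conditions.

```python
import itertools

def run_benchmark(freq_mhz, temp_c, power_w):
    """Hypothetical stand-in for a hardware test harness: runs a fixed
    workload at one operating point and returns throughput (ops/s).
    Here we simply model throughput scaling with frequency, capped by
    the power budget and derated at high temperature."""
    thermal_derate = max(0.0, 1.0 - 0.002 * max(0, temp_c - 60))
    return min(freq_mhz * 1e6, power_w * 4e6) * thermal_derate

frequencies = [800, 1200, 1600]   # MHz (invented operating points)
temperatures = [40, 60, 85]       # deg C
power_caps = [300, 500]           # W

results = {point: run_benchmark(*point)
           for point in itertools.product(frequencies, temperatures, power_caps)}

freq, temp, power = max(results, key=results.get)
print(f"peak throughput at {freq} MHz, {temp} C, {power} W cap: "
      f"{results[(freq, temp, power)]:.3e} ops/s")
```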

▲ In Microsoft's Redmond lab, chips undergo system-level testing that simulates actual production conditions before they are installed in servers. (Image source: Microsoft)

The chip architecture announced today not only improves cooling efficiency but also optimizes the use of Microsoft's current data center assets, maximizing server capacity within its existing footprint.

For example, no existing rack could accommodate the unique requirements of the Maia 100 server motherboards, so Microsoft built a wider data center rack from scratch. The expanded design provides ample space for power and network cables, meeting the unique needs of AI workloads.

▲ A custom rack for the Maia 100 AI chip and its "sidekick", in a thermal chamber at Microsoft's Redmond lab. The "sidekick" circulates liquid to and from the rack to cool the chips as they handle the computational demands of AI workloads. (Image source: Microsoft)

Large AI tasks require intense computation and draw more power. Traditional air cooling cannot meet these demands, and liquid cooling has become the preferred answer to the thermal challenge. But Microsoft's current data centers were not designed for large liquid-cooled machines, so it developed a "sidekick" that sits beside each Maia 100 rack.

These "assistants" work a bit like car radiators. The cold liquid flows from the side plate to the cold plate attached to the surface of the Maia 100chip. Each plate has channels through which liquid circulates to absorb and transport heat. This heat flows to the aileron, which removes heat from the liquid and sends it back to the rack to absorb more heat, and so on.

▲ Cold plates attached to the surface of the Maia 100 AI chip. (Image source: Microsoft)

McCullough stressed that the paired design of the rack and its "sidekick" underscores the value of a systems approach to infrastructure.

By controlling everything, from the low-power philosophy of the Cobalt 100 chip to the intricacies of data center cooling, Microsoft can orchestrate a harmonious interaction between every component, ensuring that the whole really is greater than the sum of its parts in reducing environmental impact.

Microsoft has shared its custom rack design with industry partners, who can use it no matter what silicon sits inside.

"everything we build, whether it's infrastructure, software or firmware, we can use our own chips, or chips from our industry partners." "this is a choice made by our customers, and we are trying to provide them with the best choice, whether it's performance, cost or anything else they care about," McCullough shared.

Stemen says Microsoft's mission is clear: optimize every layer of its technology stack, from core silicon to end services.

"Microsoft's innovation will go further into chip work to ensure that the future of our customers' workloads on Azure gives priority to performance, energy efficiency and cost." "We intend to choose this innovation so that our customers can get the best experience from Azure today and in the future," he said.

During the conference, Microsoft also announced the general availability of one of the key building blocks: Azure Boost, a system that offloads storage and networking processes from host servers onto dedicated hardware and software, speeding up both storage and networking.

03. Nvidia launches an AI foundry service to help rapidly build custom generative AI models

At the Microsoft Ignite conference, Nvidia also announced a new development: the launch of an AI foundry service.

Jensen Huang, founder and CEO of Nvidia, held an 11-minute conversation with Microsoft CEO Satya Nadella about the companies' wide-ranging cooperation.

He said generative AI is the most important platform transition in the history of computing, and that everything has changed because of it. Over the past 12 months, Microsoft and Nvidia have gone all out, working together to build the world's fastest AI supercomputers. A machine of that class usually takes two or three years to build; the two teams built two of them in just a year, one at Microsoft and one at Nvidia.

"We will do what TSMC does for us for people who want to build their own proprietary big language models, and we will become the contract factory for AI models." Huang Renxun says companies need to customize models to implement expertise trained based on the company's proprietary DNA, or data, an AI contract manufacturing service that combines Nvidia's generative AI modeling technology, large language model training expertise and a giant AI factory.

The AI foundry service can help enterprises across industries, including enterprise software, telecommunications, and media, build custom models for generative AI applications. Once a model is ready to deploy, enterprises can use retrieval-augmented generation (RAG) to connect it to their enterprise data. Nvidia built this capability into Microsoft Azure so that enterprises around the world can connect their custom models to Microsoft cloud services.
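As a minimal sketch of the RAG pattern described here, with every component (the embedding function, the in-memory "vector store", and the prompt assembly) as a toy stand-in rather than Nvidia's or Microsoft's actual stack:

```python
import numpy as np

# Toy stand-ins; a real deployment would use a hosted embedding model,
# a vector database, and a custom LLM endpoint.
def embed(text: str) -> np.ndarray:
    """Toy embedding: hashed bag-of-words. Real systems use a learned model."""
    v = np.zeros(256)
    for tok in text.lower().split():
        v[hash(tok) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

documents = [
    "Azure Boost offloads storage and networking to dedicated hardware.",
    "Maia 100 is Microsoft's first cloud AI chip, built on TSMC 5nm.",
    "Cobalt 100 is a 128-core Arm server CPU for Microsoft Cloud.",
]
index = np.stack([embed(d) for d in documents])   # the "vector store"

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)                 # similarity scores
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real RAG system would send this assembled prompt to the custom LLM.
    return f"[prompt to model]\nContext:\n{context}\nQuestion: {query}"

print(answer("What is Maia 100?"))
```

The design point RAG captures is that the model itself stays fixed while the retrieval layer injects fresh, private enterprise data into each prompt.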

The service brings together three elements: Nvidia AI Foundation models, the Nvidia NeMo framework and tools, and Nvidia DGX Cloud AI supercomputing services, forming an end-to-end solution for creating custom generative AI models.

Enterprises can then deploy their customized models with Nvidia AI Enterprise software to power generative AI applications such as intelligent search, summarization, and content generation.

Customers using the Nvidia AI foundry service can choose from multiple Nvidia AI Foundation models, including the new Nvidia Nemotron-3 8B model family hosted in the Azure AI model catalog. Nemotron-3 8B has multilingual capabilities for building custom enterprise generative AI applications.

Developers can also access the Nemotron-3 8B models in the Nvidia NGC catalog, alongside community models such as Meta's Llama 2, optimized for Nvidia accelerated computing.

SAP SE, Amdocs, Getty Images, and others are already using the service to build custom models.

SAP plans to combine the service and an optimized RAG workflow with Nvidia DGX Cloud and Nvidia AI Enterprise software running on Azure to help customize and deploy Joule, its new natural-language generative AI copilot.

04. Conclusion: detailed chip specifications are still unreleased, and the effect on cloud service pricing remains to be seen

Perhaps because deployment is still at an early stage, Microsoft has not released detailed chip specifications or performance benchmarks. The two new chips will join the hardware underpinning Microsoft Cloud, helping to meet the explosive demand for efficient, scalable, and sustainable computing power.

Microsoft is building AI-first infrastructure and rethinking every aspect of the data center: optimizing the flexibility, power, performance, sustainability, and cost of Azure hardware systems, and optimizing and integrating every layer of the infrastructure stack to maximize performance while diversifying its supply chain.

Self-developed AI chips can spare Microsoft from over-reliance on a handful of leading chip suppliers. The remaining questions are how quickly Microsoft can bring the two chips into service, how they will help absorb the surging demand for generative AI experiences, and how they will affect the pricing of Microsoft's Azure AI cloud services.

This article comes from the WeChat official account (ID: aichip001); author: ZeR0.
