2025-04-10 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 11/24 Report --
The GPU, designed for generative AI, offers 2.4x the HBM density and 1.6x the memory bandwidth of the Nvidia H100.
Author | ZeR0
Editor | Moying
Reported June 14 -- early this morning, Nvidia's number-one rival AMD finally unveiled its long-awaited AI lineup.
When Lisa Su became AMD's CEO in 2014, the chip company was on the verge of an existential crisis, cutting about a quarter of its jobs, with its share price hovering around $2. Under Su's leadership, AMD then made a remarkable turnaround: its share price has soared nearly 30-fold over the past nine years, making the company a counterweight to the two top chip giants, Nvidia and Intel.
With generative AI sweeping the world and Nvidia GPUs being snapped up by big companies, the spotlight quickly shifted to AMD: can it produce enough AI chips to break Nvidia's near-monopoly market position and seize the new AI wave?
Today, AMD delivered its interim answer.
Showing off the next-generation AI chip, the MI300X accelerator, Lisa Su smiled and said: "I love this chip."
The MI300X is a GPU-only part built on AMD's CDNA 3 architecture, with up to 192GB of HBM3 high-bandwidth memory to accelerate large language models and generative AI computing.
Key AMD customers will begin testing the MI300X in the third quarter, with full production in the fourth quarter. Another model, the Instinct MI300A, is already being supplied to customers.
Lisa Su called artificial intelligence the "biggest and most strategic long-term growth opportunity" for AMD.
On stage, AMD announced a new partnership with star AI unicorn Hugging Face, which will optimize its models for AMD's CPUs, GPUs and other AI hardware.
Beyond AI chips, AMD introduced a new EPYC server processor, codenamed Bergamo, designed for cloud computing and hyperscale customers, with up to 128 cores per socket and optimizations for a variety of containerized workloads.
Executives from Amazon's cloud division AWS, Oracle Cloud, Meta and Microsoft Azure all took the stage to share their experience using AMD chips and software in their data centers.
01. Accelerating generative AI: 192GB of HBM3, large models on a single GPU
AMD Instinct GPUs have already been adopted by many of the world's fastest supercomputers.
The MI300X accelerator is a new member of the AMD Instinct MI300 family, offering a GPU-only configuration.
The MI300X and its CDNA 3 architecture are designed for large language models and other advanced AI models, packaging twelve 5nm chiplets for a total of 153 billion transistors.
This new AI chip drops the APU's 24 Zen cores and I/O dies in favor of more CDNA 3 GPU chiplets and a larger 192GB of HBM3, providing 5.2TB/s of memory bandwidth and 896GB/s of Infinity Fabric bandwidth.
The MI300X's HBM density is 2.4 times that of the Nvidia H100 and its bandwidth 1.6 times, which means AMD can run larger models than Nvidia's chip.
AMD demonstrated the 40-billion-parameter Falcon-40B large language model running on a single MI300X GPU, asking it to write a poem about San Francisco.
"Model sizes are getting larger and larger, and you really need multiple GPUs to run the latest large language models," Lisa Su said. With more memory on the AMD chip, developers will not need as many GPUs.
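As a rough sanity check on that claim, here is a minimal back-of-the-envelope sketch (my own illustration, using only the article's 192GB figure and the H100's widely known 80GB capacity) of why Falcon-40B's weights fit on one MI300X:

```python
def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    # fp16/bf16 weights take 2 bytes per parameter; this deliberately
    # ignores activations, optimizer state, and the KV cache.
    return params_billion * bytes_per_param

falcon = weights_gb(40)      # Falcon-40B
print(falcon)                # 80.0 GB of weights alone
print(192 - falcon)          # 112.0 GB of headroom on a 192GB MI300X
print(80 - falcon)           # 0.0 GB left on an 80GB H100 -> weights alone fill it
```

In practice runtime overheads push a model this size onto two or more 80GB cards, which is the point Su is making.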
The other model, the MI300A, described by Lisa Su as "the world's first APU accelerator for AI and high-performance computing," packages CPUs, GPUs and high-bandwidth memory together, with 146 billion transistors across 13 chiplets.
The MI300A uses 5nm and 6nm processes and the CDNA 3 GPU architecture, pairing 24 Zen 4 cores with 128GB of HBM3, and delivers more than 8x the performance and 5x the efficiency of the MI250.
AMD also unveiled the AMD Infinity Architecture Platform, which connects eight MI300X accelerators in a standard system for both AI inference and training, with a combined 1.5TB of HBM3 memory.
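The platform's combined memory figure follows directly from its eight 192GB GPUs; a quick check:

```python
gpus = 8
hbm_per_gpu_gb = 192
total_gb = gpus * hbm_per_gpu_gb
print(total_gb)         # 1536 GB across the platform
print(total_gb / 1024)  # 1.5 -> the quoted 1.5TB
```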
According to Taiwanese media reports, both AMD's Instinct MI300 series and Nvidia's H100/H800 GPUs use TSMC's advanced CoWoS 3D back-end packaging, so TSMC's CoWoS capacity shortage will persist. TSMC can currently process about 8,000 CoWoS wafers a month, of which Nvidia and AMD together account for about 70 to 80 percent.
In addition, one of Nvidia's key moats in recent years has been its CUDA software, favored by developers. AMD President Victor Peng demonstrated AMD's efforts to build out its own software ecosystem.
AMD plans to adopt the concept of "Open, Proven, Ready" in the development of AI software ecosystem.
AMD's ROCm is a complete set of libraries and tools for optimizing the AI software stack. Unlike CUDA, it is an open platform.
AMD also shared its collaboration with PyTorch on ROCm: the new PyTorch 2.0 is almost twice as fast as the previous version, and AMD is a founding member of the PyTorch Foundation.
AMD is constantly optimizing ROCm. "While this is a journey, we have made really great progress in building a powerful software stack that works with an open ecosystem of models, libraries, frameworks and tools," said Victor Peng.
02. Cloud-native processor Bergamo: 128 cores, 256 threads, the highest vCPU density
Next, a look at AMD's data center CPUs.
Lisa Su first shared the progress of AMD EPYC processors, especially in terms of cloud computing instances available around the world.
She stressed that AMD's fourth-generation EPYC Genoa processors outperform Intel's competing parts by 1.8x on cloud computing workloads and 1.9x on enterprise workloads.
The vast majority of AI still runs on CPUs, and AMD says the fourth-generation EPYC is far ahead of the Intel Xeon 8490H, with a 1.9x performance advantage.
Lisa Su said cloud-native processors are throughput-oriented and demand the highest performance, scalability, compute density and energy efficiency.
The newly released Bergamo is AMD's entry into the cloud-native processor market.
The chip has 82 billion transistors, providing the highest vCPU density.
Under the large heat spreader sits a chip that looks much like previous EPYCs, with a central I/O die and eight core complex dies (CCDs), as in Rome or Milan.
Bergamo packs up to 128 cores and 256 threads per socket, spread across 8 CCDs. Each CCD holds 16 cores, twice as many as Genoa's, thanks to a new Zen 4c core design that delivers higher density than the standard Zen 4 core while remaining fully compatible with the x86 ISA.
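The headline figures are simple multiplication; this sketch (my own, with Genoa's 8-core CCD noted only as background) shows how the topology adds up:

```python
ccds = 8
cores_per_ccd = 16       # Zen 4c CCD; for comparison, Genoa's Zen 4 CCD has 8
threads_per_core = 2     # simultaneous multithreading (SMT)
cores = ccds * cores_per_ccd
threads = cores * threads_per_core
print(cores, threads)    # 128 256 -> the per-socket figures quoted above
```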
"Zen 4c is optimized for the best balance of performance and power, which gives us better density and energy efficiency," Lisa Su said in her speech. "As a result, the design area is 35% smaller, and performance per watt is significantly improved."
Bergamo is now shipping to AMD's cloud customers. AMD also shared performance, density and energy-efficiency comparisons between the fourth-generation EPYC 9754 and the Intel Xeon 8490H:
In addition to Bergamo's new core and chiplet architecture, the processor has much in common with Genoa, including support for 12-channel DDR5 memory, the latest PCIe 5.0, and single- or dual-socket configurations.
However, many-core designs are no longer unique to AMD processors. Not long ago, data center processor newcomer Ampere Computing launched its AmpereOne series with up to 192 single-threaded Ampere cores, and Intel plans to launch its many-core Xeon processor, Sierra Forest, with 144 cores built in, in early 2024.
AMD also showed off its latest cache-stacked chip, codenamed Genoa-X, which is available now.
The chip targets high-performance computing workloads, including computational fluid dynamics, electronic design automation, finite element analysis, seismic tomography, and other bandwidth-sensitive workloads that benefit from large shared caches.
The Genoa-X CPU is based on AMD's standard Genoa platform and uses AMD 3D V-Cache technology to increase the available L3 cache by stacking SRAM modules vertically on each CCD.
The chip offers up to 96 cores and a total of 1.1GB of L3 cache, with a 64MB SRAM block stacked on each CCD.
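The 1.1GB total is consistent with the per-CCD figures. A sketch of the arithmetic, where the 8-cores-per-CCD and 32MB base-L3-per-CCD values are my assumptions carried over from standard Zen 4 Genoa parts, not stated in the article:

```python
cores = 96
cores_per_ccd = 8        # standard Zen 4 CCD (assumption, from Genoa)
ccds = cores // cores_per_ccd
base_l3_mb = 32          # on-die L3 per CCD (assumption, from prior Zen 4 parts)
stacked_mb = 64          # stacked SRAM block per CCD, per the article
total_mb = ccds * (base_l3_mb + stacked_mb)
print(ccds)              # 12 CCDs
print(total_mb)          # 1152 MB, i.e. the quoted "1.1GB"
```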
According to figures disclosed by AMD, Genoa-X delivers 2.2x to 2.9x the performance of Intel's top-spec 60-core Sapphire Rapids Xeon across a variety of computational fluid dynamics and finite element analysis workloads.
The following figure shows the performance comparison between Genoa-X and Intel with the same number of cores:
03. A new DPU on the way
Finally, AMD briefly introduced its network infrastructure.
Last year AMD acquired Pensando for $1.9 billion, entering the DPU race, and it explained how its DPUs reduce networking overhead in the data center.
AMD calls its P4 DPU architecture "the world's smartest DPU" and its Pensando SmartNIC an integral part of the new data center architecture.
AMD also demonstrated a smart switch developed jointly with Aruba Networks, and plans to integrate P4 DPU offload into the network switch itself to provide rack-level services.
AMD's next DPU, designed to offload networking, security and virtualization tasks from the CPU, will deliver higher performance and energy efficiency than the current generation of P4 DPUs.
Its DPUs are supported by major cloud providers such as Microsoft, IBM Cloud and Oracle Cloud, as well as software suites such as VMware's hypervisors.
AMD plans to expand the list of compatible software and release a software development kit before launching the Giglio DPU later this year, making it easier for users to deploy workloads on its DPUs.
04. Conclusion: data center AI accelerators to exceed $150 billion by 2027
Nvidia and Intel, the leaders in data center GPUs and CPUs respectively, are both touting their ability to accelerate AI. AMD, the "number two" on both tracks, is racing to meet the surging demand for AI compute and to challenge Nvidia's dominance in this emerging market with a data center GPU tailored to the latest needs.
The popularity of generative AI and large language models is pushing data centers to their limits. So far, Nvidia has the advantage in supplying the technology needed to handle these workloads: it accounts for 95 percent of the market for GPUs usable for machine learning, according to market research firm New Street Research.
"We are still in a very, very early stage of the AI life cycle," Lisa Su said, predicting that the total addressable market for data center AI accelerators will grow fivefold by 2027, from about $30 billion this year to more than $150 billion, a CAGR of more than 50 percent.
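The "fivefold" and "more than 50 percent CAGR" figures are consistent with each other; a quick check, assuming a four-year span from this year to 2027:

```python
start_b, end_b = 30, 150     # market size in $B, this year -> 2027
years = 4                    # assumed compounding span
multiple = end_b / start_b
cagr = multiple ** (1 / years) - 1
print(multiple)              # 5.0 -> the quoted fivefold growth
print(round(cagr * 100, 1))  # 49.5 -> a CAGR of roughly 50%
```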
AMD did not disclose pricing for the two new MI300 chips, but they could put price pressure on Nvidia, whose H100 has been rumored to sell for $30,000 or more.
This article comes from the WeChat official account (ID: aichip001), author: ZeR0.