Aurora supercomputer advances generative AI and will support running the largest large language model to date!

2025-03-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

As lead architect and lead researcher for the Aurora supercomputer at Argonne National Laboratory, Olivier Franza played a leading role in delivering this ambitious scientific instrument.

Aurora is one of Intel's most high-profile recent projects, and it challenges Intel's entire system portfolio. Aurora is not only the largest GPU cluster in the world, but is also expected to be the first supercomputer with a peak performance of 2 exaFLOPS (2 × 10^18 floating-point operations per second).

Even as a veteran with 22 years at Intel, Olivier Franza still feels the pressure that comes with Aurora.

Olivier Franza joined the Aurora project as a system hardware architect in 2016 and became chief architect in 2021, witnessing the project's major shift to a GPU-based architecture.

"What the chief architect needs to do is to adjust the overall architecture of the supercomputing system according to the customer's high standards," Franza explained. "The chief architect will also focus on some basic parameters, such as overall performance metrics, power consumption, and some RAS (reliability, availability, serviceability) features, which are critical to building scalable systems."

The chief architect must also attend to every level of the system, from nodes to racks to the full machine, along with its network and storage components.

A change in technological route creates an opportunity to shape future products.

Early plans called for Aurora to use a range of Intel product technologies. As Intel's product line changed, so did the plans for Aurora.

When Intel announced its data center GPU product line, Franza took part in the design discussions for the Intel Data Center GPU Max products.

Aurora, in other words, did not reach its current form in a single step. Building it not only influenced Intel's strategy and product line planning, but also enabled Aurora to address scale and performance problems at a high level.

Franza said Intel made many adjustments, from components to systems, to meet Aurora's needs.

For example, the architecture and concept of the Intel Xeon CPU Max series derive from some features of Intel Xeon Phi, the first product to integrate a high-bandwidth, high-capacity memory architecture in the package.

In addition, in pursuit of higher performance, several of Aurora's subsystems were advanced, from blade-server cooling to high-density integration to storage.

It is worth mentioning that in this process Intel also built a new storage system, DAOS (Distributed Asynchronous Object Storage).

Franza said that DAOS is an open source project that can deliver high-speed storage on conventional hardware. Aurora is one of the first users of DAOS and is currently its largest deployment.

From designing components to connecting thousands of systems

The Aurora program strengthened Intel's system-level thinking and fostered collaboration among Intel's internal business units and with external partners: Argonne scientists and HPE engineers (HPE is another major participant in the project). The work involved extensive cross-functional and cross-organizational collaboration.

"Getting a whole team to work together to deliver supercomputers like Aurora is a once-in-a-lifetime experience for many of us," Franza said.

Although engineers installed the last blade server in June, Aurora still requires round-the-clock large-scale testing and stability verification.

Franza guides a large team responsible for the startup, verification, stabilization and optimization of Aurora, with the aim of maximizing the system's performance under load. One of the most closely watched workloads is the High Performance Linpack (HPL) benchmark, which is the basis for the Top500 list ranking the world's most powerful supercomputing systems.

Every morning, Franza carefully reviews how each node ran overnight and makes plans for the next day and beyond. Every afternoon, he holds a meeting to summarize progress and the problems encountered. This work happens every day, and the machine runs all the time.

"We will verify systematically," Franza explained. "Start with a single blade server, then move to rack scale, and then to multiple racks for large-scale verification."
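The staged approach Franza describes can be sketched in a few lines of Python. This is only an illustration of the idea of scaling up one step at a time, not the team's actual tooling: the function name, the stage sizes and the placeholder check are all hypothetical, and only the ordering (single blade, then rack, then multiple racks) comes from the quote.

```python
from typing import Callable, Iterable, List, Tuple

# (stage name, number of blades). The sizes are illustrative assumptions;
# only the ordering comes from the article.
Stage = Tuple[str, int]


def run_staged_verification(
    stages: Iterable[Stage],
    check: Callable[[int], bool],
) -> List[str]:
    """Run `check` at each scale in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed: List[str] = []
    for name, blade_count in stages:
        if not check(blade_count):
            break  # do not scale up until the smaller scale is stable
        passed.append(name)
    return passed


stages = [("single blade", 1), ("one rack", 64), ("multiple racks", 256)]

# Placeholder check (hypothetical): pretend runs above 100 blades still fail.
result = run_staged_verification(stages, lambda blades: blades <= 100)
print(result)
```

Stopping at the first failing stage reflects the point of the method: a problem found on one blade or one rack is far cheaper to diagnose than the same problem surfacing across the whole machine.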

Aurora consists of 10,624 blade servers and has 63,744 Intel Max series GPUs, making it the largest GPU cluster in the world. Across its 166 racks, it uses a total of 21,248 Intel Xeon Max CPUs.
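As a quick sanity check on these figures, the short Python sketch below derives the per-blade and per-rack ratios from the totals quoted in the article. It assumes the components are spread evenly across blades and racks, which the article does not state explicitly; the helper name `per_unit` is purely illustrative.

```python
# Totals quoted in the article.
BLADES = 10_624
GPUS = 63_744
CPUS = 21_248
RACKS = 166


def per_unit(total: int, units: int) -> int:
    """Return total / units, raising if the division is not exact."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} is not evenly divisible by {units}")
    return quotient


# Assumption: components are distributed uniformly.
gpus_per_blade = per_unit(GPUS, BLADES)
cpus_per_blade = per_unit(CPUS, BLADES)
blades_per_rack = per_unit(BLADES, RACKS)

print(f"GPUs per blade:  {gpus_per_blade}")   # 6
print(f"CPUs per blade:  {cpus_per_blade}")   # 2
print(f"Blades per rack: {blades_per_rack}")  # 64
```

All three divisions come out exact, so the quoted totals are consistent with six GPUs and two CPUs per blade and 64 blades per rack.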

According to Franza, the Aurora installation is about the size of four tennis courts. That sounds big, but you don't really appreciate how big it is until you see it with your own eyes.

Franza's first task is to make sure the system is stable, functional and operating normally. It is a very difficult task, but Franza can already see success on the horizon.

Walking down the aisle of the data center, watching the lights flicker and the machine running properly, Franza felt refreshed, satisfied and fulfilled.

A once-in-a-lifetime effort to build a supercomputer to solve scientific problems

Building an influential scientific supercomputer involves many difficulties and obstacles, but Franza's sense of mission sustained him, given Aurora's huge potential in cancer research and the opportunity to benefit everyone.

Aurora will not only be used to solve some of the most complex scientific and engineering problems in the world; it is also an ideal platform for running generative AI and for using generative AI in research.

It is understood that Aurora will support the largest large language model so far, the 1-trillion-parameter Aurora GenAI project, with the aim of making scientists' work more efficient and simpler.
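To give a rough sense of what 1 trillion parameters means in storage terms, the sketch below estimates the memory needed just to hold the model weights. Both inputs other than the parameter count are assumptions made for illustration: 2 bytes per parameter (a 16-bit format) and a hypothetical accelerator with 128 GB of memory. Neither figure comes from the article, and the estimate ignores activations, optimizer state and any other working memory.

```python
NUM_PARAMS = 10**12              # 1 trillion parameters, as quoted
BYTES_PER_PARAM = 2              # assumption: 16-bit weights
DEVICE_MEMORY_BYTES = 128 * 10**9  # assumption: hypothetical 128 GB device


def weights_footprint_bytes(num_params: int, bytes_per_param: int) -> int:
    """Bytes needed to store the weights alone."""
    return num_params * bytes_per_param


def min_devices(total_bytes: int, device_memory_bytes: int) -> int:
    """Smallest device count whose combined memory holds total_bytes."""
    return -(-total_bytes // device_memory_bytes)  # integer ceiling division


footprint = weights_footprint_bytes(NUM_PARAMS, BYTES_PER_PARAM)
devices = min_devices(footprint, DEVICE_MEMORY_BYTES)

print(f"Weights only: {footprint / 10**12:.1f} TB")
print(f"Minimum devices just to hold the weights: {devices}")
```

Under these assumptions the weights alone occupy about 2 TB, so they must be split across many devices, which is why a model of this size calls for a large, tightly connected GPU system.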

Franza is doing important work, and what he finds most gratifying about it is the teamwork and friendship.

Aurora is a huge project that demands sustained effort and a great deal of perseverance.

According to Franza, the core team keeps a marathon mindset and cannot relax until the very last minute. The team needs people who can stay focused on challenging problems for a long time, and those people end up achieving things that most people would find difficult.

© 2024 shulou.com SLNews company. All rights reserved.
