
DeepMind's medium-range AI weather forecast outperforms the world's top forecasting centre: one TPU predicts 10 days of weather in one minute


Original title: "DeepMind is everywhere! AI medium-range weather forecast beats the world's top forecasting centre: one TPU predicts 10 days of weather in one minute."

Because medium-range weather forecasting involves enormous amounts of data, building high-quality prediction models has long been difficult. A new machine learning model from DeepMind and Google now beats the best existing forecast model on more than 99% of evaluation targets.

As everyone knows, the reliability of traditional weather forecasts leaves something to be desired.

Recently, DeepMind and Google developed a machine-learning-based weather simulator that can produce a highly accurate 10-day forecast in 60 seconds.

Paper: https://arxiv.org/abs/2212.12794

1. GraphCast is an autoregressive model based on graph neural networks, and its performance exceeds that of the world's most accurate medium-range weather forecasting systems.

2. GraphCast needs only a single Cloud TPU v4 device to generate a 10-day forecast (35 GB of data) in under 60 seconds, at a resolution of 0.25°.

3. By training on larger, more recent, and higher-quality data, GraphCast's speed and accuracy can be improved further.

On short-range forecasting, DeepMind had already reported in Nature in September 2021 that its generative nowcasting model beat competing methods in 89% of cases.

Why is medium-range weather forecasting so hard?

A "medium-range weather forecast" usually refers to forecasting weather trends over the next 4 to 10 days. Its accuracy matters greatly for decision-making in agriculture, construction, tourism, and other industries.

To this end, the world-leading European Centre for Medium-Range Weather Forecasts (ECMWF) issues medium-range forecasts up to four times a day.

Producing a medium-range forecast requires simulating two key components on large high-performance computing (HPC) clusters:

Estimating the current weather state from current and historical data collected by satellites, weather stations, ships, and so on, a process known as "data assimilation";

Using a numerical weather prediction (NWP) system to model how the weather-related variables evolve over time.

However, as data volumes grow sharply, NWP models cannot scale effectively.

In other words, although vast archives of weather and climate observations exist, it is difficult to use these data directly to improve forecast quality.

NWP is generally improved by having trained experts hand-craft better models, algorithms, and approximations, which is a time-consuming and costly process.

In contrast, machine-learning methods can exploit more and higher-quality data to improve model accuracy, usually at a far lower computational budget.

In the GraphCast paper, "GraphCast: Learning skillful medium-range global weather forecasting", DeepMind uses graph neural networks (GNNs) to build an autoregressive model in an "encode-process-decode" configuration.

GraphCast's three-stage simulation process is as follows (a minimal code sketch follows the list):

1. An encoder GNN with directed edges from grid points to the multi-mesh maps the input data on the original latitude-longitude grid to learned features on the multi-mesh.

2. A deep GNN performs learned message passing on the multi-mesh, whose long-range edges let information propagate efficiently across space.

3. A decoder maps the final multi-mesh representation back to the latitude-longitude grid.
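To make the three stages concrete, here is a minimal sketch of the encode-process-decode pattern, assuming toy edge lists and sum aggregation (the function names, feature handling, and `gnn_layer` helper are hypothetical simplifications, not DeepMind's implementation):

```python
import numpy as np

def gnn_layer(node_feats, edges):
    """One round of message passing: each node accumulates its neighbors' features."""
    out = node_feats.copy()
    for src, dst in edges:
        out[dst] += node_feats[src]
    return out

def encode(grid_feats, grid2mesh_edges, n_mesh_nodes):
    """Map latitude-longitude grid features onto multi-mesh nodes."""
    mesh_feats = np.zeros((n_mesh_nodes, grid_feats.shape[1]))
    for g, m in grid2mesh_edges:          # directed edges: grid point -> mesh node
        mesh_feats[m] += grid_feats[g]
    return mesh_feats

def process(mesh_feats, mesh_edges, depth=16):
    """Deep message passing on the multi-mesh (the paper uses 16 layers)."""
    for _ in range(depth):
        mesh_feats = gnn_layer(mesh_feats, mesh_edges)
    return mesh_feats

def decode(mesh_feats, mesh2grid_edges, n_grid_points):
    """Map multi-mesh features back onto the latitude-longitude grid."""
    grid_out = np.zeros((n_grid_points, mesh_feats.shape[1]))
    for m, g in mesh2grid_edges:          # directed edges: mesh node -> grid point
        grid_out[g] += mesh_feats[m]
    return grid_out
```

The real model interleaves learned MLPs with each aggregation step; the sketch keeps only the graph structure that the three stages move features across.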

The results show that GraphCast outperforms the most accurate existing machine-learning weather forecast model on 99.2% of 252 evaluation targets, and beats ECMWF's high-resolution forecast (HRES) on 90% of 2,760 targets.

Figure 1: (a) The input weather state is defined on a high-resolution latitude-longitude-pressure-level grid.

(b) GraphCast predicts the next weather state on the same latitude-longitude-pressure-level grid.

(c) A forecast is generated by iteratively applying GraphCast to each previously predicted state, producing a sequence of states that represent the weather at successive lead times.

(d) The encoder component of the GraphCast architecture maps local regions of the input (green boxes) onto nodes of the multi-mesh graph.

(e) The processor component updates each multi-mesh node via learned message passing.

(f) The decoder component maps the processed multi-mesh features (purple nodes) back onto the grid representation.

The ERA5 dataset

GraphCast was trained on a corpus of 39 years (1979-2018) of historical weather data: ECMWF's ERA5 reanalysis dataset.

With a 6-hour time step and 0.25° latitude-longitude resolution, the model predicts 5 surface variables and 6 atmospheric variables over 10 days; each atmospheric variable is defined on 37 vertical pressure levels, and each variable describes the weather state at a specific place and time.

As shown in Figure 1a, the researchers denote the weather state at time index $t$ as $X^t$. The grid covering the Earth carries variables at each latitude, longitude, and pressure level; in the enlarged view, surface variables are shown as yellow boxes and atmospheric variables as blue boxes. The subset of variables in $X^t$ corresponding to a specific grid point $i$ (1,038,240 grid points in total) is written $x^t_i$, and each individual target variable is indexed by $j$, written $x^t_{i,j}$.
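As a quick sanity check on these counts (assuming the standard 721 × 1440 grid that a 0.25° resolution implies):

```python
lats = 721            # latitudes from 90S to 90N at 0.25-degree spacing
lons = 1440           # longitudes from 0 to 359.75E at 0.25-degree spacing
print(lats * lons)    # 1038240 grid points, matching the count above
print(5 + 6 * 37)     # 227 variables per grid point: 5 surface + 6 x 37 levels
```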

Generating predictions

GraphCast takes two weather states as input, corresponding to the current time $t$ and the previous time $t-1$, and predicts the weather state at the next time step (Figure 1b):

$\hat{X}^{t+1} = \mathrm{GraphCast}(X^{t-1}, X^{t})$

To generate a $T$-step forecast $(\hat{X}^{t+1}, \ldots, \hat{X}^{t+T})$, GraphCast iterates this equation autoregressively, feeding its own predictions back in as input: prediction step 2 takes $(X^{t}, \hat{X}^{t+1})$ as input, step 3 takes $(\hat{X}^{t+1}, \hat{X}^{t+2})$, and so on.

Figures 1b and c describe this process.
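A minimal sketch of this rollout, assuming a hypothetical one-step `model(x_prev, x_curr)` that returns the predicted increment $\hat{Y}$ (the residual update matches the equation given in the architecture section below):

```python
def rollout(model, x_prev, x_curr, num_steps):
    """Iterate a one-step forecast model autoregressively."""
    states = []
    for _ in range(num_steps):
        y_hat = model(x_prev, x_curr)      # predicted 6-hour increment
        x_next = x_curr + y_hat            # residual update: X^{t+1} = X^t + Y^t
        states.append(x_next)
        x_prev, x_curr = x_curr, x_next    # feed the prediction back in as input
    return states                          # 40 steps gives a 10-day forecast
```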

The core architecture

GraphCast uses GNNs in an encode-process-decode configuration, as shown in Figures 1d, 1e, and 1f.

GNN-based learned simulators are very effective at modeling the complex physical dynamics of fluids and other materials, because their representation and computational structure resemble those of learned finite-element solvers.

A key advantage of GNNs is that the structure of the input graph determines which parts of the representation interact through learned message passing, allowing arbitrary patterns of spatial interaction over arbitrary ranges.

In contrast, convolutional neural networks (CNNs) are limited to computing interactions within a local patch (or, with dilated convolutions, over regularly strided longer ranges).

Transformers can also compute arbitrary long-range interactions, but they scale poorly to very large inputs (GraphCast's global input has more than a million grid points), because all-to-all attention incurs quadratic memory cost.
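A back-of-the-envelope calculation shows why: even storing a single all-to-all attention matrix over GraphCast's grid is infeasible.

```python
n = 1_038_240               # grid points in the global 0.25-degree input
bytes_per_entry = 2         # e.g. bfloat16
print(n * n * bytes_per_entry / 1e12)   # ~2.16 TB for one attention map
```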

Contemporary Transformer variants often sparsify the possible interactions to reduce complexity, which in practice makes them resemble GNNs.

By introducing GraphCast's internal multi-mesh representation, the researchers exploit the ability of GNNs to model arbitrary sparse interactions.

The multi-mesh has uniform spatial resolution across the globe and allows long-range interactions within a small number of message-passing steps.

To construct the multi-mesh, a regular icosahedron (12 nodes, 20 faces) is refined iteratively 6 times, yielding an icosahedral mesh hierarchy with 40,962 nodes and 81,920 faces at the finest resolution.
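These counts can be verified directly: each refinement splits every triangular face into 4, and for a closed triangular mesh Euler's formula ($V - E + F = 2$, with $E = 3F/2$) gives $V = F/2 + 2$:

```python
faces = 20                    # a regular icosahedron has 12 nodes, 20 faces
for _ in range(6):            # six rounds of refinement
    faces *= 4
print(faces, faces // 2 + 2)  # 81920 faces, 40962 nodes
```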

Because the coarse-mesh nodes are a subset of the fine-mesh nodes, the researchers can superimpose the edges from every level of the hierarchy onto the finest-resolution mesh.

This yields a multi-scale edge set, with coarse edges bridging long distances at multiple scales and fine edges capturing local interactions.

Figure 1g shows each individual refined mesh, while Figure 1e shows the complete multi-mesh.

The encoder of GraphCast (Figure 1d) first uses a GNN with directed edges from grid points to the multi-mesh to map the input data on the original latitude-longitude grid to learned features on the multi-mesh.

The processor (Figure 1e) then uses a 16-layer deep GNN to perform learned message passing on the multi-mesh, where long-range edges let information propagate efficiently through space.

The decoder (Figure 1f) then uses a GNN with directed edges from the multi-mesh back to the grid to map the final multi-mesh representation onto the latitude-longitude grid; the grid output $\hat{Y}^{t+k}$ is added to the input state $\hat{X}^{t+k}$ to form the prediction $\hat{X}^{t+k+1} = \hat{X}^{t+k} + \hat{Y}^{t+k}$.
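Putting the three components together with the residual update, one GraphCast-style forward step might look like the following sketch (built on the hypothetical `encode`/`process`/`decode` helpers above; the real model also embeds static features and uses learned MLPs throughout):

```python
import numpy as np

def graphcast_step(x_prev, x_curr, graphs):
    """One forward step: two input states in, next predicted state out."""
    grid_in = np.concatenate([x_prev, x_curr], axis=-1)   # stack the two states
    mesh = encode(grid_in, graphs["grid2mesh"], graphs["n_mesh"])
    mesh = process(mesh, graphs["mesh_edges"], depth=16)  # 16 GNN layers
    y_hat = decode(mesh, graphs["mesh2grid"], x_curr.shape[0])
    # A real decoder would project to the output channel count; we slice
    # here only to keep the toy shapes consistent.
    return x_curr + y_hat[:, :x_curr.shape[1]]            # residual update
```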

Training

GraphCast is trained with gradient descent to minimize an objective against ERA5 targets over 12-step forecasts (3 days at 6-hour steps).

The objective is a mean-squared error between predicted and target states, weighted per variable and pressure level and by grid-cell area; a schematic form is shown below.
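A schematic rendering of such a weighted MSE (our notation, simplified from the paper: $a_i$ is the area weight of grid cell $i$, $w_j$ the per-variable/per-level loss weight, and $T = 12$ training steps):

```latex
\mathcal{L} =
\frac{1}{T}\sum_{\tau=1}^{T}
\frac{1}{|G|}\sum_{i \in G}
\frac{1}{|J|}\sum_{j \in J}
a_i \, w_j \left(\hat{x}^{\,t+\tau}_{i,j} - x^{\,t+\tau}_{i,j}\right)^{2}
```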

Using batch parallelism, the researchers trained GraphCast for about three weeks on 32 Cloud TPU v4 devices.

To reduce the memory footprint, they also used gradient checkpointing and low-precision arithmetic.

The results show that GraphCast surpasses HRES on 10-day forecasts at 0.25° resolution.

As shown in Figure 4, GraphCast (blue lines) is significantly better than HRES (black lines) on 10 key surface and atmospheric variables.

Regional analysis further shows that these results hold consistently across the planet.

According to the evaluation, GraphCast outperformed HRES on 90.0% of 2,760 combinations of variable, level, and lead time (4 surface variables, plus 5 atmospheric variables at 13 levels, over 10 days at 4 steps per day); the breakdown is shown below.
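The arithmetic behind that target count:

```python
surface_vars = 4
atmos_vars = 5
levels = 13
lead_times = 10 * 4    # 10 days at 4 forecast steps per day
print((surface_vars + atmos_vars * levels) * lead_times)   # 2760 targets
```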

The researchers note that HRES tends to outperform GraphCast at upper atmospheric levels, especially at the 50 hPa pressure level. This is unsurprising, since levels at pressures of 50 hPa and lower account for only 0.66% of the total training-loss weight across all variables and levels.

Excluding the 50 hPa level, GraphCast beats HRES on 96.6% of the remaining 2,240 targets; excluding both the 50 hPa and 100 hPa levels, it beats HRES on 99.2% of 1,720 targets.

Real and forecast weather for 10U (10-metre eastward wind). Row 1 shows ERA5, row 2 HRES, and row 3 GraphCast; rows 4 and 5 show the absolute errors of HRES against HRES-fc0 and of GraphCast against ERA5, respectively. The bottom panel shows the RMSE of HRES and GraphCast.

Real and predicted weather states for MSL (mean sea-level pressure). On the effect of autoregressive training on prediction: with fewer autoregressive training steps, the model performs better at shorter lead times but worse at longer ones.

As the number of autoregressive steps increases, performance worsens at shorter lead times but improves at longer ones.

How does GraphCast compare with today's top ML forecast models? Pangu-Weather, based on ViT, represents the current state of the art in ML-based weather forecasting, and its computational pattern is similar to a GNN's.

Figure 8 compares GraphCast with Pangu-Weather. Rows 1 and 3 show absolute RMSE for GraphCast (blue lines), Pangu-Weather (red lines), HRES evaluated against HRES-fc0 (black lines), and HRES evaluated against ERA5; rows 2 and 4 show each model's RMSE difference normalized relative to Pangu-Weather.

In conclusion, the GraphCast model's 10-day forecasts, at a 6-hour time step and 0.25° latitude-longitude resolution, surpass HRES, ECMWF's most accurate deterministic system.

Evaluated over 2,760 combinations of variable, pressure level, and lead time, GraphCast achieves lower RMSE than HRES on 90.0% of targets.

When the upper atmosphere (100 hPa and above in altitude) is excluded, GraphCast outperforms HRES on 99.2% of the remaining 1,720 targets.

In addition, GraphCast beat the previous best ML baseline, Pangu-Weather, on 99.2% of 252 targets.

A key innovation of GraphCast is its novel "multi-mesh" representation, which lets it capture longer-range spatial interactions than traditional NWP methods and thus supports coarser native time steps.

This is part of why GraphCast can generate an accurate 10-day forecast, at 6-hour steps, in under 60 seconds on a single Cloud TPU v4 device.

Reference:

https://arxiv.org/abs/2212.12794

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era). Editors: Sleepy, Aeneas.
