CW3E Publication Notice

Toward Calibrated Ensembles of Neural Weather Model Forecasts

April 23, 2025

Scientists from the CW3E machine learning team recently published an article titled “Toward calibrated ensembles of neural weather model forecasts” in the Journal of Advances in Modeling Earth Systems. This study was led by Jorge Baño-Medina (CW3E) and co-authored by Agniv Sengupta (CW3E), Duncan Watson-Parris (SIO), Weiming Hu (James Madison University), and Luca Delle Monache (CW3E). This work aligns with CW3E’s strategic goal to develop artificial intelligence (AI)-based predictive capabilities for extreme weather associated with atmospheric rivers (ARs). The work was supported by the California Department of Water Resources AR Program and the U.S. Army Corps of Engineers’ Forecast Informed Reservoir Operations.

The central question addressed in this study is whether data-driven AI models, which have revolutionized weather prediction in the past couple of years, can be used to generate very large, calibrated ensemble forecasts. Answering this question is crucial for advancing our ability to create sharp weather predictions and reliably quantify their uncertainty. This, in turn, provides avenues to improve probabilistic forecasting and the prediction of extreme events, which are critical for informed and effective decision-making.

AI-based weather models offer a transformative opportunity to produce significantly larger ensembles at far lower computational cost and in much shorter time frames than traditional dynamical models. By overcoming these computational barriers, very large ensembles based on AI models can better represent the tail of the predictive distribution, which is associated with extreme events, than traditional methods can. Physics-based models still play a crucial role in generating a prediction, because they are a key component (along with data assimilation and a wealth of observations) in producing the training data sets needed by AI models.

This study introduces an innovative methodology to generate ensemble forecasts using AI-driven global weather models. Our methodology accounts for two critical sources of uncertainty: model uncertainty and initial condition uncertainty (Figure 1). For model uncertainty, we use a novel technique of model checkpointing—sampling the model weights at various epochs during training—resulting in a diverse ensemble of 90 distinct models. With this approach the AI model needs to be trained only once. For initial condition uncertainty, we apply the breeding of growing modes technique, traditionally used in numerical weather prediction, to generate six bred vectors. This combination of 90 models and six bred vectors results in an unprecedented ensemble of 540 members, enabling three-dimensional global weather forecasts of several parameters up to 5 days in advance. Remarkably, this ensemble is generated in only 5 minutes on a single graphics processing unit (GPU), with even shorter times achievable with a larger number of GPUs.
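The breeding step mentioned above can be illustrated with a minimal sketch. It follows the textbook form of the breeding cycle (advance a control and a perturbed state, take their difference, rescale it to a fixed amplitude, repeat) and is not taken from the paper. The linear `toy_model`, the amplitude, and the number of cycles are all hypothetical choices, and the control state is simply advanced by the model rather than replaced by a new analysis at each cycle.

```python
import math


def toy_model(state):
    """Hypothetical stand-in for a forecast model: a linear map whose
    first component grows and whose second component decays."""
    x, y = state
    return [1.5 * x, 0.5 * y]


def norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in vector))


def breed(model, control, perturbation, amplitude, n_cycles):
    """Run n_cycles breeding cycles and return the final bred vector."""
    # Scale the seed perturbation to the prescribed amplitude.
    scale = amplitude / norm(perturbation)
    bred = [scale * c for c in perturbation]
    for _ in range(n_cycles):
        # Advance the perturbed and the control state with the same model.
        perturbed = model([c + b for c, b in zip(control, bred)])
        control = model(control)
        # The difference keeps the directions that grew during the cycle.
        diff = [p - c for p, c in zip(perturbed, control)]
        # Rescale so the perturbation stays at the prescribed size.
        scale = amplitude / norm(diff)
        bred = [scale * d for d in diff]
    return bred


bred_vector = breed(
    toy_model,
    control=[1.0, 1.0],
    perturbation=[1.0, 1.0],
    amplitude=0.01,
    n_cycles=10,
)
print(bred_vector)
```

In this toy setting the bred vector aligns with the growing direction of the model while keeping the prescribed amplitude, which is the property that makes bred vectors useful as initial condition perturbations.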

Figure 1. Schematic of the proposed strategy to generate a 540‐member ensemble. An initial condition field, defined by the state of a set of atmospheric variables, is perturbed with 6 bred vectors (initial condition perturbations). Each perturbed initial condition is then fed to each of the 90 distinct neural weather models (NWMs; model perturbations) to generate a 540‐member ensemble. Figure 1 from Baño-Medina et al. (2025).
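The way the two perturbation sources combine can be sketched in a few lines: every perturbed initial condition is run through every checkpointed model. The counts (90 and 6) come from the text. Everything else is a hypothetical placeholder for illustration, including the scalar-gain "checkpoints", the two-variable state, the toy bred vectors, and the number of steps. None of it reflects the paper's actual models.

```python
from itertools import product

N_CHECKPOINTS = 90   # model perturbations (count taken from the article)
N_BRED_VECTORS = 6   # initial condition perturbations (count from the article)


def make_toy_checkpoint(k):
    """Hypothetical stand-in for the model weights saved at checkpoint k."""
    gain = 1.0 + 0.001 * k

    def step(state):
        return [gain * x for x in state]

    return step


def run_forecast(model, state, n_steps):
    """Apply the model autoregressively for n_steps."""
    for _ in range(n_steps):
        state = model(state)
    return state


def build_ensemble(initial_state, bred_vectors, checkpoints, n_steps):
    """Cross every perturbed initial condition with every checkpoint."""
    members = []
    for bred, model in product(bred_vectors, checkpoints):
        perturbed = [x + dx for x, dx in zip(initial_state, bred)]
        members.append(run_forecast(model, perturbed, n_steps))
    return members


checkpoints = [make_toy_checkpoint(k) for k in range(N_CHECKPOINTS)]
# Toy perturbations; in practice these would come from a breeding cycle.
bred_vectors = [[0.01 * (i + 1), -0.01 * (i + 1)] for i in range(N_BRED_VECTORS)]
ensemble = build_ensemble([1.0, 2.0], bred_vectors, checkpoints, n_steps=5)
print(len(ensemble))  # 6 * 90 = 540
```

Because each member is an independent forward pass, the loop is trivially parallel, which is consistent with the short generation times reported above.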

Our approach exhibits significantly lower errors and better calibration (where the ensemble spread closely matches the error of the ensemble mean) than benchmark AI-based systems, and it rivals the world-leading physics-based/stochastic probabilistic systems of the European Centre for Medium-Range Weather Forecasts for key atmospheric variables: surface air temperature, surface zonal wind velocity, total column water vapor, and geopotential at 500 hPa. To visualize the performance of the entire ensemble, the total column water vapor was averaged over the Feather River Basin for an impactful atmospheric river over the West Coast of North America in April 2018. Figure 2 displays the evolution of the 7‐day forecast for each of the 540 members of EnAFNO540 (turquoise), the ensemble mean of EnAFNO (blue), and ERA5 (black). The ensemble accurately captures the evolution of the atmospheric river up to 7 days of lead time.

Figure 2. Time series of the total column water vapor averaged over the Feather River Basin for an impactful AR in April 2018 for the 540 EnAFNO members (turquoise), the ensemble mean (blue), and ERA5 (black). Figure 4 from Baño-Medina et al. (2025).
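The calibration criterion mentioned above, that the ensemble spread should match the error of the ensemble mean, can be checked with a simple spread-to-error ratio. The sketch below is a generic illustration on synthetic scalar forecasts, not the verification code used in the paper. The function names, the Gaussian toy data, and the sample sizes are assumptions made for the example.

```python
import math
import random
from statistics import fmean, variance


def spread_and_rmse(forecasts, truths):
    """Return (ensemble spread, RMSE of the ensemble mean) over all cases.

    forecasts: list of cases, each a list of member values.
    truths: list of verifying values, one per case.
    """
    sq_errors, member_variances = [], []
    for members, truth in zip(forecasts, truths):
        sq_errors.append((fmean(members) - truth) ** 2)
        member_variances.append(variance(members))  # sample variance (n - 1)
    return math.sqrt(fmean(member_variances)), math.sqrt(fmean(sq_errors))


def synthetic_cases(rng, n_cases, n_members, member_sigma):
    """Toy data: the truth has unit spread around a per-case center."""
    forecasts, truths = [], []
    for _ in range(n_cases):
        center = rng.gauss(0.0, 5.0)
        truths.append(rng.gauss(center, 1.0))
        forecasts.append(
            [rng.gauss(center, member_sigma) for _ in range(n_members)]
        )
    return forecasts, truths


rng = random.Random(0)
calibrated = spread_and_rmse(*synthetic_cases(rng, 2000, 20, member_sigma=1.0))
underdispersive = spread_and_rmse(*synthetic_cases(rng, 2000, 20, member_sigma=0.5))
print("calibrated ratio:", calibrated[0] / calibrated[1])
print("underdispersive ratio:", underdispersive[0] / underdispersive[1])
```

A ratio close to 1 indicates that the spread is a trustworthy estimate of the forecast error, while a ratio well below 1 signals an overconfident (underdispersive) ensemble.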

In conclusion, this paper offers promising insights into the transformative potential of AI weather models for probabilistic forecasting, particularly in the context of extreme weather events.

Baño‐Medina, J., Sengupta, A., Watson‐Parris, D., Hu, W., & Delle Monache, L. (2025). Toward calibrated ensembles of neural weather model forecasts. Journal of Advances in Modeling Earth Systems, 17(4), e2024MS004734, https://doi.org/10.1029/2024MS004734.