CW3E Publication Notice

Improving Streamflow Simulation through Machine Learning-Powered Data Integration and Its Potential for Forecasting in the Western U.S.

November 7, 2025

A paper titled “Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S.” was recently published in the EGU journal Hydrology and Earth System Sciences. This study was conducted by Yuan Yang (CW3E), Ming Pan (CW3E), Dapeng Feng (Stanford University), Mu Xiao (CW3E), Taylor Dixon (CW3E), Robert Hartman (Robert K. Hartman Consulting Services), Chaopeng Shen (Pennsylvania State University), Yalan Song (Pennsylvania State University), Agniv Sengupta (CW3E), Luca Delle Monache (CW3E) and F. Martin Ralph (CW3E). This work was sponsored by the NOAA Cooperative Institute for Research to Operations in Hydrology (CIROH) project.

This work investigates a flexible machine learning-based data integration approach that incorporates recent observations to improve streamflow simulations in the Western U.S. These findings support the Advance Precipitation and Streamflow Prediction priority identified in CW3E’s 2025–2029 Strategic Plan.

Accurate streamflow simulations and forecasts are crucial yet remain challenging in the arid Western U.S. Although the forecasting practices employed by operational agencies have been historically successful, their skill has declined in recent years due to regional climate change and methodological limitations. Moreover, these approaches often require extensive manual expertise for domain-specific implementation and for incorporating new observational data. Recent advances in machine learning, particularly Long Short-Term Memory (LSTM) networks, have exhibited high accuracy in streamflow simulation and a strong ability to integrate observations to enhance performance. This study evaluated an LSTM-based data integration approach (DI-LSTM) that incorporates streamflow (Q) and snow water equivalent (SWE) observations to improve streamflow estimations across different lag times (1–10 d, 1–6 months) and timescales (daily and monthly) over hundreds of basins in the Western U.S. (Fig. 1).
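To make the idea of integrating N-step lagged observations concrete, the following is a minimal sketch of how a lagged observation could be appended to a model's input at each time step. It is an illustration only, not the authors' implementation: the function name `add_lagged_obs`, the list-based data layout, and the use of NaN where no lagged value exists are all assumptions made for this example.

```python
import math


def add_lagged_obs(forcings, obs, lag):
    """Append the observation from `lag` steps earlier to each input row.

    forcings: list of per-time-step input vectors (e.g. meteorological forcings)
    obs:      list of observed values (e.g. Q or SWE), same length as forcings
    lag:      number of time steps between the observation and the target step

    Hypothetical helper for illustration; NaN marks steps with no lagged value.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if len(forcings) != len(obs):
        raise ValueError("forcings and obs must have the same length")

    rows = []
    for t, x in enumerate(forcings):
        # Only observations that would already be available at step t are used.
        lagged = obs[t - lag] if t >= lag else math.nan
        rows.append(list(x) + [lagged])
    return rows


# Toy example: two forcing variables per step, observations lagged by 2 steps.
forcings = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
observed_q = [5.0, 6.0, 7.0, 8.0]
inputs = add_lagged_obs(forcings, observed_q, lag=2)
```

In this sketch, a larger `lag` corresponds to a longer lead time between the most recent usable observation and the step being estimated, which is the quantity the experiments vary (1–10 d at the daily scale, 1–6 months at the monthly scale).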

Results show that integrating Q at the daily scale yielded the most substantial improvements, with significantly improved median values and reduced spread across all performance metrics. The median Kling-Gupta Efficiency (KGE) across 646 basins increased to 0.96 with the integration of 1 d lagged streamflow and remained at 0.89 even with a 10 d lag. Integrating Q at the monthly scale also improved streamflow estimations, though to a lesser extent, with the median KGE increasing from 0.80 to 0.86 when integrating streamflow from 1 month ago (Fig. 2).
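For readers less familiar with the metric, the sketch below computes KGE for a pair of simulated and observed series. This notice does not state which KGE variant the paper uses, so the code assumes the original formulation of Gupta et al. (2009): KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α is the ratio of simulated to observed standard deviation, and β is the ratio of simulated to observed mean. A perfect simulation gives KGE = 1.

```python
import math


def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the Gupta et al. (2009) formulation."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have equal length of at least 2")

    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations and covariance.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    if sd_s == 0 or sd_o == 0 or mean_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")

    r = cov / (sd_s * sd_o)   # correlation component
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy example: a simulation that doubles every observed value has perfect
# correlation but is penalized on both variability and bias.
observed = [1.0, 2.0, 4.0, 3.0]
doubled = [2.0 * q for q in observed]
score = kge(doubled, observed)
```

Because the three components are penalized jointly, a median KGE of 0.96 implies that correlation, variability and bias are all close to their ideal values in a typical basin.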

Integrating lagged SWE at the monthly scale led to better accuracy, whereas its integration at the daily scale did not improve streamflow estimations (Fig. 2). This finding reflects the long-term memory of snow processes in the hydrological cycle, which extends beyond short timescales. The benefits of integrating SWE were more pronounced in snow-dominated basins during the snowmelt season, highlighting its value for improving spring-summer flow estimations (Fig. 3).

Overall, the benefits of integrating different observations at different timescales for streamflow estimations can be roughly ranked as follows: daily DI(Q) > monthly DI(Q) > monthly DI(SWE) > daily DI(SWE).

With its high predictive performance, automation without the need for extensive domain-specific customization, and flexibility to ingest additional observations, the DI-LSTM approach demonstrates large potential for short-term (e.g., 1–10 d) and long-term (1–6 months) operational streamflow forecasts in the Western U.S.

Figure 1. (a) Study basins: blue dots mark snow-dominated basins and orange dots mark rain-dominated basins. (b) Models: LSTM vs. DI-LSTM. (c) DI-LSTM with data integration of N-step lagged observations.

Figure 2. Median KGE values over all basins for all experiments at the daily scale (left), the monthly scale (middle), and the monthly scale evaluated only for April to July (right). N on the x-axis denotes the DI(Q-N) or DI(SWE-N) experiment.

Figure 3. Metric differences between DI(SWE-N) and LSTM (differences in KGE, CC, |RV-1|, and |RB|) over snow-dominated basins during the snow accumulation and snowmelt seasons. Δ|RV-1| is used because the ideal value of RV is 1. Only the median and interquartile range (25th–75th percentiles) are shown. N denotes the DI(SWE-N) experiment. The grey horizontal lines mark zero.

Yang, Y., Pan, M., Feng, D., Xiao, M., Dixon, T., Hartman, R., Shen, C., Song, Y., Sengupta, A., Delle Monache, L., & Ralph, F. M. (2025). Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S. Hydrology and Earth System Sciences, 29(20), 5453–5476. https://doi.org/10.5194/hess-29-5453-2025