Streamflow modelling and forecasting for Canadian watersheds using LSTM networks with attention mechanism
Neural Computing and Applications
https://doi.org/10.1007/s00521-022-07523-8
ORIGINAL ARTICLE
Lakshika Girihagama1 • Muhammad Naveed Khaliq1 • Philippe Lamontagne1 • John Perdikaris2 •
René Roy3 • Laxmi Sushama4 • Amin Elshorbagy5
Received: 16 December 2021 / Accepted: 7 June 2022
© Crown 2022
Abstract
This study investigates the capability of sequence-to-sequence machine learning (ML) architectures in an effort to develop
streamflow forecasting tools for Canadian watersheds. Such tools are useful to inform local and region-specific water
management and flood forecasting related activities. Two powerful deep-learning variants of the Recurrent Neural Network
were investigated, namely the standard and attention-based encoder-decoder long short-term memory (LSTM) models.
Both models were forced with past hydro-meteorological states and daily meteorological data with a look-back time
window of several days. These models were tested for 10 different watersheds from the Ottawa River watershed, located
within the Great Lakes Saint-Lawrence region of Canada, an economic powerhouse of the country. The results of training
and testing phases suggest that both models are able to simulate overall hydrograph patterns well when compared to
observational records. Between the two models, the attention model significantly outperforms the standard model in all
watersheds, suggesting the importance and usefulness of the attention mechanism in ML architectures, which has not been well explored
in hydrological applications. On unseen data, the mean Nash–Sutcliffe Efficiency and Kling–Gupta Efficiency of the attention model across these
watersheds are 0.985 and 0.954, respectively. Streamflow forecasts with lead times of up to 5 days with the attention model demonstrate overall skillful
performance, well above the benchmark accuracy of 70%. The results of the study suggest that the encoder–decoder
LSTM, with attention mechanism, is a powerful modelling choice for developing streamflow forecasting systems for
Canadian watersheds.
Keywords Streamflow forecasting · LSTM · Encoder-decoder architecture · Attention-based models · Deep learning
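The two skill scores quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard Nash–Sutcliffe definition and the original 2009 form of the Kling–Gupta Efficiency; this section does not state which exact variant the authors used, and the function names `nse` and `kge` are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the squared error
    to the squared deviation of the observations from their mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed 2009 form): combines the linear
    correlation r, the variability ratio alpha and the bias ratio beta."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # ratio of standard deviations
    beta = sim.mean() / obs.mean()       # ratio of means
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation. An NSE of 0 means the simulation is no better than predicting the observed mean.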
Corresponding author: Muhammad Naveed Khaliq
1 National Research Council Canada, Ottawa, ON, Canada
2 Ontario Power Generation, Niagara Falls, ON, Canada
3 Hydro Météo, Notre-Dame-des-Prairies, QC, Canada
4 McGill University, Montreal, QC, Canada
5 University of Saskatchewan, Saskatoon, SK, Canada
1 Introduction
Improved streamflow forecasting capability is important
for water management related activities, informing hydropower generation operations, flood risk management and
operational decision-making at local and regional scales.
Streamflow is the integrated result of highly nonlinear
physical processes that operate at multiple temporal and
spatial scales within a watershed. Traditionally, streamflow
forecasting is accomplished using process-based hydrological models. These models can range from simple
conceptual lumped models to complex physically based
distributed models. Conceptual lumped models are
based on mathematical formulations of the physical processes involved in runoff generation at the watershed scale
(e.g., Streamflow Synthesis and Reservoir Regulation
model [1] and Soil and Water Assessment Tool [2]). These
models are considerably simplified based on reasonable
assumptions and they also do not capture the spatial variability of physical processes occurring within a watershed.
On the other hand, physically based distributed models can
capture to some extent the spatial variability of the nonlinear physical processes occurring within a watershed
(e.g., MIKE SHE model [3], WATFLOOD model [4–6],
Variable Infiltration Capacity model [7], and MESH model
[8]). The precise way the process variabilities are handled
in mathematical formulations can vary significantly from
one model to another. Although process-based models
produce deterministic and plausible results in many
instances, uncertainty in parametrization and process
scaling deficiencies are some of the issues that degrade
their performance [9]. Undoubtedly, these models have
shown great value in forecasting streamflow in many
watersheds in different parts of the world, including
Canada [10–14]. Though a process-based model can be calibrated and tested for a given watershed
in great detail and depth, transferring the same model
to other watersheds can compromise
its performance. This is due to the difficulty of formulating scale-dependent parameterizations of watershed-relevant physical processes [15, 16], which in turn
affects the model's generalization ability.
With the growing availability of large amounts of spatial
and temporal data from remote sensing and numerical
weather prediction models (e.g., remotely sensed land use
data, reanalysis products and real-time meteorological
forecasts) and recent advances in computational power,
Machine Learning (ML) methods can also offer powerful
modelling options for developing data-driven streamflow
forecasting systems, with generalization abilities. This is
due to their ability to extract complex dynamical nonlinearities without explicitly defining the scale-relevant
physical processes, as in the case of hydrological models
discussed above. Hence, explicit definitions of governing
equations are not needed for these models. Instead, these
models map multivariate input space to an output space.
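The mapping from a multivariate input space to an output space can be sketched as a windowing step that pairs each block of past days with the streamflow values to be predicted. This is only an illustration under stated assumptions: the function name `make_windows`, the 7-day look-back and the 5-day horizon are hypothetical choices (the abstract mentions a look-back window of "several days" and lead times of up to 5 days), not the authors' preprocessing.

```python
import numpy as np

def make_windows(features, target, look_back, horizon):
    """Slice a daily record into (past window, future target) pairs.

    features: array of shape (n_days, n_features), e.g. meteorological inputs
    target:   array of shape (n_days,), e.g. daily streamflow
    Returns X with shape (n_samples, look_back, n_features)
    and y with shape (n_samples, horizon).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(target) - look_back - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    # Each sample uses days [i, i + look_back) as input ...
    X = np.stack([features[i:i + look_back] for i in range(n_samples)])
    # ... and the following `horizon` days of the target as output.
    y = np.stack([target[i + look_back:i + look_back + horizon]
                  for i in range(n_samples)])
    return X, y

# Illustrative call: 20 days, 3 input variables, 7-day look-back, 5-day horizon.
X, y = make_windows(np.arange(60).reshape(20, 3), np.arange(20), 7, 5)
```

A sequence-to-sequence model would then be trained to map each `X[i]` to the corresponding `y[i]`, with no governing equations specified.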
Data-driven methods can be categorized as time series
(statistical) methods and ML approaches. The statistical
models simply derive the relationship between variables to
formalize understanding and evaluation of a hypothesis
about the system’s behaviour [17]. The common statistical
methods under this category include autoregressive moving
average models [18], autoregressive integrated moving
average models [19–21], and many other variants of these
time series models. In these deductive methods, the
streamflow observations are assumed to be stochastic
sequences and hence, future streamflow can be predicted
by learning from past observations [22]. Although very long records of observations are crucial for
accurate prediction of future streamflow, the applicability
of these models to real-time forecasting remains limited due to a lack of generalization ability
and cascading uncertainty in parametrizations [22]. ML
models, on the other hand, have been shown to overcome some
of the drawbacks associated with process-based and statistical modelling approaches. These inductive models are
developed based on data and are able to extract nonlinear
structures from data and can readily learn from inter-
variable interactions. Some perspectives (...truncated)