Death comes but why: A multi-task memory-fused prediction for accurate and explainable illness severity in ICUs
World Wide Web
https://doi.org/10.1007/s11280-023-01211-w
Weitong Chen1 · Wei Emma Zhang1 · Lin Yue2
Received: 12 February 2023 / Revised: 15 February 2023 / Accepted: 13 September 2023
© The Author(s) 2023
Abstract
Predicting the severity of an illness is crucial in intensive care units (ICUs) if a patient's life
is to be saved. Existing prediction methods often fail to provide sufficient evidence for
the time-critical decisions required in dynamic, fast-changing ICU environments. In this research,
a new method called MM-RNN (multi-task memory-fused recurrent neural network) was
developed to predict illness severity in ICUs. MM-RNN aims
to address this issue by not only predicting illness severity but also generating an evidence-based
explanation of how the prediction was made. The architecture of MM-RNN consists
of task-specific phased LSTMs and a delta memory network that captures asynchronous
feature correlations within and between multiple organ systems. The multi-task nature of
MM-RNN allows it to provide an evidence-based explanation of its predictions, along with
illness severity scores and a heatmap of the patient’s changing condition. The results of
comparison with state-of-the-art methods on real-world clinical data show that MM-RNN
delivers more accurate predictions of illness severity with the added benefit of providing
evidence-based justifications.
Keywords Personalized healthcare · Illness severity prediction · Explainable prediction ·
Time series
This article belongs to the Topical Collection: APWeb-WAIM 2022
Guest editors: Calvanese Diego, Toshiyuki Amagasa and Bohan Li
Corresponding author: Lin Yue
Weitong Chen
Wei Emma Zhang
1 The University of Adelaide, Adelaide 5000, South Australia, Australia
2 The University of Newcastle, Newcastle 2308, New South Wales, Australia
Figure 1 Two different SOFA trajectories for two ICU patients. According to their SOFA score, Patient
ID: 80030 (red) was initially in critical condition but gradually improved and was eventually discharged. In
contrast, the condition of Patient ID: 45767 (blue) deteriorated, and they ultimately passed away
1 Introduction
The exponential growth of electronic health records (EHRs) has drawn significant interest
from the machine learning and data mining communities. With a wealth of information from
multiple sources and formats, these EHRs offer a vast dataset for developing evidence-based
clinical decision-making tools. For example, the My Health Record System1 stores over 41.9
million EHRs on over 6.4 million patients. Despite criticisms by some that EHRs are often
vendor-specific and sometimes limited in scope [1], their sheer size and diversity make them
a valuable resource for deep learning technology. This is especially true in the intensive care
unit (ICU), where critical decisions are driven by forecasts of patient outcomes based on
pathological and physiological values [1]. As such, deep learning technology has been broadly
applied to advance research in ICU decision support, particularly mortality estimation [2]
and phenotype analysis [3]. In general, clinical decisions in ICUs are time-critical and highly
dependent on physiological data. However, making accurate and rapid decisions in these fast-changing
environments without enough real-time information on the severity of a patient's
illness can be very challenging for clinicians. As a result, numerous scoring systems have
been developed and progressively refined to assist with rapid patient assessment. Examples
include the sequential organ failure assessment score (SOFA) [4], APACHE II [5], and SAPS
II [6]. The produced scores reflect the current clinical condition of a patient based on a set
of basic physiological indicators.
1.1 Motivation
These scoring systems serve as a simple calculation of a patient’s vital signs at various
times but do not provide real-time information for critical decision-making in the ICU. The
longer the time between updated information, the less opportunity there is to respond to a
deteriorating patient, which is why continuous monitoring of key indicators such as heart
rate is essential. According to Bouch and Thompson [7], an instantaneous scoring system
that covers a wider range of indicators is urgently needed to support better decision-making
in the ICU. To demonstrate the potential impact of such a system, we present an example
of two ICU patients, whose SOFA scores are charted over time to show changes in their
condition (as illustrated in Figure 1). If linked to medical interventions, these high-frequency
1 https://myhealthrecord.gov.au
SOFA measures could provide insight into the effectiveness of each treatment. This example
highlights both the potential and the need for continuous prediction of illness severity scores
as a new tool for patient monitoring.
Over the years, recurrent neural networks (RNNs) and their variants [8–12] have been
explored as deep models for handling time series data, and many have achieved significant
results with clinical prediction tasks like mortality risk. Given a sequence of multivariate
features, the typical prediction horizon for mortality risk with today's techniques is
about 24 hours, barely enough time for clinicians to intervene. More importantly, short-term
mortality risk predictions may have ethical implications. For example, the mortality risk to a
patient over the next week may be, say, 80% but the prediction for the next 24 hours may only
be 5%. If faced with an unaffordable treatment, many patients and caregivers may choose not
to continue with clinical services, unaware of the consequences of that decision beyond
tomorrow. Thus, continuously predicting the medical trajectory not only offers more detailed
information at a finer time granularity but could also help caregivers concentrate on planning
effective treatments with better consideration of an illness’s true severity.
Despite their solid results to date, these learning models have some deficiencies. For instance,
they normally treat all multivariate time-series variables as an entire input stream without
considering the correlations between the physiological variables. However, human organs are
highly correlated to each other and to a patient's deterioration. When one or two organs start to
malfunction, others tend to follow over a short period. For example, systolic blood pressure is
positively correlated with diastolic blood pressure and pulse pressure, whereas diastolic blood
pressure is inversely correlated with pulse pressure. Also, a deterioration in the fraction of
inspired oxygen can asynchronously affect cerebral blood flow. Thus, exploiting correlations
between medical time-series variables can further improve classification performance for
ICU prediction tasks. There are few research works that h (...truncated)