Cyclic Gate Recurrent Neural Networks for Time Series Data with Missing Values
Neural Processing Letters
https://doi.org/10.1007/s11063-022-10950-2
Philip B. Weerakody¹ · Kok Wai Wong¹ · Guanjin Wang¹
Accepted: 27 June 2022
© The Author(s) 2022
Abstract
Gated Recurrent Neural Networks (RNNs) such as LSTM and GRU have been highly effective in handling sequential time series data in recent years. Although Gated RNNs have an
inherent ability to learn complex temporal dynamics, there is potential for further enhancement by enabling these deep learning networks to directly use time information to recognise
time-dependent patterns in data and identify important segments of time. Missing values are
pervasive in real-world time series data and often reduce a model’s
ability to perform predictive tasks. Historically, missing values have been handled by simple
or complex imputation techniques as well as machine learning models, which manage the
missing values in the prediction layers. However, these methods do not attempt to identify
the significance of data segments and therefore are susceptible to poor imputation values or
model degradation from high missing value rates. This paper develops Cyclic Gate enhanced
recurrent neural networks with learnt waveform parameters to automatically identify important data segments within a time series and neglect unimportant segments. By using the
proposed networks, the negative impact of missing data on model performance is mitigated
through the addition of customised cyclic opening and closing gate operations. Cyclic Gate
Recurrent Neural Networks are tested on several sequential time series datasets for classification performance. For long sequence datasets with high rates of missing values, Cyclic Gate
enhanced RNN models achieve higher performance metrics than standard gated recurrent
neural network models, conventional non-neural network machine learning algorithms and
current state-of-the-art RNN cell variants.
Keywords Time series · Missing values · Recurrent neural network · GRU · LSTM · RNN
1 Introduction
Due to the numerous types of sensing devices or recording practices that generate data, it is
rare for raw time series data to have all input features sampled at a constant rate with common
timestamps and consistency across multiple features [1]. Missing observations are common
in univariate and multivariate datasets. In univariate data, where a single feature variable is
measured, missing values arise when the interval between observations is inconsistent
because of unstructured manual processes, event-driven monitoring, device or signal
failure, or intentional omissions based on cost or importance. For multivariate datasets
derived from numerous measuring techniques and instruments, the frequency at which each
variable is sampled will often differ, resulting in missing values for one or more features
at any given timestamp.

Corresponding author: Philip B. Weerakody
¹ Discipline of Information Technology, Murdoch University, Perth, WA, Australia
Missing values often degrade model performance in
forecasting and classification tasks [2]. Dealing with missing values is therefore an important,
and often overlooked, part of building an effective model. There are two common approaches
for handling missing values in time series data: missing value imputation at the data preprocessing stage [3–6] and modification of algorithms to directly handle missing values in
the learning process [7, 8]. Imputation based methods estimate missing values and reconstruct
a complete time series, which is subsequently fed into prediction layers. Methods that rely on
algorithms within the prediction model to handle missing values during the learning process
do not aim to produce the most accurate estimates of the missing values; rather, they optimise the
final prediction capability while taking the missing values into account.
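
The two approaches described above can be contrasted with a minimal NumPy sketch. It is illustrative only: the function names `mean_impute` and `mask_and_fill`, the toy array and the zero placeholder are assumptions and are not taken from the cited works. The first function reconstructs a complete series before modelling; the second leaves the gaps explicit by appending a binary observation mask, so that a downstream model can learn how to treat them.

```python
import numpy as np


def mean_impute(x):
    """Preprocessing-stage imputation: replace each NaN with the
    mean of the observed values of the same feature (column)."""
    x = np.asarray(x, dtype=float)
    col_mean = np.nanmean(x, axis=0)          # per-feature mean, ignoring NaNs
    return np.where(np.isnan(x), col_mean, x)


def mask_and_fill(x, fill_value=0.0):
    """Model-side handling: fill NaNs with a placeholder and append a
    binary mask (1 = observed, 0 = missing) as extra input channels."""
    x = np.asarray(x, dtype=float)
    mask = (~np.isnan(x)).astype(float)
    filled = np.where(np.isnan(x), fill_value, x)
    return np.concatenate([filled, mask], axis=1)


# Hypothetical series: 4 time steps, 2 features, NaN marks a missing value.
series = np.array([[1.0, 10.0],
                   [np.nan, 12.0],
                   [3.0, np.nan],
                   [np.nan, np.nan]])

imputed = mean_impute(series)       # complete series, shape (4, 2)
augmented = mask_and_fill(series)   # values plus mask, shape (4, 4)
```

In the first case the prediction layers never see which values were estimated; in the second, the mask channels preserve the missing pattern as an input.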
Over the past few decades, most approaches to tackling missing values in datasets have
focused on imputation techniques, which range from simple statistical methods such as mean,
moving average and simple regression to complex imputation methods involving machine
learning (ML) to accurately predict missing values. Simple statistical imputation methods can
often introduce a loss of accuracy or bias to models, while complex imputation models
are computationally expensive [9, 10]. When applied to time series data, most imputation
techniques fail to capture the temporal dependencies between observations in univariate or
multivariate data. Additionally, missing patterns, which can be time-dependent, are hidden
by imputation techniques and not effectively explored in the prediction layers, resulting in
sub-optimal models. Utilising time information in addition to feature variable data has been
shown to improve a number of machine learning models for time series prediction tasks with
missing or irregular data [8]. Augmentation of model inputs with time values or time intervals
can be applied in several ways, including their use in learnt decay functions to impute missing
values [11] or direct input of time values as a feature variable for the prediction layers [12].
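
As a rough illustration of how time intervals can augment model inputs, the sketch below computes the time elapsed since each value was last observed and uses it in an exponential decay that pulls a missing value from the last observation towards the feature mean. The exponential form, the fixed parameters `w` and `b` (which would be learnt in a trained model) and all function names are assumptions made for illustration; they are not the exact formulations of [11] or [12].

```python
import numpy as np


def time_since_last_observation(timestamps, mask):
    """delta[t] is the time elapsed since the feature was last observed
    (0 at the first step); gaps accumulate across missing steps."""
    timestamps = np.asarray(timestamps, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    delta = np.zeros(len(timestamps))
    for t in range(1, len(timestamps)):
        gap = timestamps[t] - timestamps[t - 1]
        delta[t] = gap if mask[t - 1] else gap + delta[t - 1]
    return delta


def decay_impute(values, mask, delta, w=0.5, b=0.0):
    """Fill a missing value with a blend of the last observation and the
    feature mean; the blend weight decays with the elapsed time delta.
    w and b are fixed here (assumption); a model would learn them."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mean = values[mask].mean()                      # mean of observed values only
    gamma = np.exp(-np.maximum(0.0, w * delta + b))  # decay factor in (0, 1]
    out = np.empty_like(values)
    last = mean                                     # fallback before any observation
    for t in range(len(values)):
        if mask[t]:
            last = values[t]
            out[t] = values[t]
        else:
            out[t] = gamma[t] * last + (1.0 - gamma[t]) * mean
    return out


# Hypothetical irregularly sampled feature with two missing readings.
timestamps = np.array([0.0, 1.0, 3.0, 6.0])
values = np.array([2.0, np.nan, np.nan, 8.0])
mask = ~np.isnan(values)

delta = time_since_last_observation(timestamps, mask)
filled = decay_impute(values, mask, delta)

# Alternative: pass the interval directly to the model as an extra feature.
model_input = np.stack([filled, delta], axis=1)     # shape (4, 2)
```

The longer a value has been unobserved, the less weight the last observation receives, so the second missing reading lies closer to the mean than the first.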
Standard gated Recurrent Neural Network (RNN) models such as LSTM and GRU and
their associated variants provide an ideal starting point for handling time sequential data due
to their success in providing state-of-the-art performance in sequential data modelling tasks.
Their successful applications have included machine translation, speech recognition and other
natural language processing (NLP) applications that require learning temporal dependencies
within a sequence of text [13]. Gated recurrent neural networks are capable of utilising
time inputs within the model and learning temporal patterns in sequential time series data
[14]. Their architectures have the flexibility to allow for modifications to address specific
data issues, including irregular time series sequences [10]. Traditional machine learning
techniques for sequential data that are not based on Neural Network (NN) architectures,
such as Naive Bayes, k-Nearest Neighbor (KNN), Support Vector
Machine (SVM) and Random Forest (RF), predominantly rely on feature extraction
prior to inference and therefore fail to utilise the rich information in the raw time
sequence. Time information has been successfully utilised as part of structural modification
of conventional recurrent neural networks, such as the LSTM and GRU, by modification
of their gate operations [1, 8]. These modifications allow for better prediction of sequences
that can be long, noisy or sparse, by enabling the model to be aware of patterns in time that
identify more and less significant segments of data. The time-aware concepts behind these
structural changes (...truncated)