Cyclic Gate Recurrent Neural Networks for Time Series Data with Missing Values

Neural Processing Letters
https://doi.org/10.1007/s11063-022-10950-2

Philip B. Weerakody¹ · Kok Wai Wong¹ · Guanjin Wang¹

Accepted: 27 June 2022
© The Author(s) 2022

Abstract

Gated Recurrent Neural Networks (RNNs) such as LSTM and GRU have been highly effective in handling sequential time series data in recent years. Although Gated RNNs have an inherent ability to learn complex temporal dynamics, there is potential for further enhancement by enabling these deep learning networks to directly use time information to recognise time-dependent patterns in data and identify important segments of time. Synonymous with time series data in real-world applications are missing values, which often reduce a model’s ability to perform predictive tasks. Historically, missing values have been handled by simple or complex imputation techniques as well as machine learning models, which manage the missing values in the prediction layers. However, these methods do not attempt to identify the significance of data segments and therefore are susceptible to poor imputation values or model degradation from high missing value rates. This paper develops Cyclic Gate enhanced recurrent neural networks with learnt waveform parameters to automatically identify important data segments within a time series and neglect unimportant segments. By using the proposed networks, the negative impact of missing data on model performance is mitigated through the addition of customised cyclic opening and closing gate operations. Cyclic Gate Recurrent Neural Networks are tested on several sequential time series datasets for classification performance.
For long sequence datasets with high rates of missing values, Cyclic Gate enhanced RNN models achieve higher performance metrics than standard gated recurrent neural network models, conventional non-neural network machine learning algorithms and current state of the art RNN cell variants.

Keywords Time series · Missing values · Recurrent neural network · GRU · LSTM · RNN

✉ Philip B. Weerakody
¹ Discipline of Information Technology, Murdoch University, Perth, WA, Australia

1 Introduction

Due to the numerous types of sensing devices or recording practices that generate data, it is rare for raw time series data to have all input features sampled at a constant rate with common timestamps and consistency across multiple features [1]. Missing observations are common in univariate and multivariate datasets. Missing data occur in univariate data generated from a single feature variable measured at a sampling rate without a consistent interval between observations due to unstructured manual processes, event-driven monitoring, device or signal failure, and intentional omissions based on cost or importance. For multivariate datasets derived from numerous measuring techniques and instruments, the frequency at which each variable is sampled will often be different and result in missing values for one or more features at any given timestamp. The impact of missing values on data modelling often results in performance degradation in forecasting and classification tasks [2]. Therefore, dealing with missing values is an important and often overlooked part of building an effective model.

There are two common approaches for handling missing values in time series data: missing value imputation at the data preprocessing stage [3–6] and modification of algorithms to directly handle missing values in the learning process [7, 8].
Imputation-based methods estimate missing values and reconstruct a complete time series, which is subsequently fed into prediction layers. Methods that rely on algorithms within the prediction model to handle missing values during the learning process do not aim to develop the most accurate estimation of missing values but rather optimise the final prediction capability, taking the missing values into account.

Over the past few decades, most approaches to tackling missing values in datasets have focused on imputation techniques, which range from simple statistical methods such as mean, moving average and simple regression to complex imputation methods that use machine learning (ML) to predict missing values accurately. Simple statistical imputation methods can often introduce a loss of accuracy or bias to models, while complex imputation models are computationally expensive [9, 10]. When applied to time series data, most imputation techniques fail to capture the temporal dependencies between observations in univariate or multivariate data. Additionally, missing patterns, which can be time-dependent, are hidden by imputation techniques and not effectively explored in the prediction layers, resulting in sub-optimal models.

Utilising time information in addition to feature variable data has been shown to improve a number of machine learning models for time series prediction tasks with missing or irregular data [8]. Augmentation of model inputs with time values or time intervals can be applied in several ways, including their use in learnt decay functions to impute missing values [11] or direct input of time values as a feature variable for the prediction layers [12].

Standard gated Recurrent Neural Network (RNN) models such as LSTM and GRU and their associated variants provide an ideal starting point for handling time sequential data due to their success in providing state-of-the-art performance in sequential data modelling tasks.
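To make the imputation ideas above concrete, the following minimal sketch contrasts mean imputation with a decay-based imputation driven by time intervals. It is an illustration only and not the authors' implementation. The decay form `gamma = exp(-max(0, w * delta + b))` is one commonly used choice assumed here and may differ from the function used in [11]. The names `mean_impute`, `time_since_last_observation`, `decay_impute`, `w` and `b` are hypothetical. In a learnt decay, `w` and `b` would be trained with the model; here they are fixed scalars.

```python
import numpy as np


def mean_impute(x):
    """Replace NaNs with the mean of the observed values."""
    mask = ~np.isnan(x)
    return np.where(mask, x, x[mask].mean())


def time_since_last_observation(t, mask):
    """For each step, time elapsed since the most recent earlier observation.

    The first step is assigned 0, and t[0] is used as the reference
    until an observed value is encountered.
    """
    delta = np.zeros_like(t, dtype=float)
    last = t[0]
    for i in range(1, len(t)):
        delta[i] = t[i] - last
        if mask[i]:
            last = t[i]
    return delta


def decay_impute(x, mask, delta, w, b):
    """Impute missing values by decaying from the last observation to the mean.

    Assumed decay form (hypothetical parameters w, b):
        gamma = exp(-max(0, w * delta + b))
        x_hat = gamma * x_last + (1 - gamma) * x_mean
    """
    gamma = np.exp(-np.maximum(0.0, w * delta + b))
    x_mean = x[mask].mean()
    out = np.empty_like(x)
    last = x_mean  # fallback if the series starts with a missing value
    for i in range(len(x)):
        if mask[i]:
            out[i] = x[i]
            last = x[i]
        else:
            out[i] = gamma[i] * last + (1.0 - gamma[i]) * x_mean
    return out


# Toy series with irregular gaps (NaN marks a missing value).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, np.nan, np.nan, 4.0, np.nan, 1.0])
mask = ~np.isnan(x)
delta = time_since_last_observation(t, mask)

x_mean_filled = mean_impute(x)
x_decay_filled = decay_impute(x, mask, delta, w=0.5, b=0.0)
```

With `w = 0` and `b = 0` the decay factor is 1 everywhere and the sketch reduces to forward filling. Larger values of `w` pull imputed values towards the observed mean as the gap since the last observation grows, which is the time-dependent behaviour that plain mean imputation cannot express.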
Successful applications of gated RNNs have included machine translation, speech recognition and other natural language processing (NLP) applications that require learning temporal dependencies within a sequence of text [13]. Gated recurrent neural networks are capable of utilising time inputs within the model and learning temporal patterns in sequential time series data [14]. Their architectures have the flexibility to allow for modifications to address specific data issues, including irregular time series sequences [10].

Traditional machine learning techniques for handling sequential data that are not based on Neural Network (NN) architectures include Naive Bayes, k-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest (RF). These techniques predominantly rely on feature extraction prior to inference and therefore fail to utilise the rich information associated with the raw time sequence.

Time information has been successfully utilised as part of structural modification of conventional recurrent neural networks, such as the LSTM and GRU, by modification of their gate operations [1, 8]. These modifications allow for better prediction of sequences that can be long, noisy or sparse, by enabling the model to be aware of patterns in time that identify more and less significant segments of data. The time-aware concepts behind these structural changes (...truncated)