Robust Learning Algorithm Based on Iterative Least Median of Squares


https://link.springer.com/content/pdf/10.1007%2Fs11063-012-9227-z.pdf


Andrzej Rusiecki

Outliers and gross errors in training data sets can seriously degrade the performance of traditional supervised learning algorithms for feedforward neural networks. For this reason, several learning methods that are to some extent robust to outliers have been proposed. In this paper we present a new robust learning algorithm based on the iterative Least Median of Squares, which outperforms some existing solutions in accuracy or speed. We demonstrate how to minimise the new non-differentiable performance function with a deterministic approximate method. Results of simulations and a comparison with other learning methods are presented. The improved robustness of the novel algorithm is shown for data sets with varying degrees of outliers.

1 Introduction

Feedforward artificial neural networks (FNNs) have been successfully applied in areas such as function approximation, pattern recognition, and signal and image processing. Because FNNs are universal approximators [9,10], they can potentially be used in any problem that requires modelling unknown input-output dependencies. Such networks build their models from training sets consisting of exemplary input-output patterns. The main advantage of this approach is its simplicity, since no prior knowledge about the modelled system is required. The networks are usually trained to minimise an error function that measures the distance between the current and the desired output, so during training an FNN tries to fit the training data as closely as possible. Unfortunately, the performance of this learning scheme relies strongly on the quality of the training data [8,11,15]. When the data are corrupted with large noise or outliers, the network is trained on erroneous examples and tries to model a system different from the desired one. This is because the most popular learning algorithm, backpropagation (BP), and many of its variants use the mean squared error (MSE) function.
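As a minimal illustration of why the MSE criterion is fragile, the Python sketch below compares the mean of squared residuals with their median. The median is the criterion minimised by the least median of squares estimator. The comparison uses a handful of made-up residuals, first as they are and then with one of them replaced by a gross error. The residual values and the function names `mse` and `median_of_squares` are hypothetical. The sketch only contrasts the two criteria and is not the paper's iterative LMedS algorithm.

```python
from statistics import median


def mse(residuals):
    """Mean of squared residuals: the usual MSE criterion."""
    return sum(r * r for r in residuals) / len(residuals)


def median_of_squares(residuals):
    """Median of squared residuals: the criterion of the LMedS estimator."""
    return median([r * r for r in residuals])


# Hypothetical residuals of a well-fitted model.
clean = [0.1, -0.2, 0.15, -0.05, 0.1]
# The same residuals after the last pattern is hit by a gross error.
corrupted = clean[:-1] + [10.0]

for name, res in (("clean", clean), ("corrupted", corrupted)):
    print(f"{name:>9}: MSE = {mse(res):.4f}, median of squares = {median_of_squares(res):.4f}")
```

With these values, the MSE grows by a factor of more than 1000 once the gross error is introduced. The median of squares stays between 0.01 and 0.03, because a single extreme value cannot shift the middle of the sorted squared residuals very far.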
Training with the MSE, which is based on the least squares method, is optimal only for clean data or for data with a normal error distribution. Outliers may be defined as observations that deviate strongly from the majority of the data. In routine data, the proportion of outliers can range from 1 to 10 % [8], and in certain cases it is even higher. Outliers may be caused by measurement errors, human mistakes such as copying errors or misplaced decimal points, long-tailed noise resulting in a different sample distribution, measurements of members of the wrong population, rounding errors, and many other factors. For a multidimensional data set, finding even a single outlying observation requires computationally expensive methods. When more outliers are present, the situation becomes much more complicated.

In this paper we present a new learning algorithm that is robust to various degrees of outlying data in the training set. The novel algorithm exploits the idea of the least median of squares estimator. It is applied iteratively to remove outliers from the training data, and it also performs satisfactorily when the network is trained on a clean data set.

2 Network Training with Outliers

Learning algorithms for feedforward networks that minimise some kind of criterion function use backpropagation to calculate the gradient of the performance function with respect to the network weights. The biases may also be considered additional weights. To introduce the network performance function, let us consider, without loss of generality, a simple three-layer feedforward neural network with one hidden layer. We assume that the training set consists of $N$ pairs $\{(\mathbf{x}_1, \mathbf{t}_1), (\mathbf{x}_2, \mathbf{t}_2), \ldots, (\mathbf{x}_N, \mathbf{t}_N)\}$. Here $\mathbf{x}_i \in \mathbb{R}^p$ denotes the $p$-dimensional $i$th input vector, and $\mathbf{t}_i \in \mathbb{R}^q$ denotes the corresponding $q$-dimensional network target. For a given input vector $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, the output of the $j$th of the $l$ neurons in the hidden layer may be calculated as:

$$z_{ij} = f_1\left(\sum_{k=1}^{p} w_{jk} x_{ik} + b_j\right) = f_1(\mathrm{inp}_{ij}), \quad j = 1, 2, \ldots, l, \tag{1}$$

where $f_1(\cdot)$ is the activation function of the hidden layer and $\mathrm{inp}_{ij}$ is the sum of its weighted inputs. In addition, $w_{jk}$ is the weight between the $k$th network input and the $j$th neuron, and $b_j$ is the bias of the $j$th neuron. The network output $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{iq})^T$ is given by:

$$y_{iv} = f_2\left(\sum_{j=1}^{l} w_{vj} z_{ij} + b_v\right) = f_2(\mathrm{inp}_{iv}), \quad v = 1, 2, \ldots, q. \tag{2}$$

Here $f_2(\cdot)$ denotes the output layer activation function, and $w_{vj}$ is the weight between the $v$th neuron of the output layer and the $j$th neuron of the hidden layer. The bias of the $v$th output neuron is $b_v$. When $f_1$ and $f_2$ are the same, these equations can be simplified. However, for function approximation or regression tasks, the most common approach is to use the sigmoid activation function in the hidden layers and a linear activation in the output layer. For the residuals $r_i$ written as:

$$r_i = \sum_{v=1}^{q} \left| y_{iv} - t_{iv} \right|, \tag{3}$$

the performance function may be defined as:

$$E = \frac{1}{N} \sum_{i=1}^{N} \rho(r_i), \tag{4}$$

where $\rho(\cdot)$ is a symmetric and continuous loss function [8] and $r_i$ is the error for the $i$th training pattern (3). $N$ is the number of elements in the training set. The most popular loss function is of quadratic form:

$$\rho(r_i) = r_i^2. \tag{5}$$

For the quadratic loss function, the minimised error equals the MSE:

$$E_{mse} = \frac{1}{N} \sum_{i=1}^{N} r_i^2. \tag{6}$$

The influence function [8,14] was introduced to measure the impact of data errors on the training process. It may be defined as the derivative of the loss function with respect to the residuals:

$$\psi(r_i) = \frac{\partial \rho(r_i)}{\partial r_i}. \tag{7}$$

If we assume the MSE performance function, the influence function becomes linear:

$$\psi(r_i) = 2 r_i, \tag{8}$$

This means that the larger the error, the more it affects the training process. Since large errors are often caused by outliers, this property is very dangerous. This is why various robust learning algorithms based on robust estimators have been proposed [1,2,14,20].
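The network equations, the residuals $r_i$, the performance function $E$ with the quadratic loss, and its linear influence function can be evaluated in a few lines of plain Python. This is a minimal sketch under stated assumptions:

- the logistic sigmoid stands in for $f_1$, and $f_2$ is the identity;
- the weights, the three training patterns, and all function names are hypothetical;
- no training is performed.

The sketch only computes the forward pass and the quantities defined above. It shows that the pattern with the corrupted target gets the largest influence.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, assumed here as the hidden-layer activation f1."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network.

    W1[j][k]: weight between input k and hidden neuron j (l rows, p columns).
    W2[v][j]: weight between hidden neuron j and output neuron v (q rows, l columns).
    The output activation f2 is taken to be linear (identity).
    """
    z = [sigmoid(sum(w * xk for w, xk in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    return [sum(w * zj for w, zj in zip(row, z)) + bv for row, bv in zip(W2, b2)]


def residual(y, t):
    """Residual r_i: sum over outputs of |y_iv - t_iv|."""
    return sum(abs(yv - tv) for yv, tv in zip(y, t))


def performance(residuals, rho):
    """Performance function E: mean of rho(r_i) over the training set."""
    return sum(rho(r) for r in residuals) / len(residuals)


def quadratic(r):
    """Quadratic loss rho(r) = r^2, which turns E into the MSE."""
    return r * r


def quadratic_influence(r):
    """Influence function of the quadratic loss: psi(r) = 2r (linear in r)."""
    return 2.0 * r


# Hypothetical 2-2-1 network (p = 2, l = 2, q = 1) and three training patterns;
# the target of the third pattern is deliberately a gross error.
W1, b1 = [[0.5, -0.5], [0.3, 0.8]], [0.0, -0.1]
W2, b2 = [[1.0, -1.0]], [0.2]
data = [([0.0, 1.0], [0.1]), ([1.0, 0.0], [0.4]), ([1.0, 1.0], [5.0])]

res = [residual(forward(x, W1, b1, W2, b2), t) for x, t in data]
print("residuals:", [round(r, 3) for r in res])
print("E (MSE):", round(performance(res, quadratic), 3))
print("influence:", [round(quadratic_influence(r), 3) for r in res])
```

Because $\psi(r) = 2r$, the corrupted third pattern dominates both $E$ and the gradient signal. Any gradient-based update would therefore be pulled mostly towards fitting that single erroneous target.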
3 Robust Learning Algorithms

In the field of robust statistics [8,11], many methods for dealing with outliers have been proposed. They are designed to behave properly when the true underlying model deviates from the assumptions, such as a normal error distribution. Some robust methods detect and remove outlying data before the model is built. Many others, including robust estimators, are designed to remain efficient and reliable even when outliers appear. At the same time, they should perform well for observations that closely follow the assumed model. The simplest way to make a traditional neural network learning algorithm more robust to outliers is to replace the quadratic error with another symmetric and continuous loss function, which yields a nonlinear influence function. Such nonlinearity should reduce the influence of large errors. Robust loss functions can be based on robust estimators with a proven ability to tolerate different amounts of outlying data. Replacing the MSE performance function with a robust one results in a robust learning method in which the impact of outliers is reduced. Several such algorithms desti (...truncated)
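Replacing the quadratic loss with one whose influence function is bounded can be illustrated with Huber's loss, a classical M-estimator loss. It is used here only as a familiar example and is not claimed to be among the functions this paper discusses. The tuning constant `C = 1.345` (a conventional choice for unit-scale residuals) and the function names are assumptions. The loss is scaled so that it coincides with the quadratic loss $\rho(r) = r^2$ for small residuals.

```python
import math

C = 1.345  # assumed tuning constant (a conventional choice for unit-scale residuals)


def quadratic_psi(r):
    """Influence function of rho(r) = r^2: grows without bound."""
    return 2.0 * r


def huber_rho(r, c=C):
    """Huber loss scaled to equal r^2 for |r| <= c and to grow linearly beyond c."""
    a = abs(r)
    return r * r if a <= c else 2.0 * c * a - c * c


def huber_psi(r, c=C):
    """Derivative of huber_rho: 2r inside [-c, c], clipped to +/-2c outside."""
    return 2.0 * r if abs(r) <= c else math.copysign(2.0 * c, r)


for r in (0.5, 1.0, 5.0, 50.0):
    print(f"r = {r:5.1f}: quadratic psi = {quadratic_psi(r):6.1f}, Huber psi = {huber_psi(r):.3f}")
```

For residuals below `C`, both influence functions agree. Beyond `C`, Huber's influence stays at `2C`, so a gross error contributes no more to the gradient than a moderately large one. This is the nonlinear, outlier-damping influence described above.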