Kernel Affine Projection Algorithms
EURASIP Journal on Advances in Signal Processing
Hindawi Publishing Corporation
Weifeng Liu
José C. Príncipe
Recommended by Aníbal Figueiras-Vidal
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
The combination of the famed kernel trick and affine projection algorithms (APAs) yields powerful nonlinear extensions, collectively named KAPA here. This paper is a follow-up study of the recently introduced kernel least-mean-square (KLMS) algorithm. KAPA inherits the simplicity and online nature of KLMS while reducing its gradient noise, thereby boosting performance. More interestingly, it provides a unifying model for several neural network techniques, including the kernel least-mean-square algorithm, the kernel Adaline, sliding-window kernel recursive least squares (SW-KRLS), and regularization networks. Therefore, many insights can be gained into the basic relations among them and into the tradeoff between computational complexity and performance. Several simulations illustrate its wide applicability.
1. INTRODUCTION
A solid mathematical foundation and a wide range of successful
applications have made kernel methods very popular. By the
famed kernel trick, many linear methods have been recast in
high-dimensional reproducing kernel Hilbert spaces (RKHS)
to yield more powerful nonlinear extensions, including
support vector machines [1], principal component analysis
[2], recursive least squares [3], the Hebbian algorithm [4],
the Adaline [5], and so forth.
More recently, a kernelized least-mean-square (KLMS)
algorithm was proposed in [6], which implicitly creates
a growing radial basis function network (RBF) with a
learning strategy similar to resource-allocating networks
(RAN) proposed by Platt [7]. As an improvement, kernelized
affine projection algorithms (KAPAs) are presented for the
first time in this paper by reformulating the conventional
affine projection algorithm (APA) [8] in general reproducing
kernel Hilbert spaces (RKHS). The new algorithms are
online and simple, and they significantly reduce the gradient noise
of KLMS, thus improving performance.
More interestingly, the KAPA reduces to the kernel
least-mean square (KLMS), sliding-window kernel recursive
least squares (SW-KRLS), kernel adaline, and regularization
networks naturally in special cases. Thus it provides a
unifying model for these existing methods and helps better
understand the basic relations among them and the tradeoff
between complexity and performance. Moreover, it also
advances our understanding of resource-allocating
networks. Exploiting the underlying linear structure of the RKHS,
we also briefly discuss the well-posedness of the approach.
The organization of the paper is as follows. In Section 2,
the affine projection algorithms are briefly reviewed. Next, in
Section 3, the kernel trick is applied to formulate the
nonlinear affine projection algorithms. Other related algorithms
are reviewed as special cases of the KAPA in Section 4. We
detail the implementation of the KAPA in Section 5. Three
experiments are studied in Section 6 to support our theory.
Finally, Section 7 summarizes the conclusions and future
lines of research.
The notation used throughout the paper is summarized
in Table 1.
2. A REVIEW OF THE AFFINE PROJECTION ALGORITHMS
Let $d$ be a zero-mean scalar-valued random variable, and
let $\mathbf{u}$ be a zero-mean $L \times 1$ random variable with a
positive-definite covariance matrix $\mathbf{R}_u = E[\mathbf{u}\mathbf{u}^T]$. The
cross-covariance vector of $d$ and $\mathbf{u}$ is denoted by $\mathbf{r}_{du} = E[d\mathbf{u}]$. The
weight vector $\mathbf{w}$ that solves

$$\min_{\mathbf{w}} \; E\left| d - \mathbf{w}^T \mathbf{u} \right|^2 \tag{1}$$

is given by $\mathbf{w}^o = \mathbf{R}_u^{-1}\mathbf{r}_{du}$ [8].
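As a quick illustration, the following is a minimal NumPy sketch (synthetic data; all variable names are ours, not from the paper) of estimating the Wiener solution from sample averages of $\mathbf{R}_u$ and $\mathbf{r}_{du}$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 4, 5000                                   # input dimension, number of samples

w_true = rng.standard_normal(L)                  # hypothetical underlying weights
U = rng.standard_normal((N, L))                  # zero-mean inputs u(i), one per row
d = U @ w_true + 0.1 * rng.standard_normal(N)    # noisy desired signal

R_u = U.T @ U / N                                # sample estimate of E[u u^T]
r_du = U.T @ d / N                               # sample estimate of E[d u]

w_o = np.linalg.solve(R_u, r_du)                 # Wiener solution w^o = R_u^{-1} r_du
```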
Several methods that approximate $\mathbf{w}$ iteratively also exist,
for example, the common gradient method

$$\mathbf{w}(0) = \text{initial guess};$$
$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta\left[\mathbf{r}_{du} - \mathbf{R}_u\mathbf{w}(i-1)\right], \tag{2}$$

or the regularized Newton's recursion

$$\mathbf{w}(0) = \text{initial guess};$$
$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta\left(\mathbf{R}_u + \varepsilon\mathbf{I}\right)^{-1}\left[\mathbf{r}_{du} - \mathbf{R}_u\mathbf{w}(i-1)\right], \tag{3}$$
where ε is a small positive regularization factor and η is the
step size specified by the designer.
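For concreteness, here is a minimal sketch of recursions (2) and (3), assuming estimates R_u and r_du such as those computed in the previous snippet (step sizes and iteration counts are illustrative choices, not values from the paper):

```python
import numpy as np

def gradient_recursion(R_u, r_du, eta=0.1, n_iter=200):
    """Steepest-descent recursion (2): w(i) = w(i-1) + eta*[r_du - R_u w(i-1)]."""
    w = np.zeros_like(r_du)                          # initial guess w(0)
    for _ in range(n_iter):
        w = w + eta * (r_du - R_u @ w)
    return w

def newton_recursion(R_u, r_du, eta=0.5, eps=1e-3, n_iter=50):
    """Regularized Newton's recursion (3) with regularization eps."""
    w = np.zeros_like(r_du)                          # initial guess w(0)
    P = np.linalg.inv(R_u + eps * np.eye(len(r_du)))  # (R_u + eps*I)^{-1}
    for _ in range(n_iter):
        w = w + eta * P @ (r_du - R_u @ w)
    return w
```

In this sketch the Newton-type recursion typically needs far fewer iterations, since the regularized inverse approximately whitens the update direction.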
Stochastic-gradient algorithms replace the covariance
matrix and the cross-covariance vector by local
approximations obtained directly from the data at each iteration. There are
several ways to obtain such approximations. The tradeoff
is among computational complexity, convergence speed, and
steady-state behavior [8].
Assume that we have access to observations of the
random variables d and u over time
$$d(1), d(2), \ldots, \qquad \mathbf{u}(1), \mathbf{u}(2), \ldots. \tag{4}$$
The least-mean-square (LMS) algorithm simply uses the
instantaneous values for the approximations $\mathbf{R}_u = \mathbf{u}(i)\mathbf{u}(i)^T$ and
$\mathbf{r}_{du} = d(i)\mathbf{u}(i)$. The corresponding steepest-descent recursion
(2) and Newton's recursion (3) become

$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta\,\mathbf{u}(i)\left[d(i) - \mathbf{u}(i)^T\mathbf{w}(i-1)\right];$$
$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta\,\mathbf{u}(i)\left[\mathbf{u}(i)^T\mathbf{u}(i) + \varepsilon\mathbf{I}\right]^{-1}\left[d(i) - \mathbf{u}(i)^T\mathbf{w}(i-1)\right]. \tag{5}$$
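The two recursions in (5) can be sketched as follows, assuming a NumPy layout in which U holds one input vector u(i) per row and d holds the corresponding desired samples (names and step sizes are illustrative):

```python
import numpy as np

def lms(U, d, eta=0.05):
    """LMS: w(i) = w(i-1) + eta*u(i)*[d(i) - u(i)^T w(i-1)]."""
    w = np.zeros(U.shape[1])
    for u_i, d_i in zip(U, d):
        e_i = d_i - u_i @ w                  # instantaneous a priori error
        w = w + eta * u_i * e_i
    return w

def normalized_lms(U, d, eta=0.5, eps=1e-3):
    """Newton-type (normalized) LMS: step scaled by 1/(u(i)^T u(i) + eps)."""
    w = np.zeros(U.shape[1])
    for u_i, d_i in zip(U, d):
        e_i = d_i - u_i @ w
        w = w + eta * u_i * e_i / (u_i @ u_i + eps)
    return w
```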
The affine projection algorithm, however, employs better
approximations. Specifically, $\mathbf{R}_u$ and $\mathbf{r}_{du}$ are replaced by
the instantaneous approximations from the (...truncated)