Constrained Projection Approximation Algorithms for Principal Component Analysis (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1007%2Fs11063-006-9011-z.pdf

Constrained Projection Approximation Algorithms for Principal Component Analysis

0 Advanced Brain Signal Processing Lab, Brain Science Institute , RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan 1 Department of Physics, Pohang University of Science and Technology , San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea 2 Department of Computer Science, Pohang University of Science and Technology , San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea In this paper, we introduce a new error measure, integrated reconstruction error (IRE) and show that the minimization of IRE leads to principal eigenvectors (without rotational ambiguity) of the data covariance matrix. Then, we present iterative algorithms for the IRE minimization, where we use the projection approximation. The proposed algorithm is referred to as COnstrained Projection Approximation (COPA) algorithm and its limiting case is called COPAL. Numerical experiments demonstrate that these algorithms successfully find exact principal eigenvectors of the data covariance matrix. 1. Introduction Principal component analysis (PCA) or principal subspace analysis (PSA) is a fundamental multivariate data analysis method which is encountered into a variety of areas in neural networks, signal processing, and machine learning [10]. A variety of adaptive (on-line) algorithms for PCA or PSA can be found in literature [4, 5, 7, 11, 13, 15] (see also [8] and references therein). Most of these algorithms are gradient-based learning algorithms, hence the convergence is slow. The power iteration is a classical method for estimating the largest eigenvector of a symmetric matrix. The subspace iteration is a direct extension of the power iteration, computing subspace spanned by principal eigenvectors of a symmetric matrix. The natural power method is an exemplary instance of the subspace iteration, where the invariant subspace spanned by the n largest eigenvectors of the data covariance matrix, is determined [9]. The natural power iteration provides a general framework for several well-known subspace algorithms, including Ojas subspace rule [11], PAST [16], and OPAST [1]. A common derivation of PSA, is terms of a linear (orthogonal) projection W = [w1, . . . , wn] Rmn such that given a centered data matrix X = [x(1), . . . , x(N )] RmN , the reconstruction error X W W X 2F is minimized, where F denotes the Frobenius norm (Euclidean norm). It is known that the reconstruction error is blind to an arbitrary rotation of the representation space. The minimization of the reconstruction error leads to W = U 1Q, where Q Rnn is an arbitrary orthogonal matrix and the eigendecomposition of the covariance matrix C = XX is given by C = where U 1 Rmn contains n largest eigenvectors, U 2 Rm(mn) consists of the rest of eigenvectors, and associated eigenvalues are in 1, 2 with 1 > 2 > > m. Probabilistic model-based method for PCA was developed, where the linear generative model was considered and expectation maximization (EM) optimization was used to derive iterative PCA algorithms, including probabilistic PCA (PPCA) [14] and EM-PCA [12]. These algorithms are batch algorithms that find principal subspace. Hence, further post-processing is required to determine exact principal eigenvectors of the data covariance matrix, without rotational ambiguity. The natural power iteration is also a PSA-type algorithm, unless the deflation method is used. In this paper, we present iterative algorithms which determine the principal eigenvectors of the data covariance matrix in a parallel fashion (in contrast to the deflation method). To this end, we first introduce the integrated reconstruction error (IRE) and show that its minimization leads to exact principal eigenvectors (without rotational ambiguity). Proposed iterative algorithms emerge from the minimization of the IRE and are referred to as COPA algorithm and COPAL (the limiting case of COPA). These algorithms are the recognition model counterpart of the constrained EM algorithm in [2, 3] where principal directions are estimated through alternating two steps (E and M steps) in the context of the linear coupled generative model. In contrast, our algorithms COPA and COPAL, need not go through two steps, which is a major advantage over EM type algorithms. 2. Integrated Reconstruction Error It was shown in [16] that the reconstruction error JRE = X W W X 2F attains the global minimum if and only if W = U 1Q. Now, we introduce the IRE that is summarized below. DEFINITION 1 (IRE). The integrated reconstruction error, JIRE, is defined as a linear combination of n partial reconstruction errors (PRE), Ji = X W Ei W X 2F , i.e., JIRE(W ) = i=1 [Ei ]jj = Proof. See Appendix. The last term in IRE, Jn, is the standard reconstruction error. It was shown in [16] that W is a stationary point of Jn if and only if W = U 1Q (hence W W = I is satisfied). All stationary points of Jn are saddle points, except when U 1 contains the n dominant eigenvectors of C. In that case, Jn attains the global minimum. The standard reconstruction error Jn is invariant to an orthogonal transform Q because WQQ W = W W . In contrast, the IRE is not invariant under an orthogonal transform, since QEiQ = Ei . This provides an intuitive idea why the IRE minimization leads to the principal eigenvectors of the data covariance matrix, without rotational ambiguity. PREs Ji are of the form Ji = X w1w1 + + wi wi where wi represents the ith column vector of W . The PRE i + 1, Ji+1, represents the reconstruction error for (i + 1)-dimensional principal subspace which completely includes i-dimensional principal subspace. Therefore, the minimization of JIRE implies that each PRE Ji for i = 1, . . . , n, is minimized. The graphical representation is shown in Figure 1, where a coupled linear recognition model is described, with a link of the IRE minimization. Minimizing each Ji is reminiscent of the deflation method where the eigenvectors of C are extracted one by one. Thus, it is expected that the minimization of IRE leads to principal eigenvectors of C. However, a major difference between the deflation method and our method is that the former extracts principal components one by one and the latter find principal components simultaneously. Iterative Algorithms The projection approximation [16] assumes that the difference between W (k+1)X and W (k)X is small, which leads us to consider the following objective function i=1 X W (k+1)Ei Y (k) where Y (k) = W (k)X. The gradient of (3) with respect to W(k+1) is given by JIRE W (k+1) = XY (k) is the Hadamard product (element-wise product). With these definitions, it follows from JIRE W (k+1) = 0 that 1 1 1 = XY = XY 1 U(Yij ) = if i j . where U(Y ) is an element-wise operator, whose arguments Yij are transformed by The operator U(Y ) results from the structure of 1 given by 1 . . . W (k+1) = CW (k) U W (k)CW (k) 1 . UT (Yij ) = This leads to the COPAL algorithm W (k+1) = CW (k) UT W (k)CW (k) 1 . Replacing Y (k) by W (k)X, leads to the updat (...truncated)