Semi-supervised local Fisher discriminant analysis for dimensionality reduction

Masashi Sugiyama, Tsuyoshi Idé, Shinichi Nakajima, Jun Sese

Editor: Roni Khardon.

T. Idé: IBM Research, Tokyo Research Laboratory, 1623-14 Shimo-tsuruma, Yamato-shi, Kanagawa 242-8502, Japan
M. Sugiyama: Department of Computer Science, Tokyo Institute of Technology, 2-12-2 O-okayama, Meguro-ku, Tokyo 152-8552, Japan

When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples could be useful in improving the performance. In this paper, we propose a semi-supervised dimensionality reduction method that preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), admits an analytic form of the globally optimal solution, which can be computed by eigen-decomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.

1 Introduction

The goal of dimensionality reduction is to obtain a low-dimensional representation of high-dimensional data samples while preserving most of the intrinsic information contained in the original data (Roweis and Saul 2000; Tenenbaum et al. 2000; Hinton and Salakhutdinov 2006). If dimensionality reduction is carried out appropriately, the compact representation of the data can be used for various tasks such as visualization and classification.

In supervised learning scenarios where data samples are accompanied by class labels, Fisher discriminant analysis (FDA) (Fisher 1936; Fukunaga 1990) is a popular dimensionality reduction method. FDA seeks an embedding transformation such that the between-class scatter is maximized and the within-class scatter is minimized. FDA works very well if the samples in each class follow Gaussian distributions with a shared covariance structure.
However, FDA tends to give undesired results if the samples in a class form several separate clusters or there are outliers (Fukunaga 1990). To overcome this drawback, Local FDA (LFDA) has been proposed (Sugiyama 2007). LFDA localizes the evaluation of the within-class scatter, and thus works well even when within-class multimodality or outliers exist. In addition, LFDA overcomes a critical limitation of the original FDA in dimensionality reduction: the dimension of the FDA embedding space should be less than the number of classes (Fukunaga 1990), while LFDA does not suffer from this restriction in general. Moreover, LFDA was shown to compare favorably with other supervised dimensionality reduction methods through experiments (Sugiyama 2007).

However, the performance of LFDA (and all other supervised dimensionality reduction methods) tends to be degraded when only a small number of labeled samples are available. Namely, the supervised dimensionality reduction methods tend to find embedding spaces which are overfitted to the labeled samples. In such cases, it is effective to make use of unlabeled samples that are often available abundantly; such a setup is called semi-supervised learning (Chapelle et al. 2006). Through extensive experiments, it was shown that principal component analysis (PCA) (Jolliffe 1986), which is an unsupervised dimensionality reduction method for preserving the global data structure, works moderately well in semi-supervised learning scenarios (see e.g., Chap. 21 of Chapelle et al. 2006).

Although PCA was reported to work well, it may not be the best possible choice in the semi-supervised situation because of its unsupervised nature. In this paper, we propose an alternative semi-supervised dimensionality reduction method. Our basic idea is to smoothly bridge LFDA and PCA so that our reliance on the global structure of unlabeled samples and information brought by (a small number of) labeled samples can be controlled.
We show experimentally that the proposed method, which we refer to as semi-supervised LFDA (SELF), compares favorably with other methods. Note that SELF retains the computational advantage of LFDA and PCA, i.e., a global solution can be analytically computed based on eigen-decomposition. Therefore, SELF is computationally as efficient as LFDA and PCA.

The rest of this paper is organized as follows. In Sect. 2, the linear dimensionality reduction problem addressed in this paper is formulated and some mathematical facts used in the following sections are briefly summarized. In Sect. 3, existing supervised and unsupervised dimensionality reduction methods are reviewed in a systematic and unified manner. This unified view will be the foundation for developing our new method in the following section. Those who are familiar with the existing methods and interested in immediately looking at the new method may choose to skip the review materials provided in Sect. 3. In Sect. 4, we propose the new semi-supervised dimensionality reduction method SELF and show its properties. Section 5 is devoted to experiments showing the usefulness of the proposed approach. Finally, in Sect. 6, we conclude with a discussion on possible future directions.

2 Preliminaries

In this section, we formulate the linear dimensionality reduction problem and give some mathematical background.

2.1 Formulation

Let $x_i \in \mathbb{R}^d$ ($i = 1, 2, \ldots, n$) be $d$-dimensional sample vectors and let $X \in \mathbb{R}^{d \times n}$ be the matrix of all samples:

$$X := (x_1 | x_2 | \cdots | x_n).$$

Let $z \in \mathbb{R}^r$ ($1 \le r \le d$) be a low-dimensional representation of a high-dimensional sample $x \in \mathbb{R}^d$, where $r$ is the dimensionality of the reduced space. For the moment, we focus on linear dimensionality reduction, i.e., using a transformation matrix $T \in \mathbb{R}^{d \times r}$, an embedded representation $z$ of the sample $x$ is obtained as

$$z = T^\top x,$$

where $^\top$ denotes the transpose of a matrix or a vector. Later, we extend our discussion to cases where the mapping from $x$ to $z$ is non-linear.

2.2 Generalized eigenvalue problem

Many dimensionality reduction techniques developed so far involve an optimization problem of the following form:

$$T_{\mathrm{OPT}} := \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \operatorname{tr}\left( T^\top B T \, (T^\top C T)^{-1} \right) \right].$$

Roughly speaking, $B$ encodes the quantity that we want to increase (e.g., between-class separability), and $C$ corresponds to the quantity that we want to decrease (e.g., within-class scatter). In the next section, we show how $B$ and $C$ are designed in some specific cases. Note that the same solution $T_{\mathrm{OPT}}$ can also be obtained as follows (see e.g., Fukunaga 1990):

$$T_{\mathrm{OPT}} = \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \frac{\det(T^\top B T)}{\det(T^\top C T)} \right],$$

$$T_{\mathrm{OPT}} = \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \operatorname{tr}(T^\top B T) \right] \quad \text{subject to} \quad T^\top C T = I_r,$$

where $I_r$ is the identity matrix on $\mathbb{R}^r$ and $\det(\cdot)$ denotes the determinant of a matrix.

Let $\{\varphi_k\}_{k=1}^{d}$ be the generalized eigenvectors associated with the generalized eigenvalues $\{\lambda_k\}_{k=1}^{d}$ of the following generalized eigenvalue problem:

$$B \varphi = \lambda C \varphi.$$

We assume that the generalized eigenvalues are sorted in descending order as

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d,$$

and the generalized eigenvectors, which satisfy $\varphi_k^\top C \varphi_{k'} = 0$ for $k \ne k'$, are normalized as

$$\varphi_k^\top C \varphi_k = 1 \quad \text{for } k = 1, 2, \ldots, d.$$

Note that this normalization is often carried out automatically by an eigen-solver. (...truncated)
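As a rough illustration of the generalized eigenvalue problem $B\varphi = \lambda C\varphi$ and of the normalization $\varphi_k^\top C \varphi_k = 1$ discussed in this section, the following Python sketch solves the problem numerically and embeds samples via $z = T^\top x$. Several things here are assumptions made only for illustration: the matrices `B` and `C` are random placeholders rather than the scatter matrices of any particular method, the helper name `leading_generalized_eigvecs` is hypothetical, and taking the $r$ leading generalized eigenvectors as the columns of $T$ follows the standard treatment of trace criteria of this kind (see e.g., Fukunaga 1990).

```python
import numpy as np
from scipy.linalg import eigh


def leading_generalized_eigvecs(B, C, r):
    """Return the r largest generalized eigenvalues of B phi = lambda C phi
    and the matching eigenvectors (hypothetical helper, illustration only)."""
    # eigh handles the symmetric-definite problem (B symmetric, C positive
    # definite) and returns eigenvalues in ascending order.
    eigvals, eigvecs = eigh(B, C)
    # Re-sort in descending order, as assumed in the text.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order][:r], eigvecs[:, order][:, :r]


rng = np.random.default_rng(0)
d, n, r = 5, 40, 2

# Sample matrix X with one d-dimensional sample per column.
X = rng.standard_normal((d, n))

# Placeholder matrices (assumptions): B symmetric positive semi-definite,
# C symmetric positive definite.
A = rng.standard_normal((d, d))
B = A @ A.T
C = X @ X.T / n + 1e-3 * np.eye(d)

lams, T = leading_generalized_eigvecs(B, C, r)

# Linear embedding z = T^T x applied to all samples at once.
Z = T.T @ X

# The solver's eigenvectors are already C-normalized.
print(np.allclose(T.T @ C @ T, np.eye(r)))
```

The final check reflects the remark that an eigen-solver often performs the normalization automatically: `scipy.linalg.eigh` returns eigenvectors scaled so that $\varphi_k^\top C \varphi_k = 1$, so no extra rescaling step is needed in this sketch.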