Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization

Citation: Tao D (2013) Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization. PLoS ONE 8(10): e77162. doi:10.1371/journal.pone.0077162

Authors: Naiyang Guan, Lei Wei, Zhigang Luo, Dacheng Tao

Editor: Petros Drineas, Rensselaer Polytechnic Institute, United States of America

Affiliations: 1 National Key Laboratory of Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, Hunan, China; 2 Centre for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia

Graph regularized nonnegative matrix factorization (GNMF) decomposes a nonnegative data matrix $X \in \mathbb{R}^{m \times n}$ into the product of two lower-rank nonnegative factor matrices, $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ ($r < \min\{m, n\}$), and aims to preserve the local geometric structure of the dataset while minimizing the squared Euclidean distance or the Kullback-Leibler (KL) divergence between $X$ and $WH$. The multiplicative update rule (MUR) is usually applied to optimize GNMF, but it converges slowly because it intrinsically advances one step along the rescaled negative gradient direction with a non-optimal step size. Recently, a multiple step-size fast gradient descent (MFGD) method was proposed for optimizing NMF; it accelerates MUR by searching for the optimal step size along the rescaled negative gradient direction with Newton's method. However, the computational cost of MFGD is high because 1) the high-dimensional Hessian matrix is dense and consumes too much memory, and 2) inverting the Hessian and multiplying the inverse by the gradient take too much time. To overcome these deficiencies of MFGD, we propose an efficient limited-memory FGD (L-FGD) method for optimizing GNMF.
In particular, we apply the limited-memory BFGS (L-BFGS) method to directly approximate the product of the inverse Hessian and the gradient when searching for the optimal step size in MFGD. Preliminary results on real-world datasets show that L-FGD is more efficient than both MFGD and MUR. To evaluate the effectiveness of L-FGD, we validate its clustering performance for optimizing KL-divergence based GNMF on two popular face image datasets, ORL and PIE, and two text corpora, Reuters and TDT2. The experimental results confirm the effectiveness of L-FGD in comparison with representative GNMF solvers.

Funding: This work was partially supported by Scientific Research Plan Project of National University of Defense Technology (No. JC13-06-01) and Australian Research Council Discovery Project (120103730). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Nonnegative matrix factorization (NMF) factorizes a given nonnegative data matrix $X \in \mathbb{R}^{m \times n}$ into two lower-rank nonnegative factor matrices, $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$, where $r < m$ and $r < n$. It is a powerful dimension reduction method and has been widely used in many fields such as data mining [1] and bioinformatics [2]. Because NMF does not explicitly guarantee a parts-based representation [3], Hoyer [4] proposed sparseness constrained NMF (NMFsc), which incorporates a sparseness constraint into NMF. To utilize the discriminative information in a dataset, Zafeiriou et al. [5] proposed discriminant NMF (DNMF), which incorporates Fisher's criterion in NMF for classification. Sandler and Lindenbaum [6] proposed an earth mover's distance metric-based NMF (EMDNMF) to model the distortion of images for image segmentation and texture classification. Guan et al.
[7] investigated Manhattan NMF (MahNMF) for low-rank and sparse matrix factorization of a nonnegative matrix and developed an efficient algorithm to solve MahNMF.

Because NMF and its extensions do not consider the geometric structure of a dataset, they perform unsatisfactorily in some tasks such as clustering. To take the local geometric structure of a dataset into account, Cai et al. [8] proposed graph regularized nonnegative matrix factorization (GNMF), which encodes the geometric structure in a nearest neighbor (NN) graph for data representation. Along this direction, Guan et al. [9] extended GNMF to manifold-regularized discriminative NMF (MD-NMF), which incorporates the discriminative information in a dataset by using margin maximization. The same authors proposed a nonnegative patch alignment framework (NPAF) [10] to unify such NMF-based nonlinear dimension reduction methods.

Figure 1. Descent of both W and H along their rescaled negative gradient directions in MFGD. doi:10.1371/journal.pone.0077162.g001

Because the objective functions of GNMF and NPAF are jointly non-convex with respect to both factor matrices, they are difficult to optimize. Like NMF, GNMF is NP-hard, so its global minimum cannot be expected to be found in polynomial time [11]. Fortunately, GNMF is convex with respect to each factor matrix separately, i.e., the sub-problems for updating an individual factor matrix are convex, and thus it can be solved by alternately updating the two factor matrices in the framework of block coordinate descent. Cai et al. [8] exploited the multiplicative update rule (MUR) to update each factor matrix alternately until convergence to a local minimum. MUR takes one step along the rescaled negative gradient direction with the step size set to one. Since this step size is non-optimal, MUR does not sufficiently utilize the convexity of the sub-problems of GNMF.
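The statement that MUR is a unit step along the rescaled negative gradient direction can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes plain KL-divergence NMF with the graph regularization term of GNMF omitted, updates only $H$ with $W$ held fixed, and uses hypothetical helper names (`kl_divergence`, `rescaled_negative_gradient`, `mur_update_H`) together with random strictly positive test data.

```python
import numpy as np


def kl_divergence(X, W, H):
    """Generalized KL divergence D(X || WH); assumes X and WH are strictly positive."""
    WH = W @ H
    return float(np.sum(X * np.log(X / WH) - X + WH))


def rescaled_negative_gradient(X, W, H):
    """Direction D such that H + 1.0 * D equals the multiplicative update of H."""
    WH = W @ H
    grad = W.T @ (1.0 - X / WH)             # gradient of D(X || WH) with respect to H
    scale = H / W.sum(axis=0)[:, None]      # entrywise rescaling H_kj / sum_i W_ik
    return -scale * grad


def mur_update_H(X, W, H):
    """Multiplicative update of H for KL-divergence NMF with W fixed."""
    return H * (W.T @ (X / (W @ H))) / W.sum(axis=0)[:, None]


if __name__ == "__main__":
    # Hypothetical toy problem: small random positive matrices.
    rng = np.random.default_rng(0)
    X = rng.random((6, 5)) + 0.1
    W = rng.random((6, 2)) + 0.1
    H = rng.random((2, 5)) + 0.1

    D = rescaled_negative_gradient(X, W, H)
    theta = 1.0                              # step size one reproduces MUR
    H_step = H + theta * D

    print("unit step equals MUR:", np.allclose(H_step, mur_update_H(X, W, H)))
    print("KL before:", kl_divergence(X, W, H))
    print("KL after :", kl_divergence(X, W, H_step))
```

Writing the update as `H + theta * D` makes the step size explicit: `theta = 1` recovers MUR, and any other choice of `theta` must keep `H + theta * D` nonnegative.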
Although both [12] and [13] can solve squared Euclidean distance based NMF efficiently, they are not general enough to optimize Kullback-Leibler (KL) divergence based GNMF. Recently, Guan et al. [9] proposed a fast gradient descent (FGD) method to accelerate MUR for KL-divergence based GNMF. FGD searches for the optimal step size along the rescaled negative gradient direction by using Newton's method. Because FGD sets a single step size for the whole factor matrix, it risks degenerating to MUR, i.e., the final step size may shrink to one. To overcome this deficiency, Guan et al. [10] further proposed a multiple step-size FGD (MFGD) method, which sets a step size for each row of $W$ and each column of $H$ and searches for the optimal step size vector by using the multivariate Newton's method. MFGD converges more rapidly than FGD, but the Hessian matrices used in the line search procedures for updating the two factor matrices are too large: they are $m \times m$ and $n \times n$ for optimizing $W$ and $H$, respectively. Therefore, MFGD suffers from the following two drawbacks: 1) both inverting the Hessian matrices and multiplying the inverses by the corresponding gradients cost too much computational time, and 2) the dense Hessian matrices consume too much memory. To overcome the aforementioned defic (...truncated)