Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization
Citation: Guan N, Wei L, Luo Z, Tao D (2013) Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization. PLoS ONE 8(10): e77162. doi:10.1371/journal.pone.0077162
Naiyang Guan
Lei Wei
Zhigang Luo
Dacheng Tao
Editor: Petros Drineas, Rensselaer Polytechnic Institute, United States of America
1 National Key Laboratory of Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, Hunan, China, 2 Centre for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
Graph regularized nonnegative matrix factorization (GNMF) decomposes a nonnegative data matrix $X \in \mathbb{R}^{m \times n}$ into the product of two lower-rank nonnegative factor matrices, i.e., $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ ($r < \min\{m, n\}$), by minimizing the squared Euclidean distance or Kullback-Leibler (KL) divergence between X and WH while preserving the local geometric structure of the dataset. The multiplicative update rule (MUR) is usually applied to optimize GNMF, but it suffers from slow convergence because it intrinsically advances one step along the rescaled negative gradient direction with a non-optimal step size. Recently, a multiple step-sizes fast gradient descent (MFGD) method has been proposed for optimizing NMF; it accelerates MUR by searching for the optimal step size along the rescaled negative gradient direction with Newton's method. However, the computational cost of MFGD is high because 1) the high-dimensional Hessian matrix is dense and consumes too much memory; and 2) inverting the Hessian and multiplying the inverse with the gradient cost too much time. To overcome these deficiencies of MFGD, we propose an efficient limited-memory FGD (L-FGD) method for optimizing GNMF. In particular, we apply the limited-memory BFGS (L-BFGS) method to directly approximate the product of the inverse Hessian and the gradient when searching for the optimal step size in MFGD. Preliminary results on real-world datasets show that L-FGD is more efficient than both MFGD and MUR. To evaluate the effectiveness of L-FGD, we validate its clustering performance for optimizing KL-divergence based GNMF on two popular face image datasets, ORL and PIE, and two text corpora, Reuters and TDT2. The experimental results confirm the effectiveness of L-FGD in comparison with representative GNMF solvers.
Funding: This work was partially supported by Scientific Research Plan Project of National University of Defense Technology (No. JC13-06-01) and Australian
Research Council Discovery Project (120103730). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the
manuscript.
Competing Interests: The authors have declared that no competing interests exist.
NMF factorizes a given nonnegative data matrix $X \in \mathbb{R}^{m \times n}$ into
two lower-rank nonnegative factor matrices, i.e., $W \in \mathbb{R}^{m \times r}$ and
$H \in \mathbb{R}^{r \times n}$, where $r < m$ and $r < n$. It is a powerful dimension
reduction method and has been widely used in many fields such as
data mining [1] and bioinformatics [2]. Since NMF does not
explicitly guarantee parts-based representation [3], Hoyer [4]
proposed sparseness constrained NMF (NMFsc) which
incorporates the sparseness constraint into NMF. To utilize the
discriminative information in a dataset, Zafeiriou et al. [5]
proposed discriminant NMF (DNMF), which incorporates Fisher's
criterion into NMF for classification. Sandler and Lindenbaum [6]
proposed an earth mover's distance metric-based NMF
(EMDNMF) to model the distortion of images for image segmentation
and texture classification. Guan et al. [7] investigated Manhattan
NMF (MahNMF) for low-rank and sparse matrix factorization of a
nonnegative matrix and developed an efficient algorithm to solve
MahNMF.
Since NMF and its extensions do not consider the geometric
structure of a dataset, they perform unsatisfactorily in some tasks
such as clustering. To consider the local geometric structure of a
dataset in NMF, Cai et al. [8] proposed graph regularized
nonnegative matrix factorization (GNMF) which encodes the
geometric structure in a nearest neighbor (NN) graph for data
representation. Along this direction, Guan et al. [9] extended
GNMF to manifold-regularized discriminative NMF (MD-NMF)
to incorporate discriminative information in a dataset by using
margin maximization. The same authors proposed a nonnegative
patch alignment framework (NPAF) [10] to unify such NMF-based
nonlinear dimension reduction methods. Because the objective
functions of GNMF and NPAF are jointly non-convex with respect
to both factor matrices, they are difficult to optimize.
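The nearest neighbor graph that GNMF uses to encode local geometry can be illustrated with a minimal NumPy sketch. This section does not state the edge weights or the exact regularizer, so the choices below are assumptions: samples are the columns of X, edges carry 0/1 weights, the graph is symmetrized by keeping an edge if either endpoint selects it, and the penalty takes the common form tr(H L H^T) with L the unnormalized graph Laplacian. The names `knn_graph_laplacian` and `graph_penalty` are hypothetical.

```python
import numpy as np

def knn_graph_laplacian(X, k=3):
    """Build a symmetric 0/1 k-NN graph over the columns of X; return (S, L).

    X is m x n with one sample per column. The 0/1 weighting and the
    "either direction" symmetrization are assumptions of this sketch.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns.
    sq = np.sum(X * X, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest samples
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    S = np.maximum(S, S.T)                    # keep an edge if either end chose it
    L = np.diag(S.sum(axis=1)) - S            # unnormalized graph Laplacian
    return S, L

def graph_penalty(H, L):
    """tr(H L H^T), which equals 0.5 * sum_ij S_ij * ||h_i - h_j||^2."""
    return float(np.trace(H @ L @ H.T))

rng = np.random.default_rng(0)
X = rng.random((5, 8))   # 8 samples of dimension 5
H = rng.random((2, 8))   # a rank-2 representation, one column per sample
S, L = knn_graph_laplacian(X, k=3)
penalty = graph_penalty(H, L)
```

The penalty is small when samples that are neighbors in the original space receive nearby columns in H, which is the sense in which the graph term preserves local structure.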
Like NMF, GNMF is NP-hard, so its global minimum cannot be
expected to be found in polynomial time [11]. Fortunately, GNMF
is convex with respect to each factor matrix, i.e., the sub-problems
for updating each individual factor matrix are convex, and thus it can
be solved by alternately updating the two factor matrices in the framework
of block coordinate descent. Cai et al. [8] exploited the
multiplicative update rule (MUR) to update each factor matrix
alternately until convergence to a local minimum. MUR advances
one step along the rescaled negative gradient direction with the step
size set to one. Since this step size is not optimal, MUR does
not fully exploit the convexity of the sub-problems of
GNMF. Although the methods in [12] and [13] can solve squared Euclidean
distance based NMF efficiently, they are not general enough to
optimize Kullback-Leibler (KL) divergence based GNMF.
Figure 1. Descent of both W and H along their rescaled negative gradient directions in MFGD.
doi:10.1371/journal.pone.0077162.g001
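The statement that MUR is one step of rescaled gradient descent with unit step size can be checked numerically. The sketch below is a simplification rather than the GNMF solver of [8]: it drops the graph term and uses the standard multiplicative updates for plain KL-divergence NMF, assuming strictly positive X, W and H. All function names are hypothetical.

```python
import numpy as np

def kl_divergence(X, W, H):
    """Generalized KL divergence D(X || WH) for strictly positive X and WH."""
    WH = W @ H
    return float(np.sum(X * np.log(X / WH) - X + WH))

def mur_update_H(X, W, H):
    """One multiplicative update of H with W fixed (no graph term)."""
    col_sums = W.sum(axis=0)[:, None]          # W^T 1, shape r x 1
    return H * (W.T @ (X / (W @ H))) / col_sums

def mur_update_W(X, W, H):
    """One multiplicative update of W with H fixed (no graph term)."""
    row_sums = H.sum(axis=1)[None, :]          # (H 1)^T, shape 1 x r
    return W * ((X / (W @ H)) @ H.T) / row_sums

def rescaled_gradient_step_H(X, W, H, theta=1.0):
    """H - theta * (H / W^T 1) * grad; theta = 1 reproduces mur_update_H."""
    col_sums = W.sum(axis=0)[:, None]
    grad = col_sums - W.T @ (X / (W @ H))      # gradient of D(X || WH) w.r.t. H
    return H - theta * (H / col_sums) * grad

rng = np.random.default_rng(0)
X = rng.random((6, 9)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 9)) + 0.1
history = [kl_divergence(X, W, H)]
for _ in range(20):                            # alternate the two block updates
    H = mur_update_H(X, W, H)
    W = mur_update_W(X, W, H)
    history.append(kl_divergence(X, W, H))
```

Here `history` records the objective after each sweep, and `rescaled_gradient_step_H` with `theta=1.0` returns the same matrix as `mur_update_H`, so any speed-up over MUR has to come from choosing a different step size along the same direction.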
Recently, Guan et al. [9] proposed a fast gradient descent (FGD)
method to accelerate MUR for KL-divergence based GNMF.
FGD searches for the optimal step size along the rescaled negative
gradient direction by using Newton's method. Since FGD sets a
single step size for the whole factor matrix, it risks
degenerating to MUR, i.e., the final step size shrinks to one. To
overcome this deficiency, Guan et al. [10] further proposed a
multiple step-size FGD (MFGD) method which sets a step size for
each row of W and each column of H, and searches for the optimal
step size vector by using the multivariate Newton's method.
MFGD converges more rapidly than FGD, but the
Hessian matrices used in the line search procedures for
updating both factor matrices are too large, i.e., the Hessian
matrices are $m \times m$ and $n \times n$ for
optimizing W and H, respectively. Therefore, MFGD suffers from the
following two drawbacks: 1) both inverting the Hessian matrices
and multiplying the inverses with the corresponding gradients cost too
much computational time, and 2) the dense Hessian matrices
consume too much memory.
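The single step-size search described above can be sketched for the H sub-problem as follows. This is a hedged illustration, not the procedure of [9] or [10]: it omits the graph term, uses one scalar step size rather than the per-row or per-column vector of MFGD, and the clipping to the feasible range, the 0.99 safety factor and the fallback to the MUR step are assumptions added so that the sketch stays feasible. The name `fgd_step_H` is hypothetical.

```python
import numpy as np

def kl_divergence(X, WH):
    """Generalized KL divergence D(X || WH) for strictly positive X and WH."""
    return float(np.sum(X * np.log(X / WH) - X + WH))

def fgd_step_H(X, W, H, newton_iters=10):
    """Move H along D = H_mur - H with a Newton-searched scalar step size.

    Returns (H_new, theta). theta = 1 corresponds exactly to the MUR update.
    """
    WH = W @ H
    H_mur = H * (W.T @ (X / WH)) / W.sum(axis=0)[:, None]
    D = H_mur - H                        # rescaled negative gradient direction
    WD = W @ D
    # Largest theta that keeps H + theta * D nonnegative.
    neg = D < 0
    theta_max = np.min(-H[neg] / D[neg]) if neg.any() else np.inf
    upper = 0.99 * theta_max             # stay strictly inside the feasible range
    theta = 1.0
    for _ in range(newton_iters):
        denom = WH + theta * WD
        d1 = np.sum(WD - X * WD / denom)        # first derivative in theta
        d2 = np.sum(X * WD ** 2 / denom ** 2)   # second derivative, nonnegative
        if d2 <= 1e-12:
            break
        theta = float(np.clip(theta - d1 / d2, 0.0, upper))
    # Safeguard (an assumption of this sketch): never do worse than MUR.
    if not kl_divergence(X, WH + theta * WD) <= kl_divergence(X, WH + WD):
        theta = 1.0
    return H + theta * D, theta

rng = np.random.default_rng(1)
X = rng.random((6, 9)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 9)) + 0.1
H_new, theta = fgd_step_H(X, W, H)
```

Because the objective restricted to the search direction is a convex function of a single scalar, each Newton iteration needs only two sums. With one step size per row or column, the second derivative becomes an m x m or n x n matrix, which is the cost the paragraph above attributes to MFGD.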
To overcome the aforementioned defic (...truncated)