Parallel Nonnegative Matrix Factorization with Manifold Regularization

Hindawi
Journal of Electrical and Computer Engineering
Volume 2018, Article ID 6270816, 10 pages
https://doi.org/10.1155/2018/6270816

Research Article

Fudong Liu, Zheng Shan, and Yihang Chen

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan 450001, China

Correspondence should be addressed to Fudong Liu

Received 12 November 2017; Revised 7 February 2018; Accepted 15 March 2018; Published 2 May 2018

Academic Editor: Tongliang Liu

Copyright © 2018 Fudong Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Nonnegative matrix factorization (NMF) decomposes a high-dimensional nonnegative matrix into the product of two lower-dimensional nonnegative matrices. However, conventional NMF neither scales to large datasets, because it keeps all data in memory, nor preserves the geometrical structure of the data, which some practical tasks require. In this paper, we propose a parallel NMF with manifold regularization method (PNMF-M) that overcomes both deficiencies by parallelizing manifold regularized NMF on distributed computing systems. In particular, PNMF-M distributes both data samples and factor matrices to multiple computing nodes instead of loading the whole dataset on a single node, and it updates both factor matrices locally on each node. In this way, PNMF-M relieves the memory pressure of large-scale datasets and speeds up the computation through parallelization. To construct the adjacency matrix for manifold regularization, we propose a two-step distributed graph construction method, which is proved to be equivalent to the batch construction method.
Experimental results on popular text corpora and image datasets demonstrate that PNMF-M significantly improves both the scalability and the time efficiency of conventional NMF thanks to parallelization on distributed computing systems; meanwhile, it significantly enhances the representation ability of conventional NMF thanks to the incorporated manifold regularization.

1. Introduction

Data representation is a fundamental problem in data analysis. A good representation typically uncovers the latent structure of a dataset by reducing the dimensionality of the data. Several methods, including principal component analysis (PCA), linear discriminant analysis (LDA), and vector quantization (VQ), have addressed this issue. More recently, nonnegative matrix factorization (NMF) [1] incorporated a nonnegativity constraint to obtain a parts-based representation of data, and it has thus been widely applied in areas such as document clustering [2, 3], image recognition [4, 5], audio processing [6], and video processing [7]. However, conventional NMF suffers from two deficiencies: (1) it usually works in batch mode and requires all data to reside in memory, which leads to tremendous storage overhead as the number of data samples increases, and (2) it ignores the geometrical structure embedded in the data, which leads to unsatisfactory representation ability.

To overcome the first deficiency, parallel and distributed NMF algorithms have been proposed for large-scale datasets. Kanjani [8] utilized multithreading to develop a parallel NMF (PNMF) on a multicore machine. Robila and Maciak [9] introduced two thread-level parallel versions of the traditional multiplicative solution and the adaptive projected gradient method. However, their methods are unsuitable for large-scale datasets due to the memory limitation of a single computer. Liu et al.
[10] proposed a distributed NMF (DNMF) to analyze large-scale web dyadic data and verified its effectiveness on distributed computing systems. Dong et al. [11] also attempted to design a PNMF on a distributed memory platform with the message passing interface library. Although all the above parallel NMF algorithms achieve considerable speedup and scalability, they do not consider the geometric structure of the dataset.

To overcome the second deficiency, Cai et al. [12] proposed graph-regularized NMF (GNMF), which extends conventional NMF by constructing an affinity graph [13] to encode the geometrical information of the data and thereby enhances representation ability. Gu et al. [14] further extended GNMF to avoid trivial solution and scale transfer problems by imposing a normalized cut-like constraint on the cluster assignment matrix. Lu et al. [15] incorporated manifold regularization into NMF for hyperspectral unmixing and obtained desirable unmixing performance. These improved algorithms achieve better representation ability but are inefficient on large-scale datasets. Liu et al. [16] introduced the geometric structure of data into incremental NMF [17, 18] and utilized two efficient sparse approximations, buffering and random projected tree, to process large-scale datasets. Yu et al. [19] also presented an incremental GNMF algorithm to improve scalability. However, these algorithms perform well only on incremental or streaming datasets and cannot deal with large-scale batch datasets. In addition, Guan et al. [20] and Liu et al. [21] introduced the Manhattan distance and a large-cone penalty, respectively, into NMF to improve representation and generalization ability. In conclusion, none of the above works overcomes both deficiencies simultaneously, because of the heavy computation required for the decomposition and the storage required for the adjacency matrix.
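To make the cost of graph regularization concrete, the following is a minimal single-node sketch of GNMF-style multiplicative updates in NumPy. It is an illustration under stated assumptions, not the PNMF-M algorithm: it assumes the factorization X ≈ UV with samples stored as the columns of X, a symmetric 0/1 k-nearest-neighbor adjacency matrix W with degree matrix D, and the objective ||X − UV||² + λ·Tr(V(D − W)Vᵀ). The function names (knn_adjacency, gnmf) and all parameter names are hypothetical.

```python
import numpy as np


def knn_adjacency(X, n_neighbors=3):
    """Symmetric 0/1 k-nearest-neighbor adjacency over the columns of X (assumed graph type)."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between samples (columns of X).
    sq = np.sum(X * X, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # keep an edge if either endpoint selected it


def gnmf(X, k, lam=1.0, n_neighbors=3, n_iter=200, eps=1e-10, seed=0):
    """Batch graph-regularized NMF: X (m x n) ~ U (m x k) @ V (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((k, n))
    W = knn_adjacency(X, n_neighbors)  # dense n x n matrix held in memory
    D = np.diag(W.sum(axis=1))         # degree matrix of the graph
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative; eps avoids division by zero.
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X + lam * (V @ W)) / (U.T @ U @ V + lam * (V @ D) + eps)
    return U, V, W
```

In this sketch both the data matrix and the dense n × n adjacency matrix live on one machine, which is exactly the storage burden described above for large n.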
In this paper, we combine the advantages of parallel NMF and manifold regularized NMF and design a parallel NMF with manifold regularization method (PNMF-M) by parallelizing manifold regularized NMF on distributed computing systems. In particular, PNMF-M distributes both data samples and factor matrices to multiple computing nodes in a balanced way and parallelizes the update of both factor matrices locally on each node. Since graph construction is the computational bottleneck of PNMF-M, we adopt a two-step distributed graph construction method to compute the adjacency matrix and obtain an adjacency graph equivalent to the one constructed in batch mode. Experimental results on popular text corpora and image datasets show that PNMF-M not only outperforms conventional NMF in terms of both scalability and time efficiency through parallelization, but also significantly enhances the representation ability of conventional NMF by incorporating manifold regularization.

$V \in \mathbb{R}^{k \times n}$. Given $p$ processing units, and assuming that both $k$ and $n$ are exactly divisible by $p$, they divide the above three matrices into $p$ equally sized submatrices by column; that is, $X = [X_1, \ldots, X_p]$, $U = [U_1, \ldots, U_p]$, and $V = [V_1, \ldots, V_p]$. They still minimize t (...truncated)