A unified framework for link prediction based on non-negative matrix factorization with coupling multivariate information

PLOS ONE, Nov 2018

Many link prediction methods have been developed to infer unobserved links or predict missing links from the observed network structure, which is always incomplete and subject to interfering noise. The performance of existing methods is usually limited because their computation depends only on the input graph structure and does not consider external information. The effects of social influence and homophily suggest that both network structure and node attribute information should help to resolve the task of link prediction. This work proposes SASNMF, a unified link prediction framework based on non-negative matrix factorization that considers not only the graph structure but also internal and external auxiliary information, that is, the structural latent feature information extracted from the network and the node attributes. Furthermore, three different combinations of internal and external information are proposed and input into the framework to solve the link prediction problem. Extensive experimental results on thirteen real networks (five with node attributes and eight without) show that the proposed framework has competitive performance compared with benchmark methods and state-of-the-art methods, indicating the superiority of the presented algorithm.
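To make the factorization idea concrete, the following is a minimal sketch of plain non-negative matrix factorization applied to an adjacency matrix alone. It is not the SASNMF framework: it ignores node attributes and any auxiliary information, uses the standard multiplicative update rules for the squared-error objective as an assumed solver, and every name in it (`nmf`, `frob_error`, the toy matrix `A`) is hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frob_error(A, W, H):
    """Squared Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))


def nmf(A, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n x m) into W (n x rank) and H (rank x m).

    Assumption: standard multiplicative updates, chosen here only for
    illustration; the paper's own optimisation is not reproduced.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
    return W, H


# Toy undirected network with 5 nodes (hypothetical data).
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
W, H = nmf(A, rank=2)
scores = matmul(W, H)  # scores[i][j] is the predicted affinity of nodes i and j
# Rank the currently unconnected pairs by their reconstructed score.
candidates = [(i, j) for i in range(5) for j in range(i + 1, 5) if A[i][j] == 0]
candidates.sort(key=lambda p: scores[p[0]][p[1]], reverse=True)
print(candidates)
```

The reconstructed matrix assigns a non-negative score to every node pair, so unconnected pairs with high scores are the candidate links. A framework that also uses node attributes would have to couple additional matrices into the objective, which this sketch deliberately leaves out.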

RESEARCH ARTICLE

Wenjun Wang1, Minghu Tang1,2*, Pengfei Jiao1

1 School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
2 School of Computer Science and Technology, Qinghai Nationalities University, Qinghai, China

OPEN ACCESS

Citation: Wang W, Tang M, Jiao P (2018) A unified framework for link prediction based on non-negative matrix factorization with coupling multivariate information. PLoS ONE 13(11): e0208185. https://doi.org/10.1371/journal.pone.0208185

Editor: Ivan Olier, Liverpool John Moores University, UNITED KINGDOM

Received: May 13, 2018
Accepted: November 13, 2018
Published: November 29, 2018

Copyright: © 2018 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by the Major Project of the National Social Science Foundation of China (14ZDB153), the major research plan of the National Natural Science Foundation of China (91746205, 91746107, 91224009, 51438009) and the applied basic research project of Qinghai Province (2018-ZJ-707). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Link prediction is an important research direction in complex networks and, because of its wide range of applications, attracts researchers from many disciplines, including computer science, biology, physics and sociology. It aims to infer the likelihood that a link exists between two currently unconnected nodes from the known structural information of the network [1–3]. Link prediction can be used to explore the evolution mechanisms of networks [4,5], recommend trusted partners in business trade [6], recommend travel hotspots [7,8], mine suspects in counterterrorism networks [9–11], analyse criminal networks [12,13] and so on.

In recent years, with the development of complex network research, many link prediction methods have been proposed for specific networks in different fields and from various perspectives [14–16]. In simple terms, the existing methods can be divided into three categories: unsupervised, supervised and other mixed methods.

i) The first category computes similarity scores between two nodes based on the known topological structure of the network. It has been among the most widely used in recent years, and methods such as Common Neighbours (CN), the Adamic-Adar index (AA) and the Resource Allocation index (RA) have become the baselines for judging new methods [1]. Because this kind of method depends only on the known topology of the network, its prediction results are easily affected by data sparsity (the number of edges known to be present is often significantly smaller than the number of edges known to be absent). In fact, sparsity is still the biggest challenge in current link prediction research.

ii) The supervised approaches, on the other hand, attempt to predict link behaviour directly. They generally need to find the characteristics of node interactions and learn latent features from the topological structure of the network [17–19]. Our work uses this kind of method, fusing multiple attributes to improve prediction performance.

iii) The mixed methods include many approaches, such as those based on probabilistic models, perturbation-based frameworks and matrix completion. Probabilistic models are inherently expensive in computational complexity, so their application is limited [20,21].
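The three neighbourhood-based baselines named in category i) are simple enough to sketch directly. The example below uses their commonly cited definitions (CN counts shared neighbours, AA weights each shared neighbour by the inverse logarithm of its degree, RA by the inverse of its degree); the function name `similarity_scores` and the toy edge list are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def similarity_scores(edges):
    """Score every unconnected node pair of an undirected graph.

    Returns {(u, v): {"CN": ..., "AA": ..., "RA": ...}} with u < v.
    """
    # Build the adjacency map: node -> set of neighbours (self-loops skipped).
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # the pair is already linked, nothing to predict
        common = adj[u] & adj[v]
        # Every common neighbour z has degree >= 2, so log(degree) > 0.
        scores[(u, v)] = {
            "CN": len(common),
            "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
            "RA": sum(1.0 / len(adj[z]) for z in common),
        }
    return scores


# Toy network (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = similarity_scores(edges)
# Candidate links ordered from most to least likely under the RA index.
ranked = sorted(scores, key=lambda pair: scores[pair]["RA"], reverse=True)
print(ranked)
```

Because all three indices are sums over common neighbours, any pair without a shared neighbour scores zero under each of them, which illustrates why such methods degrade on sparse networks.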
In addition, structural perturbation-based and matrix completion methods are the most recently proposed state-of-the-art approaches. Lü LY et al. [22] assumed that the regularity of a network is reflected in the consistency of its structural features before and after the random removal of a small set of links. Based on the perturbation of the adjacency matrix, they proposed a universal structural consistency index that is free of prior knowledge of the network organisation. Furthermore, Xu XY [23] and Wang WJ et al. [24] proposed perturbation frameworks based on matrix decomposition for link prediction. Pech Ratha et al. [25], for their part, proposed a link prediction method based on matrix completion. Although these methods can accomplish prediction tasks, they still suffer to some extent from a shortage of useful information. Moreover, they are challenged by high computational costs, data sparsity and network noise. In addition, as the scale of data increases, the scalability, portability and robustness of a method on large-scale networks become the basis on which it is evaluated. Therefore, how to mine network features, address the above challenges and improve the performance of link prediction is the main concern of this paper. In fact, a complex network is an abstraction of the real world, where the nodes represent entities that have ver (...truncated)


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0208185&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208185

Wenjun Wang, Minghu Tang, Pengfei Jiao. A unified framework for link prediction based on non-negative matrix factorization with coupling multivariate information, PLOS ONE, 2018, Volume 13, Issue 11, DOI: 10.1371/journal.pone.0208185