A unified framework for link prediction based on non-negative matrix factorization with coupling multivariate information
RESEARCH ARTICLE
Wenjun Wang1, Minghu Tang1,2*, Pengfei Jiao1
1 School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
2 School of Computer Science and Technology, Qinghai Nationalities University, Qinghai, China
OPEN ACCESS
Citation: Wang W, Tang M, Jiao P (2018) A unified framework for link prediction based on non-negative matrix factorization with coupling multivariate information. PLoS ONE 13(11): e0208185. https://doi.org/10.1371/journal.pone.0208185
Editor: Ivan Olier, Liverpool John Moores
University, UNITED KINGDOM
Received: May 13, 2018
Accepted: November 13, 2018
Published: November 29, 2018
Copyright: © 2018 Wang et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the Major Project of the National Social Science Foundation of China (14ZDB153), the Major Research Plan of the National Natural Science Foundation of China (91746205, 91746107, 91224009, 51438009) and the Applied Basic Research Project of Qinghai Province (2018-ZJ-707). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
PLOS ONE | https://doi.org/10.1371/journal.pone.0208185 November 29, 2018
Link prediction based on NMF with multivariate attributes
Abstract
Many link prediction methods have been developed to infer unobserved links or predict missing links from the observed network structure, which is always incomplete and subject to interfering noise. The performance of existing methods is therefore usually limited, because their computation depends only on the input graph structure and ignores external information. The effects of social influence and homophily suggest that both network structure and node attribute information should help to resolve the link prediction task. This work proposes SASNMF, a unified link prediction framework based on non-negative matrix factorization that considers not only the graph structure but also internal and external auxiliary information, which refers to both the node attributes and the structural latent feature information extracted from the network. Furthermore, three different combinations of internal and external information are proposed and input into the framework to solve the link prediction problem. Extensive experiments on thirteen real networks (five with node attributes and eight without) show that the proposed framework performs competitively against benchmark and state-of-the-art methods, indicating the superiority of the presented algorithm.
Introduction
As a very important research direction in complex networks, link prediction attracts researchers from many disciplines, including computer science, biology, physics and sociology, because of its wide range of applications. It aims to infer the likelihood that a link exists between two unconnected nodes using the known structural information of the network [1–3]. Link prediction can be used to explore the evolution mechanisms of networks [4,5], recommend trusted partners in business trade [6], recommend travel hotspots [7,8], identify suspects in counterterrorism networks [9–11], analyse criminal networks [12,13] and so on.
In recent years, with the development of complex network research, many methods have been proposed to predict links in specific networks from different fields and from various perspectives [14–16]. In simple terms, the existing methods for link prediction can be divided
into three categories: unsupervised, supervised and mixed methods. i) Unsupervised methods compute similarity scores between two nodes based on the known topological structure of the network. This has been one of the most widely used approaches in recent years, and indices such as Common Neighbours (CN), the Adamic-Adar index (AA) and the Resource Allocation index (RA) have become baselines for evaluating new methods [1]. Because such methods depend only on the known topology of the network, their predictions are easily affected by data sparsity (the number of edges known to be present is often far smaller than the number of edges known to be absent). In fact, sparsity remains the biggest challenge in current link prediction research. ii) Supervised approaches, on the other hand, attempt to predict link behaviour directly. They generally need to identify the characteristics of node interactions and learn latent features from the topological structure of the network [17–19]. Our work follows this approach and fuses multiple kinds of attribute information to improve prediction performance. iii) Mixed methods include, for example, methods based on probabilistic models, perturbation-based frameworks and matrix completion. Probabilistic models have inherently high computational complexity, which limits their application [20,21]. In addition, structural perturbation-based and matrix completion methods are the most recently proposed state-of-the-art approaches. Lü LY et al. [22] assumed that the regularity of a network is reflected in the consistency of its structural features before and after a random removal of a small set of links. Based on perturbation of the adjacency matrix, they proposed a universal structural consistency index that requires no prior knowledge of the network organisation. Furthermore, Xu XY [23] and Wang WJ et al. [24] proposed perturbation frameworks based on matrix decomposition for link prediction. Pech Ratha et al. [25] proposed a link prediction method based on matrix completion.
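The three similarity indices named above can be sketched in a few lines. The following is a minimal Python illustration, assuming an undirected, unweighted graph given as an edge list; the helper names (`neighbours`, `cn`, `aa`, `ra`, `score_non_edges`) and the toy graph are hypothetical and not taken from this paper. CN counts the shared neighbours of x and y, AA weights each shared neighbour z by 1/log k_z, and RA weights it by 1/k_z, where k_z is the degree of z.

```python
import math
from itertools import combinations


def neighbours(edges):
    """Build an adjacency-set map for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn(adj, x, y):
    """Common Neighbours: number of shared neighbours of x and y."""
    return len(adj[x] & adj[y])


def aa(adj, x, y):
    """Adamic-Adar: sum of 1 / log(k_z) over shared neighbours z.

    A shared neighbour of two distinct nodes has degree >= 2, so log(k_z) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def ra(adj, x, y):
    """Resource Allocation: sum of 1 / k_z over shared neighbours z."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def score_non_edges(edges, index):
    """Score every unobserved node pair with `index`, highest score first."""
    adj = neighbours(edges)
    scores = [
        ((x, y), index(adj, x, y))
        for x, y in combinations(sorted(adj), 2)
        if y not in adj[x]
    ]
    # sorted() is stable, so tied pairs keep their (x, y) enumeration order.
    return sorted(scores, key=lambda item: -item[1])


# Hypothetical toy graph: triangle 1-2-3, node 4 attached to 2 and 3, node 5 to 4.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]

if __name__ == "__main__":
    for name, index in (("CN", cn), ("AA", aa), ("RA", ra)):
        print(name, score_non_edges(edges, index))
```

On this toy graph the pair (1, 4), which shares neighbours 2 and 3, receives the top score under all three indices (CN = 2, RA = 1/3 + 1/3 = 2/3). A pair with no shared neighbours, such as (1, 5), scores 0 under every index, which illustrates why these purely local indices suffer when the network is sparse.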
Although these methods can accomplish the prediction task, they still suffer, to some extent, from a lack of useful information. Moreover, they are often challenged by high computational costs, data sparsity and network noise. In addition, as the scale of data grows, whether a method is scalable, transferable and robust in large-scale networks becomes a key criterion for evaluating it. Therefore, mining network features, addressing the above challenges and improving link prediction performance are the main concerns of this paper.
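Since this paper builds on non-negative matrix factorization, it may help to see the basic idea of scoring links from latent structural features. The sketch below is a plain NMF baseline on the observed adjacency matrix only, using the standard Lee–Seung multiplicative updates for the squared Frobenius loss ||A - WH||²; it is an illustrative assumption, not the SASNMF framework itself, which additionally couples node attributes and extracted latent features. All function names, the rank, the iteration count and the toy graph are hypothetical. It is written in pure Python to stay self-contained; a practical implementation would use a numerical library.

```python
import random


def matmul(X, Y):
    """Multiply an (n x k) list-of-lists matrix by a (k x m) one."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0..n-1."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    return A


def nmf(A, rank, iters=500, eps=1e-9, seed=0):
    """Factorize A ~= W H with W, H >= 0 via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H


def frobenius_error(A, W, H):
    """Squared Frobenius norm ||A - W H||^2."""
    WH = matmul(W, H)
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, WH) for a, b in zip(row_a, row_b))


def nmf_link_scores(A, rank, **kwargs):
    """Rank unobserved pairs (i < j) by the symmetrized reconstruction W H."""
    W, H = nmf(A, rank, **kwargs)
    S = matmul(W, H)
    n = len(A)
    pairs = [((i, j), (S[i][j] + S[j][i]) / 2)
             for i in range(n) for j in range(i + 1, n) if A[i][j] == 0]
    return sorted(pairs, key=lambda item: -item[1])


# Hypothetical toy graph: two 4-node groups joined by the bridge (3, 4);
# the intra-group edge (0, 3) is deliberately left out.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
A = adjacency(8, edges)

if __name__ == "__main__":
    for pair, score in nmf_link_scores(A, rank=2)[:3]:
        print(pair, round(score, 3))
```

With rank 2, the two latent factors should roughly correspond to the two groups, so the missing intra-group pair (0, 3) is expected to score well above pairs such as (0, 5) that span the groups; exact values depend on the random initialization. The multiplicative form keeps W and H non-negative, because every factor in each update is non-negative.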
In fact, a complex network is an abstraction of the real world, where the nodes represent entities that have ver (...truncated)