Measuring the importance of vertices in the weighted human disease network
RESEARCH ARTICLE
Seyed Mehrzad Almasi, Ting Hu*
Department of Computer Science, Memorial University, St. John’s, NL, Canada
OPEN ACCESS
Citation: Almasi SM, Hu T (2019) Measuring the
importance of vertices in the weighted human
disease network. PLoS ONE 14(3): e0205936.
https://doi.org/10.1371/journal.pone.0205936
Editor: Kwang-Il Goh, Korea University, KOREA,
REPUBLIC OF
Received: September 30, 2018
Accepted: February 26, 2019
Published: March 22, 2019
Copyright: © 2019 Almasi, Hu. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The source code for
computing the vertex centrality measure DIL-W is
provided at: https://github.com/MIBlab-MUN/vertex-centrality-DILW.
Funding: TH acknowledges the Discovery Grant
RGPIN-2016-04699 from the Natural Sciences
and Engineering Research Council of Canada
(NSERC) (http://www.nserc-crsng.gc.ca/ResearchPortal-PortailDeRecherche/Instructions-Instructions/DG-SD_eng.asp).
The funders had no
role in study design, data collection and analysis,
decision to publish, or preparation of the
manuscript.
Abstract
Many human genetic disorders and diseases are known to be related to each other through
frequently observed co-occurrences. Studying the correlations among multiple diseases
provides an important avenue to better understand the common genetic background of diseases and to help develop new drugs that can treat multiple diseases. Meanwhile, network
science has seen increasing application in modeling complex biological systems, and can
be a powerful tool to elucidate the correlations of multiple human diseases. In this article,
known disease-gene associations were represented using a weighted bipartite network. We
extracted a weighted human disease network from this bipartite network to show the
correlations of diseases. Subsequently, we proposed a new centrality measurement for the
weighted human disease network (WHDN) in order to quantify the importance of diseases.
Using our centrality measurement to quantify the importance of vertices in WHDN, we were
able to find a set of the most central diseases. By investigating the top 30 diseases and their
most correlated neighbors in the network, we identified disease linkages including known
disease pairs and novel findings. Our research helps better understand the common
genetic origin of human diseases and suggests top diseases that likely induce other related
diseases.
Introduction
During the past decades, significant progress has been made in our understanding of human
diseases [1]. However, the genetic architectures of complex diseases are still largely unclear.
Many common diseases tend to be related to each other, and it is speculated that they may
share a common genetic origin. Thus, studying the correlations of human diseases has the
potential to improve our understanding of the genotype-to-phenotype mapping [2, 3] and the prediction of disease-associated genes [4, 5, 6, 7, 8]. Moreover, learning which diseases are correlated can help in using existing drugs to treat multiple similar diseases [9, 10, 11, 12, 13].
Meanwhile, network science is a rising field in which entities and their complex relationships
are studied on a global scale [14, 15, 16], and it has increasingly been applied to
advanced analyses of biomedical data [17, 18, 19, 20, 21, 22, 23, 24]. Various cellular
components in the human body interact with each other within the same cell or across different cells [15]. A network called the human interactome can be constructed according to the
interactions of those cellular components. Each component can be represented as a
vertex in the network, and interactions among them can be captured as links (or edges) connecting pairs of components. When the components are proteins or metabolites, the network is referred to as a protein-protein interaction (PPI) network [25, 26, 27] or a
metabolic network [28, 29, 30], respectively.
Competing interests: The authors have declared
that no competing interests exist.
Some studies have aimed to identify the correlations among diseases through network analysis [15, 31, 32]. Goh et al. [33] constructed a human disease network (HDN) by connecting
pairs of diseases that share associated genes. Of the 1,284 diseases in the HDN,
867 have at least one link to other diseases, and 516 form a giant component, suggesting that
the genetic origins of most diseases, to some extent, are shared with other diseases. Moreover,
the HDN naturally and visibly clustered according to major disease classes, such as a cancer
cluster and a neurological disease cluster. Zhou et al. [34] extracted over twenty million bibliographic records from PubMed [35] in order to obtain 147,978 connections between 322 symptoms and 4,219 diseases. A human symptoms-disease network (HSDN) was then constructed
and was able to show the symptom similarity between all pairs of diseases (7,488,851 links) in
the network. The weight of links represented the similarity of symptoms between two diseases.
They showed that the correlations among diseases were significantly related to the genetic
associations that each pair of diseases had in common as well as the interactions between their
related proteins. Lee et al. [36] built a disease metabolism network in order to study disease
comorbidity for better disease prediction and prevention. Two diseases are connected if
enzymes associated with them catalyze adjacent metabolic reactions. Their results show that
diseases with higher degrees, i.e., connecting with many other diseases, have a higher rate of
prevalence and mortality.
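As a minimal sketch of the gene-sharing construction described above, the following Python snippet links two diseases whenever their gene sets overlap and then counts each disease's neighbors (its degree). The disease and gene names are hypothetical toy data, and using the number of shared genes as the link weight is an illustrative assumption, not necessarily the weighting used in any of the cited studies.

```python
from itertools import combinations

# Hypothetical toy disease-gene associations (not data from any cited study).
disease_genes = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G2", "G3"},
    "disease_C": {"G3", "G4"},
    "disease_D": {"G5"},
}


def project_to_disease_network(associations):
    """Link two diseases when they share at least one associated gene.

    The link weight is the number of shared genes, which is an
    illustrative assumption rather than a published weighting scheme.
    """
    edges = {}
    for d1, d2 in combinations(sorted(associations), 2):
        shared = associations[d1] & associations[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges


def degree(edges, vertices):
    """Count the neighbors of every vertex, including isolated ones."""
    deg = {v: 0 for v in vertices}
    for d1, d2 in edges:
        deg[d1] += 1
        deg[d2] += 1
    return deg


edges = project_to_disease_network(disease_genes)
degrees = degree(edges, disease_genes)

for (d1, d2), weight in edges.items():
    print(f"{d1} - {d2}: {weight} shared gene(s)")
print(degrees)
```

In this toy example, disease_D shares no gene with any other disease and therefore remains isolated, analogous to the diseases in the HDN that have no link to other diseases.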
Measuring the centrality of vertices helps identify important vertices in the network in
terms of their connections to all other vertices. Centrality measures have been used frequently to analyze biological networks over the past decades [37, 38, 39]. The most common centrality measures include degree (the total number of neighbors), closeness (the inverse of the total distance to all other
vertices), and betweenness (the fraction of shortest paths between all pairs of vertices that pass through the vertex)
[40]. Despite wide applications in biological networks, these centrality measures are rather
general and may not be able to capture all the properties of vertices in the context of biological
networks. Furthermore, closeness and betweenness have high computational complexity due
to the fact that pair-wise shortest paths in a network need to be enumerated in order to compute the centralities. Therefore, carefully tailored and more efficient cen (...truncated)