Measuring the importance of vertices in the weighted human disease network

PLOS ONE, Mar 2019

Many human genetic disorders and diseases are known to be related to each other through frequently observed co-occurrences. Studying the correlations among multiple diseases provides an important avenue to better understand the common genetic background of diseases and to help develop new drugs that can treat multiple diseases. Meanwhile, network science has seen increasing applications in modeling complex biological systems, and can be a powerful tool to elucidate the correlations of multiple human diseases. In this article, known disease-gene associations were represented using a weighted bipartite network. We extracted a weighted human disease network (WHDN) from this bipartite network to show the correlations of diseases. Subsequently, we proposed a new centrality measurement for the WHDN in order to quantify the importance of diseases. Using this measurement to rank the vertices of the WHDN, we were able to find a set of most central diseases. By investigating the 30 top diseases and their most correlated neighbors in the network, we identified disease linkages including known disease pairs and novel findings. Our research helps to better understand the common genetic origin of human diseases and suggests top diseases that likely induce other related diseases.

Seyed Mehrzad Almasi, Ting Hu
Department of Computer Science, Memorial University, St. John’s, NL, Canada

Citation: Almasi SM, Hu T (2019) Measuring the importance of vertices in the weighted human disease network. PLoS ONE 14(3): e0205936. https://doi.org/10.1371/journal.pone.0205936
Editor: Kwang-Il Goh, Korea University, KOREA, REPUBLIC OF
Received: September 30, 2018
Accepted: February 26, 2019
Published: March 22, 2019
Copyright: © 2019 Almasi, Hu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The source code for computing the vertex centrality measure DIL-W is provided at: https://github.com/MIBlab-MUN/vertex-centrality-DILW.
Funding: TH acknowledges the Discovery Grant RGPIN-2016-04699 from the Natural Sciences and Engineering Research Council of Canada (NSERC) (http://www.nserc-crsng.gc.ca/ResearchPortal-PortailDeRecherche/InstructionsInstructions/DG-SD_eng.asp). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Introduction

During the past decades, significant progress has been made in our understanding of human diseases [1]. However, the genetic architectures of complex diseases are still largely unclear. Many common diseases tend to be related to each other, and it is speculated that they may share a common genetic origin. Thus, studying the correlations of human diseases has the potential to improve our understanding of the genotype-to-phenotype mapping [2, 3] and the prediction of disease-associated genes [4, 5, 6, 7, 8]. Moreover, learning which diseases are correlated can help in using existing drugs to treat multiple similar diseases [9, 10, 11, 12, 13].

Meanwhile, network science is a rising field in which entities and their complex relationships are studied on a global scale [14, 15, 16], and it has seen increasing use in advanced analyses of biomedical data [17, 18, 19, 20, 21, 22, 23, 24]. Various cellular components in the human body interact with each other within the same cell or across different cells [15]. A network called the human interactome can be constructed from the interactions of those cellular components. Each component is represented as a vertex in the network, and interactions among components are captured as links (or edges) connecting pairs of components. When the components are proteins or metabolites, the network is referred to as a protein-protein interaction (PPI) network [25, 26, 27] or a metabolic network [28, 29, 30], respectively.

Some studies aimed at identifying the correlations among diseases through network analysis [15, 31, 32]. Goh et al. [33] constructed a human disease network (HDN) by connecting pairs of diseases when they share common association genes. Of 1,284 diseases in the HDN, 867 have at least one link to other diseases, and 516 form a giant component, suggesting that the genetic origins of most diseases are, to some extent, shared with other diseases. Moreover, the HDN naturally and visibly clustered according to major disease classes, such as a cancer cluster and a neurological disease cluster. Zhou et al. [34] extracted over twenty million bibliographic records from PubMed [35] in order to obtain 147,978 connections between 322 symptoms and 4,219 diseases. A human symptoms-disease network (HSDN) was then constructed to show the symptom similarity between all pairs of diseases (7,488,851 links) in the network. The weight of a link represented the similarity of symptoms between two diseases. They showed that the correlations among diseases were significantly related to the genetic associations that each pair of diseases had in common, as well as to the interactions between their related proteins. Lee et al. [36] built a disease metabolism network in order to study disease comorbidity for better disease prediction and prevention. Two diseases are connected if enzymes associated with them catalyze adjacent metabolic reactions. Their results show that diseases with higher degrees, i.e., those connected to many other diseases, have higher rates of prevalence and mortality.

Measuring the centrality of vertices helps identify the vertices that are important in terms of how they connect to all other vertices in the network. Centrality measures have been used frequently to analyze biological networks over the past decades [37, 38, 39]. The most common centrality measures include degree (the total number of neighbors), closeness (derived from the total distance to all other vertices), and betweenness (the fraction of shortest paths between all pairs of vertices that pass through the vertex) [40]. Despite wide applications in biological networks, these centrality measures are rather general and may not capture all the properties of vertices in the context of biological networks. Furthermore, closeness and betweenness have high computational complexity because pair-wise shortest paths in the network need to be enumerated in order to compute them. Therefore, carefully tailored and more efficient cen (...truncated)
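The three general centrality measures named above can be illustrated on a small toy network. The following Python code is a minimal sketch, not the DIL-W measure that the article proposes: it links diseases that share at least one gene (in the spirit of the HDN of Goh et al. [33]) and then computes degree, closeness, and betweenness on the resulting unweighted network. The disease and gene labels, the function names, and the closeness convention (reachable vertices divided by total distance) are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations

# Hypothetical disease-gene associations, invented for illustration only.
disease_genes = {
    "D1": {"g1", "g2"},
    "D2": {"g2", "g3"},
    "D3": {"g3"},
    "D4": {"g3", "g4"},
    "D5": {"g5"},  # shares no gene with the others, so it stays isolated
}

def build_disease_network(assoc):
    """Link two diseases when they share at least one associated gene."""
    adj = {d: set() for d in assoc}
    for a, b in combinations(assoc, 2):
        if assoc[a] & assoc[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree(adj):
    """Number of neighbors of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness(adj):
    """Assumed convention: reachable vertices / total distance (0 if isolated)."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores

def betweenness(adj):
    """Unnormalized betweenness via Brandes' accumulation, one BFS per vertex."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # vertices in non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in bc.items()}

network = build_disease_network(disease_genes)
deg = degree(network)
clo = closeness(network)
btw = betweenness(network)

for d in sorted(network, key=deg.get, reverse=True):
    print(d, deg[d], round(clo[d], 3), btw[d])
```

In this toy network the vertex D2 ranks highest on all three measures because every shortest path from D1 to D3 or D4 passes through it. Closeness and betweenness each require one breadth-first search per vertex, which gives a sense of why these measures become costly on large networks.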



Seyed Mehrzad Almasi, Ting Hu. Measuring the importance of vertices in the weighted human disease network, PLOS ONE, 2019, Volume 14, Issue 3, DOI: 10.1371/journal.pone.0205936