Inferring pleiotropy by network analysis: linked diseases in the human PPI network
Nguyen et al. BMC Systems Biology 2011, 5:179
http://www.biomedcentral.com/1752-0509/5/179
RESEARCH ARTICLE
Open Access
Inferring pleiotropy by network analysis: linked
diseases in the human PPI network
Thanh-Phuong Nguyen1, Wei-chung Liu2 and Ferenc Jordán1*
Abstract
Background: Earlier, we identified proteins connecting different disease proteins in the human protein-protein
interaction network and quantified their mediator role. An analysis of the networks of these mediators shows that
proteins connecting heart disease and diabetes largely overlap with the ones connecting heart disease and
obesity.
Results: We quantified their overlap, and based on the identified topological patterns, we inferred the structural
disease-relatedness of several proteins. Literature data provide functional insight into these proteins and support our
findings well. For example, the inferred structurally important role of the PDZ domain-containing protein GIPC1 in
diabetes is supported by the literature, although this information is missing from the Online Mendelian Inheritance in Man database.
Several key mediator proteins identified here clearly have pleiotropic effects, supported by ample evidence for their
general, but always only secondary, importance.
Conclusions: We suggest that studying central nodes in mediator networks may contribute to better
understanding and quantifying pleiotropy. Network analysis provides potentially useful tools here, and may also help
to improve databases.
Background
The systems perspective on complex biological systems
emphasizes that individual genes act in genetic networks
and individual proteins play their roles in protein-protein interaction (PPI) networks [1]. There is increasing
interest in these networks, as their analysis helps to
understand the relationship between the components (i.e. genes, proteins)
and how these are positioned in the
whole system. Well-connected hubs seem to be of high
functional importance [2,3]. Consequently, PPI network-based
studies of diseases have typically started
by analysing the centrality of disease proteins. Genes
associated with a particular phenotype or function are
not randomly positioned in the PPI network, but tend
to exhibit high connectivity; they may cluster together
and can occur in central network locations [4,5].
* Correspondence:
1 The Microsoft Research - University of Trento, Centre for Computational and
Systems Biology, Povo/Trento, Italy
Full list of author information is available at the end of the article
Beyond focusing on the number of neighbours of
graph nodes (their degree), wider neighbourhoods, indirect effects and larger subsets of nodes can also be
analyzed by the rich arsenal of network analytical tools.
This non-local information may help, for example, to
quantify the structural relationships between different
sets of proteins. In an earlier paper [6], we identified proteins that mediate indirect effects between
sets of proteins causing five diseases in the human PPI
network. Their mediator role was quantified and they
were ranked according to structural importance. Their
functional role may be of high interest, as proteins
involved in certain pairs of diseases have no direct interactions among them [6]. These findings raised an
appealing question: "Which proteins connect diseases in
the human PPI network?"
Being connected to diverse regions of the PPI network
may lend a functionally pleiotropic character to a protein in a classical, genetic sense: it has been demonstrated that high connectivity correlates well with
pleiotropic effects [7,8]. The most central mediators are
especially important in connecting apparently distant
nodes in the human PPI network. Specific network positions may confer unusual but characteristic behaviour
(expression pattern) on different proteins [9,10]. Instead
of being exceptional, these epistatic effects may be of
primary importance in physiology [11] and in better
understanding animal development and adaptation.
© 2011 Nguyen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
In this paper, (1) we compare two interaction networks of mediators (mediating indirect effects between
heart disease and obesity, and between heart disease and
diabetes), (2) we analyse the structure of these two networks and their aggregated total network, (3) we study
the overlap between the two mediator networks, and (4)
we infer biological functions for some proteins and provide supporting literature data. All in all, we illustrate
that network analysis is an excellent tool for identifying
pleiotropy and epistasis from complex networks
extracted from multiple databases.
Results
Network analysis
We obtained 9 proteins involved in heart diseases (H),
as well as 44 and 20 involved in diabetes (D) and obesity
(O), respectively. The HD network contains N = 2142
nodes and L = 3537 links, the HO network contains N = 1746 nodes and L = 2567 links, and the total
network contains N = 2221 nodes and L = 3686 links.
Figure 1 provides a schematic illustration of how the
networks were constructed (see Methods). Figure 2
shows the relationships between mediator proteins in
the HD (Figure 2a) and the HO (Figure 2b) networks.
The HD network (Figure 3a) contains 25 HD mediators
and their 2117 neighbours and the HO network (Figure
3b) contains 12 HO mediators and their 1734 neighbours. In the "total" network (Figure 4), 9 mediators are shared between the two networks, so it contains only 28 mediator proteins (25 + 12 - 9). In
this total network, 1667 nodes are present in both the
HD and the HO network, 475 only in the HD and 79
only in the HO network.
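The node counts reported above can be cross-checked with simple set arithmetic. The following is a minimal sketch that uses only the numbers given in the text; the variable names are ours and purely illustrative.

```python
# Counts as reported in the text (variable names are illustrative).
hd_mediators, hd_neighbours = 25, 2117
ho_mediators, ho_neighbours = 12, 1734
shared_mediators = 9
both, hd_only, ho_only = 1667, 475, 79

# Each network consists of its mediators plus their neighbours.
hd_nodes = hd_mediators + hd_neighbours
ho_nodes = ho_mediators + ho_neighbours

# Inclusion-exclusion: shared mediators are counted only once in the total network.
total_mediators = hd_mediators + ho_mediators - shared_mediators

# The total network is the union of nodes found in both, only in HD, or only in HO.
total_nodes = both + hd_only + ho_only

print(hd_nodes, ho_nodes, total_mediators, total_nodes)  # prints: 2142 1746 28 2221
```

The same partition also reproduces the individual network sizes: 1667 + 475 = 2142 nodes for HD and 1667 + 79 = 1746 nodes for HO.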
The distributions of individual structural indices are
very similar for all of the three analyzed networks. Additional file 1 shows all values of the six network indices
for all nodes in the three networks. Figure 5 shows
these distributions only for the total network. We can
observe that almost all indices follow a strongly skewed distribution, with most nodes taking low values and only a few nodes being extremely important. While degree (D), topological importance (TI) and betweenness centrality (BC) have
only one or a few hubs, topological overlap (TO) indicates several key nodes. Closeness centrality (CC) has a
unimodal, normal-like distribution.
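To illustrate why degree can be dominated by a single hub while closeness values are spread more evenly, the sketch below computes both indices on a small toy graph. The graph, the node names and the helper functions are hypothetical and are not taken from the study. Closeness is computed here with the common textbook definition (number of reachable nodes divided by the sum of shortest-path distances), which we assume for illustration and which may differ from the exact formula used for the reported values.

```python
from collections import deque

# Hypothetical toy graph: one hub with five neighbours, plus one peripheral node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("hub", "e"), ("e", "f")]

# Build an undirected adjacency map.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree(adj):
    """Number of neighbours of each node."""
    return {u: len(nbrs) for u, nbrs in adj.items()}

def closeness(adj):
    """Reachable nodes divided by the sum of distances to them (0.0 if isolated)."""
    result = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

deg = degree(adj)
clo = closeness(adj)
for node in sorted(adj, key=lambda n: -deg[n]):
    print(node, deg[node], round(clo[node], 3))
```

In this toy graph the hub has degree 5 while four of the other six nodes have degree 1, whereas the closeness values differ far less between nodes.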
For each network, there seems to be a strong positive
rank correlation between all centrality indices, but not
for the overlap indices (TO^3_{0.01} and TO^3_{0.005}). TO
indices correlate positively and weakly with other centrality indices whereas they correlate negatively and
weakly with CC (see Table 1). D best correlates with
TI^3. The TO measure offers information that is different from, and complementary to,
that of the centrality indices. (...truncated)
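A rank correlation between two indices can be computed as sketched below. The text does not state which rank correlation coefficient was used; we assume Spearman's coefficient here (the Pearson correlation of the ranks, with tied values given their average rank). The index values in the example are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical index values for five nodes (not data from the study).
degree_values = [5, 1, 1, 2, 1]
ti_values = [2.5, 0.4, 0.4, 0.9, 0.3]
print(round(spearman(degree_values, ti_values), 3))
```

Because only the ordering of nodes matters, a coefficient of this kind is unaffected by the very different scales of the individual indices.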