Detecting functional modules in the yeast protein–protein interaction network
Jingchun Chen and Bo Yuan
Integrated Biomedical Science Graduate Program, Department of Biomedical Informatics and Department of Pharmacology, The Ohio State University, 333 W. 10th Avenue, Columbus, OH 43210, USA
Motivation: Identification of functional modules in protein interaction networks is a first step in understanding the organization and dynamics of cell functions. To ensure that the identified modules are biologically meaningful, network-partitioning algorithms should take into account not only topological features but also functional relationships, and identified modules should be rigorously validated.
Results: In this study we first integrate proteomics and microarray datasets and represent the yeast protein–protein interaction network as a weighted graph. We then extend a betweenness-based partition algorithm, and use it to identify 266 functional modules in the yeast proteome network. For validation we show that the functional modules are indeed densely connected subgraphs. In addition, genes in the same functional module confer a similar phenotype. Furthermore, known protein complexes are largely contained in the functional modules in their entirety. We also analyze an example of a functional module and show that functional modules can be useful for gene annotation.
Contact:
Supplementary Information: Supplementary data are available at Bioinformatics online.
The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email:
1 INTRODUCTION
As a critical level of the biological hierarchy, functional modules are
cellular entities that perform certain biological functions, which
are relatively independent from each other (Barabasi and Oltvai,
2004; Hartwell et al., 1999). Revealing modular structures in
biological networks will help us in understanding how cells function
(Hartwell et al., 1999; Bork et al., 2004). Many questions remain to
be answered, but the detection of the functional modules is a
preliminary step.
Recently a number of network partition algorithms have been
designed to find community and modular structures in complex
networks. On the basis of the shortest-path algorithm in graph theory,
Girvan and Newman generalized the concept of vertex betweenness
to edges to distinguish between inter-community edges and
intra-community edges. They designed an algorithm that iteratively
removes the edges of the highest betweenness until a given network
breaks into the desired number of clusters (Girvan and Newman,
2002). Building on this work, Parisi and colleagues strengthened
the definition of community and proposed a local topology-based
concept of edge clustering coefficient to replace the global edge
betweenness measurement (Radicchi et al., 2004). In another study,
using shortest-distance as a metric, Rives and Galitski applied a
hierarchical clustering algorithm to reveal the modular organization
of yeast signaling networks (Rives and Galitski, 2003). Spirin and
Mirny combined clique detection, superparamagnetic clustering
(SPC) and Monte Carlo optimization (MC) to search for functional
modules in the yeast protein network (Spirin and Mirny, 2003). Berg
and Lassig used a probabilistic model to expand the motif concept
and proposed a local graph alignment algorithm to detect such
probabilistic motifs in the transcription network of Escherichia
coli (Berg and Lassig, 2004). More recently, Xiong and colleagues
applied an association pattern discovery method to find the
hypercliques (functional modules) in the yeast proteome network (Xiong
et al., 2005). One common theme shared by these studies is that
networks were represented as unweighted graphs. Even though
they do capture essential features of many complex networks,
unweighted graph representations will impose a big limitation on
the study of biological networks. Protein–protein interaction
networks, in particular, have a very high degree of inter-module
crosstalk (Rives and Galitski, 2003), which makes it very difficult to
partition them using algorithms based solely on topology. Some
recent works do take this into consideration and use weighted graph
representations. Shamir and his colleagues applied a biclustering
algorithm to the integrated genomic data to partition the molecular
network of yeast (Tanay et al., 2004). However, their weighting
scheme is applied on the bipartite graph to represent the level of
association between genes and properties, not between pairs of
interacting genes. Another interesting work is from Ouzounis's
group (Pereira-Leal et al., 2004). They first transformed the
yeast protein interaction network into a line graph, and then applied
a graph flow-based clustering algorithm to find functional modules.
In their work, the weight of an edge represents the level of
confidence attributed to that interaction, which may not indicate the
functional correlation between the two proteins. In recent years
high-throughput studies have generated a huge amount of functional
genomic data. In particular, microarray technology has been applied
to study yeast gene expression under a wide range of conditions, and
the results of these studies are centralized for public access (Ball
et al., 2005). It is therefore highly desirable to develop new methods
that take advantage of functional genomics information and
partition protein–protein interaction networks in a biologically more
meaningful way.
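The edge-removal procedure of Girvan and Newman summarized above can be illustrated with a minimal sketch. It covers only the original unweighted case, not the weighted extension developed in this study, and it assumes an undirected graph without self-loops. The function names (`edge_betweenness`, `components`, `girvan_newman`) and the `n_clusters` stopping parameter are chosen here for illustration and do not come from the cited work.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness (Brandes-style) for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (no self-loops assumed).
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting the number of shortest paths (sigma) to each node.
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit among edges.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every undirected path was counted once from each of its two endpoints.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], set()
        while stack:
            v = stack.pop()
            part.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        parts.append(part)
    return parts


def girvan_newman(edges, n_clusters):
    """Remove the highest-betweenness edge until n_clusters components exist."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    parts = components(adj)
    while len(parts) < n_clusters and any(adj.values()):
        scores = edge_betweenness(adj)  # recomputed after every removal
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
    return parts
```

For two triangles joined by a single bridging edge, calling `girvan_newman(edges, 2)` removes the bridge first, because every shortest path between the two triangles runs through it, and returns the two triangles as separate clusters.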
Here we report our study on detecting the functional modules
in the protein–protein interaction network of Saccharomyces
cerevisiae. Our first goal was to develop an algorithm that partitions
a weighted graph into communities. Our next goal was to apply this
new algorithm to find functional modules in the yeast protein–protein
interaction network and to rigorously validate these modules
at both the topological and functional levels. We also wanted to assess
the functional modules in the context of protein complexes and gene
annotation. Our results indicate that (1) our algorithm is a useful tool
in studying the modularity and organization of biological networks;
(2) genes in the same functional module confer similar deletion
phenotypes; (3) known protein complexes are largely contained in
the functional modules in their entirety and (4) module
identification could be very useful for gene annotation.
The protein–protein interaction network of yeast
Recently, several studies addressed the issue of confidence in the protein–protein
interaction datasets of Saccharomyces cerevisiae that were obtained
by high-throughput techniques (Uetz et al., 2000; Ito et al., 2001; Ho et al.,
2002), assigning each interaction a confidence score (von Mering et al.,
2002; Bader et al., 2004; Patil and Nakamura, 2005). We downloaded
these datasets from the publishers' websites. We then selected from each
of the datasets only the high-confidence interactions and took their union.
After removing redundancy, the final dataset contains 10 899
intera (...truncated)