Detecting functional modules in the yeast protein–protein interaction network

https://bioinformatics.oxfordjournals.org/content/22/18/2283.full.pdf

Jingchun Chen and Bo Yuan
Integrated Biomedical Science Graduate Program, Department of Biomedical Informatics and Department of Pharmacology, The Ohio State University, 333 W. 10th Avenue, Columbus, OH 43210, USA
Motivation: Identification of functional modules in protein interaction networks is a first step in understanding the organization and dynamics of cell functions. To ensure that the identified modules are biologically meaningful, network-partitioning algorithms should take into account not only topological features but also functional relationships, and the identified modules should be rigorously validated.
Results: In this study we first integrate proteomics and microarray datasets and represent the yeast protein–protein interaction network as a weighted graph. We then extend a betweenness-based partition algorithm and use it to identify 266 functional modules in the yeast proteome network. For validation, we show that the functional modules are indeed densely connected subgraphs. In addition, genes in the same functional module confer a similar phenotype. Furthermore, known protein complexes are largely contained in the functional modules in their entirety. We also analyze an example of a functional module and show that functional modules can be useful for gene annotation.
Contact:
Supplementary Information: Supplementary data are available at Bioinformatics online.
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email:
1 INTRODUCTION
As a critical level of the biological hierarchy, functional modules are cellular entities that perform specific biological functions and are relatively independent of each other (Barabasi and Oltvai, 2004; Hartwell et al., 1999). Revealing modular structures in biological networks will help us understand how cells function (Hartwell et al., 1999; Bork et al., 2004). Many questions remain open, but detecting functional modules is a necessary first step.
Recently, a number of network-partitioning algorithms have been designed to find community and modular structures in complex networks. Drawing on shortest-path algorithms from graph theory, Girvan and Newman generalized the concept of vertex betweenness to edges in order to distinguish inter-community edges from intra-community edges. The betweenness of an edge is the number of shortest paths between pairs of vertices that run along it; because the few edges linking two communities must carry most of the shortest paths between them, these edges tend to have the highest betweenness. Girvan and Newman designed an algorithm that iteratively removes the edge with the highest betweenness until the network breaks into the desired number of clusters (Girvan and Newman, 2002). Building on this work, Parisi and colleagues strengthened the definition of a community and proposed a local, topology-based edge clustering coefficient to replace the global edge-betweenness measure (Radicchi et al., 2004). In another study, using shortest-path distance as a metric, Rives and Galitski applied a hierarchical clustering algorithm to reveal the modular organization of yeast signaling networks (Rives and Galitski, 2003). Spirin and Mirny combined clique detection, superparamagnetic clustering (SPC) and Monte Carlo optimization (MC) to search for functional modules in the yeast protein network (Spirin and Mirny, 2003). Berg and Lassig used a probabilistic model to extend the motif concept and proposed a local graph alignment algorithm to detect such probabilistic motifs in the transcription network of Escherichia coli (Berg and Lassig, 2004). More recently, Xiong and colleagues applied an association pattern discovery method to find hypercliques (functional modules) in the yeast proteome network (Xiong et al., 2005). A theme common to all of these studies is that the networks are represented as unweighted graphs. Although unweighted representations capture essential features of many complex networks, they impose a serious limitation on the study of biological networks.
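The Girvan–Newman edge-removal loop described above can be sketched briefly. The following is a minimal, unweighted illustration (recompute edge betweenness, delete the highest-scoring edge, repeat until the graph splits into `k` connected components); it is not the weighted extension developed in this paper. Edge betweenness is computed here with Brandes-style dependency accumulation over breadth-first searches, and the function names `edge_betweenness`, `connected_components` and `girvan_newman`, as well as the stopping parameter `k`, are illustrative choices rather than anything defined in the source.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph.

    adj maps each vertex to the set of its neighbours. Uses Brandes-style
    dependency accumulation: one BFS per source vertex, then a backward pass.
    """
    scores = defaultdict(float)
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: push each vertex's dependency onto its incoming edges.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of vertices was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in scores.items()}


def connected_components(adj):
    """Return the connected components of adj as a list of vertex sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def girvan_newman(edges, k):
    """Remove highest-betweenness edges until there are at least k components."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    components = connected_components(adj)
    while len(components) < k:
        scores = edge_betweenness(adj)  # recomputed after every removal
        if not scores:  # no edges left to remove
            break
        u, v = tuple(max(scores, key=scores.get))
        adj[u].discard(v)
        adj[v].discard(u)
        components = connected_components(adj)
    return components
```

For example, on two triangles joined by a single bridge, the bridge lies on all nine shortest paths between the triangles, so it is removed first and `girvan_newman(edges, 2)` returns the two triangles. Recomputing betweenness after every removal is what makes the method costly: each recomputation takes O(nm) time for n vertices and m edges.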
Protein–protein interaction networks, in particular, have a very high degree of inter-module crosstalk (Rives and Galitski, 2003), which makes them very difficult to partition with algorithms based solely on topology. Some recent studies do take this into account and use weighted graph representations. Shamir and colleagues applied a biclustering algorithm to integrated genomic data to partition the molecular network of yeast (Tanay et al., 2004). However, their weighting scheme is applied to a bipartite graph and represents the level of association between genes and properties, not between pairs of interacting genes. Another interesting study comes from Ouzounis's group (Pereira-Leal et al., 2004), who first transformed the yeast protein interaction network into a line graph and then applied a graph flow-based clustering algorithm to find functional modules. In their work, the weight of an edge represents the level of confidence attributed to that interaction, which need not reflect the functional correlation between the two proteins.
In recent years, high-throughput studies have generated a large amount of functional genomic data. In particular, microarray technology has been applied to study yeast gene expression under a wide range of conditions, and the results of these studies are centralized for public access (Ball et al., 2005). It is therefore highly desirable to develop new methods that take advantage of functional genomics information and partition protein–protein interaction networks in a biologically more meaningful way.
Here we report our study on detecting functional modules in the protein–protein interaction network of Saccharomyces cerevisiae. Our first goal was to develop an algorithm that partitions weighted graphs into communities.
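To make the contrast with confidence-based weights concrete, here is a minimal sketch of attaching a functional, expression-based weight to each interaction. It assumes each protein has a microarray expression profile measured over the same series of conditions and uses 1 − r, where r is the Pearson correlation of the two profiles, as an edge length, so strongly co-expressed partners are "close". This weighting, the helper names `pearson` and `weight_edges`, the input format, and the policy of skipping proteins without a profile are all hypothetical illustrations, not the scheme used in this paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def weight_edges(interactions, expression):
    """Map each interaction {u, v} to a hypothetical co-expression distance 1 - r.

    interactions: iterable of (protein, protein) pairs.
    expression: dict mapping protein -> list of expression values, with all
    profiles measured over the same conditions (hypothetical input format).
    """
    weighted = {}
    for u, v in interactions:
        if u not in expression or v not in expression:
            continue  # hypothetical policy: skip proteins with no profile
        r = pearson(expression[u], expression[v])
        # 0 = perfectly co-expressed, 1 = uncorrelated, 2 = anti-correlated
        weighted[frozenset((u, v))] = 1.0 - r
    return weighted
```

For instance, with `expression = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8]}`, `weight_edges([("A", "B")], expression)` assigns the pair a distance of 0.0 because the two profiles are perfectly correlated. Unlike a confidence score, this weight says nothing about whether the interaction is real; it only encodes how similarly the two partners behave across conditions.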
Our next goal was to apply this new algorithm to find functional modules in the yeast protein–protein interaction network and to rigorously validate these modules at both the topological and the functional level. We also wanted to assess the functional modules in the context of protein complexes and gene annotation. Our results indicate that (1) our algorithm is a useful tool for studying the modularity and organization of biological networks; (2) genes in the same functional module confer a similar deletion phenotype; (3) known protein complexes are largely contained in the functional modules in their entirety; and (4) module identification can be very useful for gene annotation.
The protein–protein interaction network of yeast
Recently, several studies have assessed the reliability of the Saccharomyces cerevisiae protein–protein interaction datasets obtained by high-throughput techniques (Uetz et al., 2000; Ito et al., 2001; Ho et al., 2002), assigning each interaction a confidence score (von Mering et al., 2002; Bader et al., 2004; Patil and Nakamura, 2005). We downloaded these datasets from the publishers' websites. From each dataset we then selected only the high-confidence interactions and took the union of these selections. After removing redundancy, the final dataset contains 10 899 intera (...truncated)