A protein–protein interaction guided method for competitive transcription factor binding improves target predictions

Dec 2009

An important milestone in revealing cells' functions is to build a comprehensive understanding of transcriptional regulation processes. These processes are largely regulated by transcription factors (TFs) binding to DNA sites. Several TF binding site (TFBS) prediction methods have been developed, but they usually model the binding of a single TF at a time, although a few methods for predicting the binding of multiple TFs also exist. In this article, we propose a probabilistic model that predicts the binding of several TFs simultaneously. Our method explicitly models the competitive binding between TFs and uses the prior knowledge of existing protein–protein interactions (PPIs), which mimics the situation in the nucleus. Modeling DNA binding for multiple TFs improves the accuracy of binding site prediction remarkably when compared with other programs and with the cases where individual binding prediction results of separate TFs have been combined. Traditional TFBS prediction methods usually predict an overwhelming number of false positives. This lack of specificity is overcome remarkably with our competitive binding prediction method. In addition, previously unpredictable binding sites can be detected with the help of PPIs. Source code is available at http://www.cs.tut.fi/~harrila/.

Kirsti Laurila (1), Olli Yli-Harja (1), Harri Lähdesmäki (0, 1)

(0) Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
(1) Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FI-33101 Tampere

A significant proportion of a cell's functions is determined by the transcription of genes. Thus, it is important to understand transcriptional regulation, which is to a large extent controlled by transcription factors (TFs) binding to DNA. DNA sites that are bound by a TF can be identified by experimental methods, such as electromobility shift assay (EMSA).
Moreover, recent high-throughput methods, including chromatin immunoprecipitation-chip (ChIP-chip) and -sequencing (ChIP-seq), have increased our knowledge of TF binding sites (TFBSs) remarkably. However, these experimental techniques are laborious and limited by the specificity of antibodies, and they allow only one protein to be studied at a time under given conditions. Hence, computational TFBS prediction methods have an important role in revealing genome-wide transcriptional regulation.

Most of the existing TFBS prediction methods consider the binding of a single TF at a time. These methods produce many false-positive predictions, as individual sequence motif models are sensitive but not very specific. Even though searching for all possible binding sites of one TF is important, it gives only a limited view of the whole transcriptional regulation process of a cell. Rather than a single TF regulating the expression of a gene, several TFs participate in the process in a combinatorial manner, in certain conditions and at the same time. Further, other DNA-binding TFs are also present in the nucleus even though they may not regulate the gene of interest directly. If these TFs have accessible binding sites on the promoter of the studied gene, they can bind to DNA and block the binding of the other TFs. For example, in the regulation of collagen type I (1) and in the differentiation of hematopoietic stem cells (2), specific TFs can block the binding of other TFs that participate in the regulation. Therefore, transcriptional regulation by TFs can be thought of as a competition between TFs. Those TFs that have the highest affinities for the sequence will, on average, win the competition for the binding site, but even those TFs that have lower affinities for this site have their chance, as determined by the steady state of the physical binding competition. Competition for binding sites is also affected by explicit interactions between regulatory TFs. For these reasons, studying the binding of all different TFs simultaneously is biologically more realistic than combining the predictions made for individual TFs.

*To whom correspondence should be addressed. Correspondence may also be addressed to Harri Lähdesmäki. Tel: +358 3 3115 11; Fax: +358 33 115 4989.

A few schemes for predicting the TFBSs of multiple TFs at the same time already exist. These methods basically use two different approaches (3). The methods in the first category search for closely located binding sites, because TFs are known to interact with each other in the regulation process, and thus the TFBSs should be near each other to allow interactions. These proximal TFBSs can then be used in further searching and grouping to find regulating factors, as has been done in (4–6). The other methods search for so-called cis-regulatory modules. These modules are clusters of binding sites for TFs that are known to affect expression together and possibly to interact with each other. Methods for searching for cis-regulatory modules are presented, for example, in (7), where hidden Markov models and expectation maximization are used, and in (8), which applies a Gibbs sampler to the model.

In this article, we present a new method for predicting the binding of several TFs simultaneously. Our method performs Bayesian inference for integrated probabilistic sequence specificity models and TFBSs and uses the prior knowledge of existing protein–protein interactions (PPIs) in prediction. Modeling results on a carefully constructed set of binding sites in the mouse genome show remarkable improvement compared with the cases where the individual prediction results of separate TFs have been combined. In particular, the number of false binding sites is decreased significantly and previously unpredictable binding sites can be identified.
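
The competition described in this introduction, in which the highest-affinity TF wins a site on average while lower-affinity TFs still bind part of the time, can be illustrated with a simple equilibrium occupancy calculation. The sketch below is not the MultiTF-PPI model; it assumes a single site, one TF bound at a time, and a statistical weight of concentration times affinity for each TF. The function name, the TF names and all numeric values are hypothetical.

```python
def competitive_occupancy(affinities, concentrations):
    """Steady-state occupancy of ONE site shared by several TFs.

    Assumption (not from the article): each TF contributes a statistical
    weight concentration * affinity, and the empty site has weight 1.
    """
    weights = {tf: concentrations[tf] * affinities[tf] for tf in affinities}
    partition = 1.0 + sum(weights.values())  # sum over all states of the site
    occupancy = {tf: w / partition for tf, w in weights.items()}
    occupancy["unbound"] = 1.0 / partition
    return occupancy


# Hypothetical values: TF_A binds the site four times more strongly than TF_B.
affinities = {"TF_A": 8.0, "TF_B": 2.0}
concentrations = {"TF_A": 1.0, "TF_B": 1.0}

together = competitive_occupancy(affinities, concentrations)
alone_b = competitive_occupancy({"TF_B": 2.0}, {"TF_B": 1.0})

print("both TFs present:", together)
print("TF_B alone:      ", alone_b)
```

With these made-up numbers, TF_B occupies the site about two thirds of the time when modeled alone but less than one fifth of the time when TF_A competes for the same site. This is the kind of overestimate that arises when predictions made separately for individual TFs are simply combined.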
A comparison with a widely used multiple TFBS prediction method, MSCAN (6), also shows that our model performs better.

MATERIALS AND METHODS

MultiTF-PPI: a probabilistic model for competitive TF binding with PPIs

We formulate a PPI-guided probabilistic model for competitive TF binding prediction, MultiTF-PPI. The goal of our method is to develop a biologically realistic model that mimics the situation in the cell. Thus, we take into account the existence of several TFs in the regulatory process and their cooperation in the form of explicit and implicit interactions. As knowledge of existing PPIs is not always available, we also provide a version of our multiple TF predictor without PPIs, MultiTF. In our modeling scheme, we explicitly model the simultaneous binding of several TFs to the same DNA sequence, which corresponds to the situation where a large number of TFs compete for binding to the same sites on a promoter. The proposed MultiTF-PPI method uses a similar idea to our previously developed probabilistic TF binding prediction method (9), which was developed for analyzing the binding of a single TF together with additional sequence-level information. Here, we apply Bayesian (...truncated)


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/37/22/e146.full.pdf
Article home page: http://nar.oxfordjournals.org/content/37/22/e146.abstract

Kirsti Laurila, Olli Yli-Harja, Harri Lähdesmäki. A protein–protein interaction guided method for competitive transcription factor binding improves target predictions, 2009, pp. e146-e146, 37/22, DOI: 10.1093/nar/gkp789