GRiP: a computational tool to simulate transcription factor binding in prokaryotes


Nicolae Radu Zabet [0, 1] and Boris Adryan [0, 1]

[0] Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
[1] Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR

Associate Editor: Trey Ideker

Motivation: Transcription factors (TFs) are proteins that regulate gene activity by binding to specific sites on the DNA. Understanding the way these molecules locate their target site is of great importance in understanding gene regulation. We developed a comprehensive computational model of this process and estimated the model parameters in (N.R.Zabet and B.Adryan, submitted for publication).

Results: GRiP (gene regulation in prokaryotes) is a highly versatile implementation of this model and simulates the search process in a computationally efficient way. This program aims to provide researchers in the field with a flexible and highly customizable simulation framework. Its features include representation of the DNA sequence, TFs and the interaction between TFs and the DNA (facilitated diffusion mechanism), or between various TFs (cooperative behaviour). The software records both information on the dynamics associated with the search process (locations of molecules) and steady-state results (affinity landscape, occupancy-bias and collision hotspots).

Availability: http://logic.sysbiol.cam.ac.uk/grip

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION

It is now well established that transcription factors (TFs) find their target sites through facilitated diffusion, a combination of a 1D random walk on the DNA and 3D diffusion in the cytoplasm (Berg et al., 1981; Elf et al., 2007). Once bound to the DNA, TFs perform three main types of movement: (i) sliding, (ii) hopping and (iii) jumping (Mirny et al., 2009). The first two mechanisms, sliding and hopping, assume that the TF performs small movements on the DNA without releasing into the cytoplasm, whereas the third assumes 3D diffusion in the cytoplasm before rebinding.

With few exceptions, most theoretical efforts have been invested in analytical solutions of the facilitated diffusion mechanism. Considering real DNA sequences and dynamic crowding on the DNA (mobile roadblocks), however, rules out analytical solutions. Computational methods and, in particular, stochastic simulations overcome these limitations and provide a more accurate mechanistic representation of the underlying biological process. This type of stochastic simulation can be used to answer questions about how TFs perform the search process. For example, one could investigate whether molecules prefer to hop or to slide, and what each of these two alternative movements contributes to the overall 1D random walk in a crowded environment.

Building on the comprehensive model constructed in (N.R.Zabet and B.Adryan, submitted for publication), we developed GRiP (gene regulation in prokaryotes), a program that allows stochastic simulation of the search process of TFs for their target sites on the DNA. The analyzed systems can be large. For example, Escherichia coli K-12 has a 4.6 Mbp genome and there are 10^4 DNA binding proteins (agents). To produce results within a relatively short time, previous software had to either rely on coarse grain models (Wunderlich and Mirny, 2008) or consider small subsystems (Chu et al., 2009).
GRiP represents a new and efficient implementation of the TF search process, which considers a highly detailed model of 1D diffusion and, at the same time, simulates at least 4 times faster than previous software (Barnes and Chu, 2010; Chu et al., 2009). Consequently, by allowing genome-wide stochastic simulations of a highly detailed model of facilitated diffusion, GRiP can highlight possible biases in results obtained where the level of detail was insufficient (coarse grain models) or the size of the analyzed system was too small.

A few studies, such as Das and Kolomeisky (2010), addressed the problem of facilitated diffusion through simulations focusing on the 3D diffusion rather than the 1D case. Simulating the 3D diffusion is time- and resource-consuming, especially at the genome level. van Zon et al. (2006) showed that a model based on the zero-dimensional Chemical Master Equation can reliably represent the rate at which TFs associate non-specifically with the DNA, as long as the model takes into account that once a molecule unbinds from the DNA, it has a high probability of fast rebinding in close proximity. This suggests that there is no need to simulate the 3D diffusion explicitly. Instead, it can be replaced by a simple arrival rate, provided that the model incorporates the fast rebinding probability in the unbinding rate, a strategy which we also adopt.

We implemented the target finding process as a hybrid model mixing agent-based methods with event-driven stochastic simulation algorithms (Gillespie, 1977). The software is implemented in Java 1.6, which ensures high portability.

In the simulator, each TF molecule is represented as an agent able to perform certain actions, whereas the DNA molecule is modelled as a string of base pairs (A, T, C, G). There is no measure of distance between the molecules, but the TF molecules can be either free in the cytoplasm or bound on the DNA at certain positions.
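As a rough illustration of this event-driven scheme, the following Python sketch applies the direct method of Gillespie (1977) to a set of TF agents that are either free or bound at a position on a DNA string. It is not GRiP code (GRiP is written in Java). The function name `simulate_binding`, the rates `k_on` and `k_off`, and the uniform choice of binding position are all assumptions made for illustration, and the sketch deliberately ignores volume exclusion, sequence-dependent affinity, sliding and hopping.

```python
import random


def simulate_binding(dna, n_tf, k_on, k_off, t_end, seed=0):
    """Direct-method sketch: TF agents bind to and unbind from a DNA string.

    k_on  : assumed per-molecule arrival (binding) rate of a free TF
    k_off : assumed per-molecule unbinding rate of a bound TF
    Returns the final agent positions and the list of recorded events.
    """
    rng = random.Random(seed)
    positions = [None] * n_tf  # None = free in the cytoplasm, int = bound position
    t = 0.0
    events = []  # tuples of (time, tf index, "bind" or "unbind", position)

    while True:
        free = [i for i, p in enumerate(positions) if p is None]
        bound = [i for i, p in enumerate(positions) if p is not None]

        # Propensities of the two reaction classes and their sum.
        a_bind = k_on * len(free)
        a_unbind = k_off * len(bound)
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # nothing can happen any more

        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break

        # Pick the reaction class with probability proportional to its propensity.
        if not bound or rng.random() * a_total < a_bind:
            tf = rng.choice(free)
            pos = rng.randrange(len(dna))  # assumption: uniform landing position
            positions[tf] = pos
            events.append((t, tf, "bind", pos))
        else:
            tf = rng.choice(bound)
            events.append((t, tf, "unbind", positions[tf]))
            positions[tf] = None

    return positions, events
```

Calling, for example, `simulate_binding("ATCGATCG", 5, 1.0, 0.5, 10.0)` returns the final state of five agents together with a time-ordered event log. In this simplified picture, the fast rebinding probability mentioned above would be folded into the effective value chosen for `k_off` rather than simulated explicitly.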
The free TF molecules have only one action available, namely to bind to the DNA. The cytoplasm is assumed to be a perfectly mixed reservoir from which the free TF molecules find the DNA at exponentially distributed times. To simulate the 3D diffusion we use the Direct Method implementation of the Gillespie algorithm (Gillespie, 1977), which generates a statistically correct trajectory of the Master Equation. The model considers volume exclusion, allowing only one TF to cover a given base pair at any specific time point. A bound molecule occupies a number of consecutive base pairs on the DNA. The size of each TF molecule on the DNA is computed as the number of base pairs of the DNA binding motif plus the number of obstructed base pairs on the left side of the molecule and the number of obstructed base pairs on the right side. A feature which was not considered by previous models (Barnes and Chu, 2010; Chu et al., 2009) is TF orientation on the DNA. If TFs are not symmetric, the user can set TF molecules to have two orientations on the DNA, which can lead to different affinities depending on the molecule orientation. Whenever a TF binds to the DNA, the sys (...truncated)