GRiP: a computational tool to simulate transcription factor binding in prokaryotes
Nicolae Radu Zabet (0, 1) and Boris Adryan (0, 1)
Associate Editor: Trey Ideker
0 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
1 Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR
Motivation: Transcription factors (TFs) are proteins that regulate gene activity by binding to specific sites on the DNA. Understanding the way these molecules locate their target site is of great importance in understanding gene regulation. We developed a comprehensive computational model of this process and estimated the model parameters in (N.R.Zabet and B.Adryan, submitted for publication).
Results: GRiP (gene regulation in prokaryotes) is a highly versatile implementation of this model and simulates the search process in a computationally efficient way. This program aims to provide researchers in the field with a flexible and highly customizable simulation framework. Its features include representation of DNA sequence, TFs and the interaction between TFs and the DNA (facilitated diffusion mechanism), or between various TFs (cooperative behaviour). The software records both information on the dynamics associated with the search process (locations of molecules) and steady-state results (affinity landscape, occupancy-bias and collision hotspots).
Availability: http://logic.sysbiol.cam.ac.uk/grip
Contact:
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
It is now well established that transcription factors (TFs) find their
target sites through facilitated diffusion, a combination of a 1D
random walk on the DNA and 3D diffusion in the cytoplasm (Berg
et al., 1981; Elf et al., 2007). Once bound to the DNA, TFs perform
three main types of movement: (i) sliding, (ii) hopping and (iii)
jumping (Mirny et al., 2009). The first two mechanisms, sliding and
hopping, assume that the TF performs small movements on the DNA
without releasing into the cytoplasm, whereas the third assumes a
3D diffusion in the cytoplasm before rebinding.
With few exceptions, most theoretical efforts have
been invested in analytical solutions of the facilitated diffusion
mechanism. Considering real DNA sequences and
dynamic crowding on the DNA (mobile roadblocks), however,
rules out analytical solutions. Computational methods and, in
particular, stochastic simulations overcome these limitations and
provide a more accurate mechanistic representation of the underlying
biological process. This type of stochastic simulation
can be used to answer questions about how TFs perform the search
process. For example, one could investigate whether molecules
prefer to hop or to slide, and what each of these two
alternative movements contributes to the overall 1D random walk
in a crowded environment.
Building on the comprehensive model constructed in (N.R.Zabet
and B.Adryan, submitted for publication), we developed GRiP
(gene regulation in prokaryotes), a program that allows stochastic
simulation of the search process of TFs for their target sites on the
DNA.
The analyzed systems can be large. For example, Escherichia coli
K-12 has a 4.6 Mbp genome and 10^4 DNA-binding
proteins (agents). To produce results within a relatively short time,
previous software had either to rely on coarse-grained models
(Wunderlich and Mirny, 2008) or to consider small subsystems (Chu
et al., 2009). GRiP is a new and efficient implementation
of the TF search process that considers a highly detailed model
of 1D diffusion and, at the same time, runs at least four
times faster than previous software (Barnes and Chu, 2010; Chu
et al., 2009). Consequently, by allowing genome-wide stochastic
simulations of a highly detailed model of facilitated diffusion, GRiP
can highlight possible biases in results obtained when the level of
detail was insufficient (coarse-grained models) or the analyzed
system was too small.
A few studies, such as Das and Kolomeisky (2010), addressed the
problem of facilitated diffusion through simulations focusing on the
3D diffusion rather than the 1D case. Simulating 3D diffusion explicitly
is time and resource consuming, especially at the genome level.
van Zon et al. (2006) showed that a model based on the
zero-dimensional Chemical Master Equation can reliably represent the
rate at which TFs associate non-specifically with the DNA, as long as
the model takes into account that once a molecule unbinds from the
DNA, it has a high probability of fast rebinding in close proximity.
This suggests that there is no need to simulate the 3D diffusion
explicitly; instead, it can be replaced by a simple arrival rate,
provided that the model incorporates the fast rebinding probability
in the unbinding rate, a strategy that we also adopt.
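One possible reading of this strategy can be sketched in a few lines of Python. This is an illustration only, not GRiP's code: the names `effective_release_rate` and `resolve_unbinding`, the parameters `p_rebind` and `hop_sd`, and the choice of a Gaussian displacement for the fast rebinding are all assumptions made for the example.

```python
import random


def effective_release_rate(k_unbind, p_rebind):
    """Rate at which a bound TF is actually lost to the cytoplasm.

    k_unbind: microscopic unbinding rate (hypothetical parameter).
    p_rebind: probability of fast rebinding nearby (hypothetical parameter).
    """
    if not 0.0 <= p_rebind <= 1.0:
        raise ValueError("p_rebind must lie in [0, 1]")
    return k_unbind * (1.0 - p_rebind)


def resolve_unbinding(position, p_rebind, hop_sd, dna_length, rng):
    """Decide what happens when a TF detaches from the DNA at `position`.

    Returns ("hop", new_position) for a fast rebinding close by, or
    ("jump", None) when the molecule returns to the well-mixed reservoir.
    The Gaussian displacement is an assumption of this sketch.
    """
    if rng.random() < p_rebind:
        displacement = round(rng.gauss(0.0, hop_sd))
        # Clamp to the ends of a linear DNA segment.
        new_position = min(max(position + displacement, 0), dna_length - 1)
        return "hop", new_position
    return "jump", None
```

With such a scheme, a molecule that "jumps" would simply re-enter the pool of free TFs and bind again at a time drawn from the arrival rate, so no 3D trajectory is ever computed.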
We implemented the target finding process as a hybrid model
mixing agent-based methods with event-driven stochastic simulation
algorithms (Gillespie, 1977). The software is implemented in Java
1.6, which ensures high portability.
In the simulator, each TF molecule is represented as an agent able
to perform certain actions, whereas the DNA molecule is modelled
as a string of base pairs (A, T, C, G). There is no measure of distance
between the molecules, but the TF molecules can be either free in
the cytoplasm or bound on the DNA at certain positions. The free
TF molecules have only one action available, namely to bind to the
DNA.
The cytoplasm is assumed to be a perfectly mixed reservoir from
which the free TF molecules find the DNA at exponentially
distributed times. To simulate the 3D diffusion, we use the Direct
Method implementation of the Gillespie algorithm (Gillespie, 1977),
which generates a statistically correct trajectory of the Master
Equation.
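The Direct Method itself is compact enough to sketch. The Python example below is a generic illustration applied to the binding of free TFs from a well-mixed reservoir; it is not taken from GRiP, and the function names, the per-species counts `free_counts` and the association rates `k_assoc` are hypothetical inputs chosen for the example.

```python
import math
import random


def gillespie_direct_step(propensities, rng):
    """One step of Gillespie's Direct Method.

    Returns (waiting_time, reaction_index), or (math.inf, None) when no
    reaction has a positive propensity.
    """
    total = sum(a for a in propensities if a > 0.0)
    if total <= 0.0:
        return math.inf, None
    # The waiting time to the next event is exponential with rate `total`.
    tau = rng.expovariate(total)
    # Reaction i is chosen with probability propensities[i] / total.
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = None
    for index, a in enumerate(propensities):
        if a <= 0.0:
            continue
        cumulative += a
        chosen = index
        if threshold < cumulative:
            break
    return tau, chosen


def simulate_binding(free_counts, k_assoc, t_end, seed=0):
    """Simulate binding events of free TFs until time t_end.

    free_counts[s]: free molecules of TF species s (hypothetical input).
    k_assoc[s]: per-molecule association rate of species s (hypothetical).
    Returns a list of (time, species) binding events.
    """
    rng = random.Random(seed)
    free = list(free_counts)
    t = 0.0
    events = []
    while True:
        propensities = [k * n for k, n in zip(k_assoc, free)]
        tau, species = gillespie_direct_step(propensities, rng)
        if species is None or t + tau > t_end:
            break
        t += tau
        free[species] -= 1  # one molecule leaves the reservoir
        events.append((t, species))
    return events
```

In this sketch only association events are simulated; a fuller model would add the movements of bound molecules as further reactions in the same propensity list.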
The model considers volume exclusion, allowing only one TF to
cover a given base pair at any specific time point. A bound molecule
will occupy a number of consecutive base pairs on the DNA. The
size on the DNA of each TF molecule is computed as the number
of base pairs of the DNA binding motif added to the number of
obstructed base pairs on the left side of the molecule and the number
of obstructed base pairs on the right side.
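The size computation and the volume-exclusion rule can be illustrated with a small Python sketch. The class `DnaOccupancy`, the function `footprint` and the example numbers are hypothetical and serve only to make the rule concrete; they do not reflect GRiP's internal data structures.

```python
def footprint(motif_length, left_obstructed, right_obstructed):
    """Number of base pairs a bound TF covers on the DNA."""
    return motif_length + left_obstructed + right_obstructed


class DnaOccupancy:
    """Tracks which TF, if any, covers each base pair (hypothetical helper)."""

    def __init__(self, length):
        self.occupant = [None] * length

    def can_bind(self, start, size):
        """True if base pairs start .. start+size-1 exist and are all free."""
        end = start + size
        if start < 0 or end > len(self.occupant):
            return False
        return all(o is None for o in self.occupant[start:end])

    def bind(self, tf_id, start, size):
        """Place the TF if volume exclusion allows it; report success."""
        if not self.can_bind(start, size):
            return False
        for i in range(start, start + size):
            self.occupant[i] = tf_id
        return True

    def unbind(self, tf_id):
        """Free every base pair covered by the given TF."""
        self.occupant = [None if o == tf_id else o for o in self.occupant]
```

For example, a TF with a 16 bp motif and 2 and 3 obstructed base pairs on its left and right sides would cover 21 bp, and a second molecule could bind only where all of the base pairs it needs are still free.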
A feature which was not considered by previous models (Barnes
and Chu, 2010; Chu et al., 2009) is TF orientation on the DNA.
If TFs are not symmetric, the user can set TF molecules to have
two orientations on the DNA, which can lead to different affinities
depending on the molecule orientation. Whenever a TF binds to
the DNA, the sys (...truncated)