Equilibrium model selection: dTTP induced R1 dimerization
BMC Systems Biology
BioMed Central
Methodology article
Open Access
Equilibrium model selection: dTTP induced R1 dimerization
Tomas Radivoyevitch
Address: Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
Email: Tomas Radivoyevitch -
Published: 4 February 2008
BMC Systems Biology 2008, 2:15
doi:10.1186/1752-0509-2-15
Received: 27 July 2007
Accepted: 4 February 2008
This article is available from: http://www.biomedcentral.com/1752-0509/2/15
© 2008 Radivoyevitch; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Biochemical equilibria are usually modeled iteratively: given one or a few fitted
models, if there is a lack of fit or over fitting, a new model with additional or fewer parameters is
then fitted, and the process is repeated. The problem with this approach is that different analysts
can propose and select different models and thus extract different binding parameter estimates
from the same data. An alternative is to first generate a comprehensive standardized list of plausible
models, and to then fit them exhaustively, or semi-exhaustively.
Results: A framework is presented in which equilibriums are modeled as pairs (g, h) where g = 0
maps total reactant concentrations (system inputs) into free reactant concentrations (system
states) which h then maps into expected values of measurements (system outputs). By letting
dissociation constants Kd be either freely estimated, infinity, zero, or equal to other Kd, and by
letting undamaged protein fractions be either freely estimated or 1, many g models are formed. A
standard space of g models for ligand-induced protein dimerization equilibria is given. Coupled to
an h model, the resulting (g, h) were fitted to dTTP induced R1 dimerization data (R1 is the large
subunit of ribonucleotide reductase). Models with the fewest parameters were fitted first.
Thereafter, upon fitting a batch, the next batch of models (with one more parameter) was fitted
only if the current batch yielded a model that was better (based on the Akaike Information
Criterion) than the best model in the previous batch (with one less parameter). Within batches
models were fitted in parallel. This semi-exhaustive approach yielded the same best models as an
exhaustive model space fit, but in approximately one-fifth the time.
Conclusion: Comprehensive model space based biochemical equilibrium model selection
methods are realizable. Their significance to systems biology as mappings of data into mathematical
models warrants their development.
Background
Ribonucleotide reductase (RNR) has a small subunit R2
that exists almost exclusively as a dimer, and a large subunit R1 that dimerizes when dTTP, dGTP, dATP, or ATP
binds to its specificity site, and hexamerizes when dATP or
ATP binds to its activity site [1-6]. Thus, R1 is the backbone of a biochemical equilibrium network that contains
a large number of R1 complexes. This network has more
dissociation constants (Kd) than can be estimated from
currently available data, so assumptions must be made to
reduce the number of independent Kd. These assumptions
come in two forms: those that state that for the data at
hand, a Kd is too large or small to be distinguished from
infinity or zero, respectively, and those that state that the
Page 1 of 13
(page number not for citation purposes)
BMC Systems Biology 2008, 2:15
http://www.biomedcentral.com/1752-0509/2/15
data are too weak to rule out a null hypothesis of the form
Kd = K ′d . Model parameters such as the fraction of R1
capable of forming dimers and hexamers, and the enzymatic activities of these R1 states, also come with plausible null hypotheses. In general, different null hypotheses
define different models that yield different estimates of
the freely estimated parameters. Unfortunately, as modelers traverse a path of reasonable hypotheses until they
arrive at a model that provides both a good fit and Kd confidence interval limits that are not too wide, they often
stop at different places, and thus report different Kd values.
Such Kd estimate extraction differences could be reduced,
if a systematic reproducible approach to biochemical
equilibria model building was established. Progress
toward this goal is described in this paper.
where all of the h specific parameters (e.g. kcat's and protein masses) are contained in the vector L and, if the yn are
vectors of measurements, the en are vectors of zero mean
noise, potentially correlated within steady states, but
uncorrelated between steady states; only scalar yn are considered hereafter. The model parameters K, p and L are not
indexed by n because they are fitted jointly to the entire
dataset, i.e. one set of estimates of these parameters
describes all N steady states simultaneously as one (g, h)
model of one underlying biochemical equilibrium network.
System models
The I equations of a system model g = 0 are
g 1(Fn , Tn , K , p) = pTn1 − Fn1
−
Results
Model
Consider a dataset comprised of N steady state non-covalent binding equilibriums indexed by n in which J different complexes can potentially form from a protein R of
known total concentration Tn1 through interactions with
itself and I - 1 other reactants (e.g. substrate, effectors and
other proteins) of known total concentrations Tni (1 <i ≤
I). Suppose Wij copies of the ith reactant exist in the jth
complex and that a particular R molecule is either undamaged with probability p, and thus capable of forming each
of the plausible complexes, or damaged with probability
1 - p, and thus incapable of forming any complexes.
Define Tn = (Tn1, Tn2, ...TnI), Fn = (Fn1, Fn2, ...FnI) as the corresponding free reactant concentrations, K = (K1, K2, ...KJ)
as the dissociation constants (of complexes to free reactants), yn as the measurement(s) made at the nth steady
state, and Zn = (Zn1, Zn2, ...ZnJ) as the concentrations of
complexes predicted by W, K and Fn to be
Wi′j
∏iI′ =1 Fni′
Z nj =
.
Kj
(1)
The relationship between the system inputs (Tn), states
(Fn) and outputs (yn) is then modeled by I total concentration constraints
g(Fn, Tn, K, p) = 0
that must be solved for the I free reactant concentrations
Fn at each n (1 <n ≤ N) given the inputs Tn, and an output
measurement model h that connects Fn to expected values
of the outputs E(yn)
yn = h(Fn, K, p, L) + εn
Wi′j
∏iI′ =1 Fni′
J
∑W
1j
Kj
j =1
=0
g i (Fn , Tn , K ) = Tni − Fni
J
−
∑W
ij
(2)
Wi′j
∏iI′ =1 Fni′
Kj
j =1
= 0 (1 < i ≤ I)
where pTn1 is the total concentration of undamaged R and
Fn1 is the concentration of free R that is undamaged and
thus capable of forming complexes. If all biologically
plausible candidate complexes are present in these (...truncated)