Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells
Felicia SL Ng, Judith Schütte, David Ruau, Evangelia Diamanti, Rebecca Hannah, Sarah J. Kinston and Berthold Göttgens*
Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK
*To whom correspondence should be addressed. Tel: +44 1223 336 829; Fax: +44 1223 762670; Email:
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
Combinatorial transcription factor (TF) binding is
essential for cell-type-specific gene regulation.
However, much remains to be learned about the
mechanisms of TF interactions, including to what extent
constrained spacing and orientation of interacting
TFs are critical for regulatory element activity. To
examine the relative prevalence of the enhanceosome
versus the TF collective model of combinatorial TF
binding, a comprehensive analysis of TF binding site
sequences in large scale datasets is necessary. We
developed a motif-pair discovery pipeline to identify
motif co-occurrences with preferential distance(s)
between motifs in TF-bound regions. Utilizing a
compendium of 289 mouse haematopoietic TF
ChIP-seq datasets, we demonstrate that
haematopoietic-related motif-pairs commonly occur with highly
conserved constrained spacing and orientation between
motifs. Furthermore, motif clustering revealed
specific associations for both heterotypic and
homotypic motif-pairs with particular haematopoietic cell
types. We also showed that disrupting the
spacing between motif-pairs significantly affects
transcriptional activity in a well-known motif-pair (E-box
and GATA) and in two previously unknown
motif-pairs with constrained spacing (Ets and Homeobox,
as well as Ets and E-box). In this study, we
provide evidence for widespread sequence-specific TF
pair interaction with DNA that conforms to the
enhanceosome model, and furthermore identify
associations between specific haematopoietic cell-types
and motif-pairs.
INTRODUCTION
Transcription factors (TFs) are primary mediators of gene
regulation, and they have long been known as essential
regulators of cell fate decisions in the haematopoietic
system. TF proteins form complexes, bind regulatory DNA
sequences in enhancer and promoter regions and help to
recruit the basic transcriptional machinery to control the
expression of nearby genes. The interaction between two TFs
and the DNA therefore represents the most basic
component in understanding larger TF complex formations (1).
However, the molecular mechanisms by which such
complexes control gene expression are still largely unknown.
One of the best-understood enhancers controls expression
of the interferon-β gene, where it is now recognised that
specificity in gene expression does not arise from the
cumulative effect of individual TF binding events but from
synergistic effects of multiple TFs mediating the assembly of a
higher order enhanceosome complex (2). Precise
combinations of TFs as well as the orientation and spacing between
TFs are therefore requirements for assembly of a
transcriptionally active enhanceosome in this particular instance.
However, a recent study in Drosophila suggests that none
of the above requirements are prevalent in the majority of
functionally validated enhancers. Instead, tightly controlled
gene expression is postulated to be achievable using flexible
spacing between TFs and redundancy in TF interaction (3).
Recent advances and improved cost-effectiveness in
next-generation sequencing technology have greatly increased
the number of publicly available genome-wide TF
binding profiles generated by chromatin immunoprecipitation
coupled with sequencing (ChIP-seq). To date, hundreds of
datasets exist in the public domain that have been
generated by the haematopoiesis research community. We have
previously described the development of the HAEMCODE
compendium and web interface (4), which provides access
for the wider scientific community to several hundred
carefully curated ChIP-seq datasets in mouse blood cells, thus
enabling comparative analysis of datasets generated in
multiple different laboratories. Similar resources have also been
generated for embryonic stem cells (5,6).
In mouse haematopoiesis, specific spacing between two
DNA binding motifs has previously been reported to be
functionally important. Examples include (i) E-box and
GATA motifs separated by 9bp and bound by TAL1 and
GATA factors (7) that are important for the
transcriptional activity of several erythroid gene regulatory
elements and (ii) the Ets and IRF motif-pair separated by 2bp
which occurs in gene regulatory sequences associated with
genes important for lymphoid development (8). At the
genome scale, similar analyses have been conducted on ENCODE
datasets to show that binding of TF pairs can be spatially
constrained (9,10). To assess the relative prevalence of
spatially constrained binding versus the more relaxed model of
the TF collective, genome-scale studies coupled with
comprehensive statistical analysis and experimental validation
will be required. Given the pivotal importance of
combinatorial TF interactions in driving cell fate choices (11–13),
research in this area not only has the potential to reveal new
mechanistic aspects of TF function, but also inform our
understanding of cell lineage specification during mammalian
development.
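
To make the notion of a fixed motif spacing concrete, the following Python sketch measures the gap between an E-box and a GATA site in a single sequence. It is only an illustration under stated assumptions: the consensus patterns CANNTG and WGATAR stand in for the actual binding models, the gap is defined here as the number of bases between the end of the first motif and the start of the second, only the forward strand is searched, and the function name `motif_gaps` and the example sequence are hypothetical.

```python
import re

# Consensus patterns used purely for illustration (assumption: these are
# simplified stand-ins, not the motif models used in the study).
# Lookahead groups allow overlapping matches to be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")   # E-box consensus CANNTG
GATA = re.compile(r"(?=([AT]GATA[AG]))")    # GATA consensus WGATAR


def motif_gaps(seq, first, second, max_gap=100):
    """Return gaps in bp between the end of each `first` hit and the
    start of each downstream `second` hit (forward strand only)."""
    seq = seq.upper()
    first_hits = [(m.start(1), m.end(1)) for m in first.finditer(seq)]
    second_hits = [(m.start(1), m.end(1)) for m in second.finditer(seq)]
    gaps = []
    for _, f_end in first_hits:
        for s_start, _ in second_hits:
            gap = s_start - f_end
            if 0 <= gap <= max_gap:
                gaps.append(gap)
    return gaps


# Hypothetical sequence: an E-box, nine spacer bases, then a GATA site.
example = "TT" + "CAGCTG" + "CCCTTCCCT" + "AGATAA" + "TT"
print(motif_gaps(example, EBOX, GATA))  # prints [9]
```

Other gap conventions (for example, start-to-start offsets) are equally valid; what matters for a spacing analysis is that one convention is applied consistently across all regions.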
Community efforts such as the development of
HAEMCODE provide powerful new platforms for generating novel
hypotheses that will lead to a better understanding of TF
function. In this paper, we performed a comprehensive
analysis of motif co-occurrence making use of all TF-datasets
present in the HAEMCODE ChIP-seq compendium. By
modelling TF binding to DNA with position weight
matrices (PWMs), we were able to systematically predict
binding sites and quantify the spacing between motifs. To infer
TF pairs interacting with DNA, we chose an unbiased
approach by considering not just sequence motifs that
correspond to the TF precipitated in a particular ChIP-seq
experiment, but also all other sequence motifs that were
significantly enriched in the de novo motif discovery in a given
sample. In this manner, we were able to comprehensively
map instances of motif co-occurrence and quantify short
range distances (100 bp) between any pair of TFs across
a large number of TFs and cell types. Statistical analysis
indicated that TF partner choices are not random but are
instead closely linked to cell-type-specific function. Moreover,
experimental validation confirmed the functionality of two
previously unknown motif-pairs and their spacing,
involving the pairing of Ets with E-box and Ets with Homeobox
TFs, respectively.
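
The general idea of scanning bound regions with PWMs and tallying the spacing between two motifs can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: the count matrices, score thresholds, example regions and function names are all hypothetical, only the forward strand is scanned, offsets are measured start-to-start, and no statistical test of the resulting distribution is performed. The 100 bp limit is taken from the text above.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts (A, C, G, T) into a log-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column) + 4 * pseudocount
        pwm.append([math.log2(((c + pseudocount) / total) / background)
                    for c in column])
    return pwm


def scan(seq, pwm, threshold):
    """Return start positions whose PWM score is at least `threshold`."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other symbols
        score = sum(pwm[i][BASES.index(b)] for i, b in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def spacing_histogram(regions, pwm_a, pwm_b, thr_a, thr_b, max_dist=100):
    """Count start-to-start offsets (motif B minus motif A) over all regions."""
    hist = Counter()
    for seq in regions:
        seq = seq.upper()
        hits_a = scan(seq, pwm_a, thr_a)
        hits_b = scan(seq, pwm_b, thr_b)
        for a in hits_a:
            for b in hits_b:
                offset = b - a
                if offset != 0 and abs(offset) <= max_dist:
                    hist[offset] += 1
    return hist


# Toy count matrices (hypothetical): an Ets-like GGAA core and a GATA core.
ets = log_odds([[0, 0, 10, 0], [0, 0, 10, 0], [10, 0, 0, 0], [10, 0, 0, 0]])
gata = log_odds([[0, 0, 10, 0], [10, 0, 0, 0], [0, 0, 0, 10], [10, 0, 0, 0]])

# Hypothetical "bound regions"; real input would be ChIP-seq peak sequences.
regions = ["ACGGAATTCGATACC", "TTGGAAGCTGATAGG", "GGAACGATA"]
hist = spacing_histogram(regions, ets, gata, thr_a=6.0, thr_b=6.0)
print(sorted(hist.items()))  # prints [(5, 1), (7, 2)]
```

A preferential distance would appear as a peak in such a histogram. Deciding whether a peak is meaningful requires comparison against an appropriate background, which this sketch deliberately omits.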
(...truncated)