Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells

Nucleic Acids Research, Dec 2014

Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including to what extent constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the ‘enhanceosome’ versus the ‘TF collective’ model of combinatorial TF binding, a comprehensive analysis of TF binding site sequences in large scale datasets is necessary. We developed a motif-pair discovery pipeline to identify motif co-occurrences with preferential distance(s) between motifs in TF-bound regions. Utilizing a compendium of 289 mouse haematopoietic TF ChIP-seq datasets, we demonstrate that haematopoietic-related motif-pairs commonly occur with highly conserved constrained spacing and orientation between motifs. Furthermore, motif clustering revealed specific associations for both heterotypic and homotypic motif-pairs with particular haematopoietic cell types. We also showed that disrupting the spacing between motif-pairs significantly affects transcriptional activity in a well-known motif-pair—E-box and GATA, and in two previously unknown motif-pairs with constrained spacing—Ets and Homeobox as well as Ets and E-box. In this study, we provide evidence for widespread sequence-specific TF pair interaction with DNA that conforms to the ‘enhanceosome’ model, and furthermore identify associations between specific haematopoietic cell-types and motif-pairs.

Felicia SL Ng, Judith Schütte, David Ruau, Evangelia Diamanti, Rebecca Hannah, Sarah J. Kinston, Berthold Göttgens

Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK

*To whom correspondence should be addressed. Tel: +44 1223 336 829; Fax: +44 1223 762670; Email:

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Transcription factors (TFs) are primary mediators of gene regulation, and they have long been known as essential regulators of cell fate decisions in the haematopoietic system. TF proteins form complexes, bind regulatory DNA sequences in enhancer and promoter regions and help to recruit the basic transcriptional machinery to control the expression of nearby genes. The interaction between two TFs and the DNA therefore represents the most basic component in understanding larger TF complex formations (1). However, the molecular mechanisms by which such complexes control gene expression are still largely unknown. One of the best-understood enhancers controls expression of the interferon-β gene, where it is now recognised that specificity in gene expression does not arise from the cumulative effect of individual TF binding events but from synergistic effects of multiple TFs mediating the assembly of a higher order enhanceosome complex (2). Precise combinations of TFs as well as the orientation and spacing between TFs are therefore requirements for assembly of a transcriptionally active enhanceosome in this particular instance. However, a recent study in Drosophila suggests that none of the above requirements are prevalent in the majority of functionally validated enhancers. Instead, tightly controlled gene expression is postulated to be achievable using flexible spacing between TFs and redundancy in TF interaction (3).
Recent advances and improved cost-effectiveness in next generation sequencing technology have greatly increased the number of publicly available genome-wide TF binding profiles generated by chromatin immunoprecipitation coupled with sequencing (ChIP-seq). To date, hundreds of datasets exist in the public domain that have been generated by the haematopoiesis research community. We have previously described the development of the HAEMCODE compendium and web interface (4), which provides access for the wider scientific community to several hundred carefully curated ChIP-seq datasets in mouse blood cells, thus enabling comparative analysis of datasets generated in multiple different laboratories. Similar resources have also been generated for embryonic stem cells (5,6). In mouse haematopoiesis, specific spacing between two DNA binding motifs has previously been reported to be functionally important. Examples include (i) E-box and GATA motifs separated by 9 bp and bound by TAL1 and GATA factors (7), which are important for the transcriptional activity of several erythroid gene regulatory elements, and (ii) the Ets and IRF motif-pair separated by 2 bp, which occurs in gene regulatory sequences associated with genes important for lymphoid development (8). At genome scale, similar analyses have been conducted on ENCODE datasets to show that binding of TF pairs can be spatially constrained (9,10). To assess the relative prevalence of spatially constrained binding versus the more relaxed model of the TF collective, genome-scale studies coupled with comprehensive statistical analysis and experimental validation will be required. Given the pivotal importance of combinatorial TF interactions in driving cell fate choices (11-13), research in this area not only has the potential to reveal new mechanistic aspects of TF function, but may also inform our understanding of cell lineage specification during mammalian development.
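To make the idea of a fixed motif spacing concrete, the following minimal Python sketch searches a sequence for an E-box followed by a GATA motif at an exact gap. It is an illustration only, not the method used in this study: the consensus patterns (CANNTG and GATA), the forward-strand-only search, and the definition of the gap as the number of bases between the end of the E-box and the start of the GATA motif are all assumptions made for the example, and the function names are hypothetical.

```python
import re

# Consensus patterns assumed for illustration (not taken from the paper).
# The lookahead wrapper lets finditer report overlapping matches.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
GATA = re.compile(r"(?=(GATA))")


def find_sites(pattern, seq):
    """Return (start, end) for every match of the pattern, overlaps included."""
    return [(m.start(1), m.end(1)) for m in pattern.finditer(seq)]


def pairs_with_gap(seq, gap):
    """Return (ebox_start, gata_start) pairs on the forward strand where
    exactly `gap` bases lie between the end of the E-box and the start
    of the GATA motif (an assumed definition of spacing)."""
    gata_starts = {start for start, _ in find_sites(GATA, seq)}
    return [
        (start, end + gap)
        for start, end in find_sites(EBOX, seq)
        if end + gap in gata_starts
    ]


# Toy sequence: E-box, nine spacer bases, then GATA.
example = "TT" + "CAGCTG" + "ACGTACGTA" + "GATA" + "CC"
print(pairs_with_gap(example, 9))
```

A genome-scale analysis would additionally need to scan both strands, use position weight matrices rather than fixed consensus strings, and test whether a given spacing occurs more often than expected by chance.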
Community efforts such as the development of HAEMCODE provide powerful new platforms for generating novel hypotheses that will lead to a better understanding of TF function. In this paper, we performed a comprehensive analysis of motif co-occurrence making use of all TF datasets present in the HAEMCODE ChIP-seq compendium. By modelling TF binding to DNA with position weight matrices (PWMs), we were able to systematically predict binding sites and quantify the spacing between motifs. To infer TF pairs interacting with DNA, we chose an unbiased approach by considering not just sequence motifs that correspond to the TF precipitated in a particular ChIP-seq experiment, but also all other sequence motifs that were significantly enriched in the de novo motif discovery in a given sample. In this manner, we were able to comprehensively map instances of motif co-occurrence and quantify short-range distances (100 bp) between any pair of TFs across a large number of TFs and cell types. Statistical analysis indicated that TF partner choices are not random but are instead closely linked to cell-type-specific function. Moreover, experimental validation confirmed the functionality of two previously unknown motif-pairs and their spacing, involving the pairing of Ets with E-box and Ets with Homeobox TFs respectively. (...truncated)
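
The general approach described above (score candidate sites with a PWM, then tabulate distances between hits of two motifs) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: the toy count matrices, the pseudocount, the uniform background, the score threshold, the forward-strand-only scan, and the definition of the gap (bases between the end of motif A and the start of motif B) are all hypothetical choices, and no enrichment statistics are computed.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_odds(pfm, pseudocount=0.5, background=0.25):
    """Turn a position frequency matrix (list of {base: count} columns)
    into log2-odds scores against a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return start positions whose window score reaches the threshold.
    Assumes an upper-case sequence containing only A, C, G and T."""
    width = len(pwm)
    return [
        i for i in range(len(seq) - width + 1)
        if sum(pwm[j][seq[i + j]] for j in range(width)) >= threshold
    ]


def spacing_counts(seqs, pwm_a, pwm_b, thr_a, thr_b, max_dist=100):
    """Count gaps between the end of each motif A hit and the start of each
    downstream motif B hit, keeping gaps from 0 to max_dist bases."""
    counts = Counter()
    for seq in seqs:
        hits_b = scan(seq, pwm_b, thr_b)
        for a in scan(seq, pwm_a, thr_a):
            end_a = a + len(pwm_a)
            for b in hits_b:
                gap = b - end_a
                if 0 <= gap <= max_dist:
                    counts[gap] += 1
    return counts


def toy_pfm(consensus, n=10):
    """Build a toy count matrix in which every site matches the consensus."""
    return [{b: (n if b == base else 0) for b in BASES} for base in consensus]


# Toy motifs and sequences, invented for this example.
ets = log_odds(toy_pfm("GGAA"))
gata = log_odds(toy_pfm("GATA"))
regions = ["ACGGAATTTGATACC", "GGAACCCGATA", "TTGGAAGATATT"]
counts = spacing_counts(regions, ets, gata, thr_a=7.0, thr_b=7.0)
print(sorted(counts.items()))
```

A spacing that is strongly over-represented in such a tally, relative to a suitable background model, is the kind of signal a motif-pair analysis looks for; deciding what counts as over-represented requires the statistical testing that this sketch omits.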


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/42/22/13513.full.pdf
Article home page: http://nar.oxfordjournals.org/content/42/22/13513.abstract

Felicia SL Ng, Judith Schütte, David Ruau, Evangelia Diamanti, Rebecca Hannah, Sarah J. Kinston, Berthold Göttgens. Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells, Nucleic Acids Research, 2014, pp. 13513-13524, 42/22, DOI: 10.1093/nar/gku1254