BART: a transcription factor prediction tool with query gene sets or epigenomic profiles

Bioinformatics, Aug 2018

Identification of functional transcription factors that regulate a given gene set is an important problem in gene regulation studies. Conventional approaches for identifying transcription factors, such as DNA sequence motif analysis, are unable to predict functional binding of specific factors and not sensitive enough to detect factors binding at distal enhancers. Here, we present binding analysis for regulation of transcription (BART), a novel computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse. This method demonstrates the advantage of utilizing publicly available data for functional genomics research.


Bioinformatics, 34(16), 2018, 2867–2869
doi: 10.1093/bioinformatics/bty194
Advance Access Publication Date: 28 March 2018
Applications Note
Data and text mining

Zhenjia Wang1, Mete Civelek1,2, Clint L. Miller1,2,3,4, Nathan C. Sheffield1,2,3,4, Michael J. Guertin1,4 and Chongzhi Zang1,2,3,4,5,*

1Center for Public Health Genomics, 2Department of Biomedical Engineering, 3Department of Public Health Sciences, 4Department of Biochemistry and Molecular Genetics and 5Cancer Center, University of Virginia, Charlottesville, VA 22908, USA

*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Received on December 18, 2017; revised on March 9, 2018; editorial decision on March 18, 2018; accepted on March 27, 2018

Availability and implementation: BART is implemented in Python and available at http://faculty.virginia.edu/zanglab/bart.
Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved.

1 Introduction

Transcriptional regulation of gene expression plays a critical role in many cellular processes, including cancer development and progression (Bradner et al., 2017; Lambert et al., 2018). Identification of functional transcription factors is essential for understanding gene regulatory mechanisms in such processes. In gene expression profiling studies, ontology and pathway analyses (Huang et al., 2009; McLean et al., 2010; Subramanian et al., 2005) can identify functional annotations of differentially expressed genes; however, this approach is unable to predict transcription factors that regulate those gene sets. Most existing methods for cis-regulatory prediction rely upon detecting overrepresented DNA sequence motifs near the gene promoters to identify sequence-specific DNA-binding factors (Boeva, 2016). Such methods are limited by the context-specific nature of transcription factor activity and by multiple factors sharing similar motifs (Jolma et al., 2013). Moreover, most cis-regulatory events in mammalian genomes occur at distal enhancers, which cover much larger regions than promoters but without direct assignment to target genes; these regions are usually difficult to capture by motif scan alone (Shlyueva et al., 2014).

Several methods have been developed to overcome these limitations of motif-based, promoter-biased approaches using comprehensive epigenomic information (Dozmorov, 2017), such as DNaseI hypersensitive sites (Sheffield et al., 2013). Model-based analysis of regulation of gene expression (MARGE) is a method developed for modeling differential gene expression using a compendium of public H3K27ac ChIP-seq datasets (Wang et al., 2016). By quantifying the regulatory potential of active enhancer histone mark H3K27ac on each gene in the genome from each ChIP-seq dataset, MARGE uses a semi-supervised learning approach to predict a genome-wide cis-regulatory profile for any query gene set. Leveraging over 6000 transcription factor ChIP-seq datasets available in the public domain (Mei et al., 2017), we have developed binding analysis for regulation of transcription (BART), a new method for prediction of functional transcription factors by associating ChIP-seq binding information with MARGE-predicted genomic cis-regulatory regions.

2 Materials and methods

Fig. 1. BART workflow. (A) Cis-regulatory profile is generated from query gene set by MARGE or from a ChIP-seq dataset by genomic mapping. Yellow bars indicate UDHS. (B) Each transcription factor binding profile from a ChIP-seq dataset is converted to a binary string showing presence or absence at each UDHS. (C) Top: Each ROC curve represents the prediction performance of a transcription factor profile from B by the query cis-regulatory profile from A; Bottom: Area under the ROC curve (AUC) is calculated for all datasets. (D) AUC are grouped by factor, and Wilcoxon test is performed for each factor compared with all datasets as background. In this example, cumulative distributions show significantly higher AUC for TF_a (red). (E) Wilcoxon test statistic is calculated for each transcription factor from each dataset in the background for Z-score calculation. (F) BART outputs a ranked list of all transcription factors.

3 Results and discussion

We tested BART on several gene sets derived from differentially expressed genes after activation or inhibition of known transcription factors, including ESR1, AR, NR3C1, PPARG, NOTCH1, and POU5F1 (Wang et al., 2016). In the BART result, the true functional factor was ranked on top (1/454) of the candidates in 4/6 gene sets; and ranked No. 2 and No. 47 for ESR1 and NR3C1, respectively (Supplementary Fig. S4). The highest ranked factor predicted from NR3C1 target genes is NR2A1, another nuclear receptor. The correct predictions are robust and not affected by randomness in MARGE outputs (Supplementary Fig. S5).
These results indicate that BART can successfully predict transcription factors that regulate a given gene set.

To evaluate the performance of BART, we compared it with four other transcription factor prediction tools that take a gene set as query: the ENCODE ChIP-seq Significance Tool (Auerbach et al., 2013), HOMER (Heinz et al., 2010), iRegulon (Janky et al., 2014) and Pscan (Zambelli et al., 2009) (Supplementary Table S1). On prediction of the true factor from the six gene sets, BART outperforms the other methods in five cases, the exception being NR3C1 (Supplementary Table S2).

BART can identify transcription factors that regulate any gene set or associate with any genomic profile. BART provides functional interpretations to differential gene expression analysis. BART makes predictions based on direct binding information from public ChIP-seq data only, as an orthogonal approach to conventional DNA sequence motif search. It focuses on transcription factor binding at open chromatin regions in the genome represented by UDHS, most of which are locat (...truncated)
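
As an illustration of the scoring steps summarized in the Fig. 1 caption (panels B to D and F), the following Python sketch computes a per-dataset AUC from a query cis-regulatory profile and binary binding profiles, then ranks factors with a one-sided rank-sum test. It is not the BART source code. All function names and the toy data are hypothetical, the rank-sum p-value uses a normal approximation without tie correction, each factor is compared with the datasets of all other factors (an assumption; the caption describes all datasets as background), and the Z-score step of panel E is omitted.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j get one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(scores, binding):
    """AUC of `scores` as a predictor of the binary `binding` vector (rank-sum form)."""
    n_pos = sum(binding)
    n_neg = len(binding) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("binding profile needs both bound and unbound sites")
    ranks = average_ranks(scores)
    pos_rank_sum = sum(r for r, b in zip(ranks, binding) if b)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def rank_sum_pvalue(sample, background):
    """One-sided p-value that `sample` tends to exceed `background`.

    Normal approximation without tie correction (a simplification for this sketch).
    """
    n1, n2 = len(sample), len(background)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(sample) + list(background))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


def rank_factors(query_scores, datasets):
    """Rank factors by p-value; `datasets` is a list of (factor, binary vector)."""
    aucs = [(factor, auc(query_scores, binding)) for factor, binding in datasets]
    by_factor = defaultdict(list)
    for factor, value in aucs:
        by_factor[factor].append(value)
    results = []
    for factor, values in by_factor.items():
        # Assumption: background = AUCs of datasets belonging to other factors.
        rest = [v for f, v in aucs if f != factor]
        results.append((factor, rank_sum_pvalue(values, rest)))
    return sorted(results, key=lambda item: item[1])


# Toy example: six candidate sites, two hypothetical factors with two datasets each.
toy_query = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
toy_datasets = [
    ("TF_a", [1, 1, 1, 0, 0, 0]),
    ("TF_a", [1, 1, 0, 1, 0, 0]),
    ("TF_b", [0, 0, 0, 1, 1, 1]),
    ("TF_b", [0, 1, 0, 0, 1, 1]),
]
for factor, p_value in rank_factors(toy_query, toy_datasets):
    print(factor, round(p_value, 4))  # TF_a is listed first in this toy example
```

In this toy input the TF_a binding profiles coincide with the highest query scores, so their AUC values are higher and TF_a is ranked first. With only a handful of datasets per factor the normal approximation is crude; a real analysis over thousands of datasets would more likely rely on a statistics library.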



Wang, Zhenjia, Civelek, Mete, Miller, Clint L, Sheffield, Nathan C, Guertin, Michael J, Zang, Chongzhi. BART: a transcription factor prediction tool with query gene sets or epigenomic profiles, Bioinformatics, 2018, pp. 2867-2869, Volume 34, Issue 16, DOI: 10.1093/bioinformatics/bty194