ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
BMC Genomics
Joshua WK Ho 0 2 3
Eric Bishop 1 2
Peter V Karchenko 0 2 3 4
Nicolas Nègre 5
Kevin P White 5
Peter J Park 0 2 3 4
0 Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School , Boston, MA , USA
1 Program in Bioinformatics, Boston University , Boston, MA , USA
2 Center for Biomedical Informatics, Harvard Medical School , Boston, MA , USA
3 Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School , Boston, MA , USA
4 Informatics Program, Children's Hospital , Boston, MA , USA
5 Institute for Genomics and Systems Biology, University of Chicago , Chicago, IL , USA
Background: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports compared only a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster.
Results: Both technologies produce highly reproducible profiles within each platform. ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more and narrower peaks. The sets of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental conditions and sequencing depth. We further show that using an inappropriate input DNA profile can affect the average signal profiles around genomic features and the peak calling results, highlighting the importance of having high-quality input DNA data for normalization in ChIP-seq analysis.
Conclusions: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high-quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
Background
Chromatin immunoprecipitation (ChIP) followed by
genomic tiling microarray hybridization (ChIP-chip) or
massively parallel sequencing (ChIP-seq) are two of the
most widely used approaches for genome-wide
identification and characterization of in vivo protein-DNA
interactions. They can be used to analyze many
important DNA-interacting proteins including RNA
polymerases, transcription factors, transcriptional co-factors,
and histone proteins [1]. Indeed, these genome-wide
ChIP analysis approaches have led to many important
discoveries related to transcriptional regulation [2-4],
epigenetic regulation through histone modification [5],
nucleosome organization [6,7], and interindividual
variation in protein-DNA interactions [8,9].
ChIP-chip first appeared in the literature about
10 years ago and was one of the earliest approaches to
performing genome-wide mapping of protein-DNA
interactions in organisms with small genomes, such as
yeast [2,10]. Currently, various tiling microarray
platforms of common model organisms are well supported
by commercial vendors, and many bioinformatics tools
have been developed for ChIP-chip analysis [11-14].
Fueled by the rapid development of second-generation
high-throughput sequencing technologies in the past
few years, ChIP-seq has emerged as an attractive
alternative to ChIP-chip [1]. For instance, ChIP-seq generally
produces profiles with higher spatial resolution, dynamic
range, and genomic coverage, giving it higher
sensitivity and specificity than ChIP-chip for
identifying protein binding sites. Further, ChIP-seq can
be used to analyze virtually any species with a
sequenced genome since it is not constrained by the
availability of an organism-specific microarray. Many
current ChIP-seq protocols can work with a smaller
amount of initial material compared to ChIP-chip
[15,16]. Moreover, ChIP-seq is already a more
cost-effective way of analyzing mammalian genomes, and the
cost effectiveness will likely become more apparent as
the cost of high-throughput sequencing technology
continues to drop. These factors have led to the rapid
adoption of ChIP-seq technology.
However, despite the widespread use of both
ChIP-chip and ChIP-seq, only a few small-scale studies have
attempted to quantitatively compare these technologies
using real data. Euskirchen et al. [17] compared the
STAT1 bin (...truncated)