Design and Analysis of Bar-seq Experiments


INVESTIGATION

David G. Robinson,* Wei Chen,† John D. Storey,*,1 and David Gresham‡,1

*Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, †Berlin Institute for Medical Systems Biology, Max-Delbrück-Center for Molecular Medicine, 13125 Berlin, Germany, and ‡Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003

ABSTRACT High-throughput quantitative DNA sequencing enables the parallel phenotyping of pools of thousands of mutants. However, the appropriate analytical methods and experimental design that maximize the efficiency of these methods while maintaining statistical power are currently unknown. Here, we have used Bar-seq analysis of the Saccharomyces cerevisiae yeast deletion library to systematically test the effect of experimental design parameters and sequence read depth on experimental results. We present computational methods that efficiently and accurately estimate effect sizes and their statistical significance by adapting existing methods for RNA-seq analysis. Using simulated variation of experimental designs, we found that biological replicates are critical for statistical analysis of Bar-seq data, whereas technical replicates are of less value. By subsampling sequence reads, we found that when using four-fold biological replication, 6 million reads per condition achieved 96% power to detect a two-fold or greater change at a 5% false discovery rate. Our guidelines for experimental design and computational analysis enable the study of the yeast deletion collection in up to 30 different conditions in a single sequencing lane. These findings are relevant to a variety of pooled genetic screening methods that use high-throughput quantitative DNA sequencing, including Tn-seq.

Uncovering the connection between genotype and phenotype remains one of the central challenges of modern genetics.
At the same time, the rate at which new genomes are sequenced currently outpaces our capacity to functionally annotate those genomes. Addressing these challenges requires efficient means of quantifying phenotypes associated with defined genetic perturbations. Methods for uniquely identifying and quantifying phenotypic effects of mutant alleles in complex mixtures enable the parallel analysis of hundreds to thousands of genotypes. Pooled mutant analysis entails the use of either libraries of defined mutants tagged with unique DNA sequences (molecular barcodes) (Winzeler et al. 1999; Giaever et al. 2002) or complex libraries of tens of thousands of unique mutants generated by random insertional mutagenesis. Analogously, comprehensive libraries of short hairpin RNAs (shRNAs) enable parallel analysis of perturbations of mammalian genes in cell culture (Schlabach et al. 2008; Silva et al. 2008; Sims et al. 2011). Recently, methods for estimating mutant abundances in complex mixtures have been introduced that capitalize on advances in high-throughput quantitative DNA sequencing.

KEYWORDS: yeast; Bar-seq; galactose; functional genomics; Saccharomyces cerevisiae

Copyright © 2014 Robinson et al. doi: 10.1534/g3.113.008565
Manuscript received September 16, 2013; accepted for publication October 20, 2013; published Early Online November 5, 2013.
This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.113.008565/-/DC1
1 Corresponding authors: Carl Icahn Labs, Princeton University, Princeton, NJ 08544; 12 Waverly Place, Room 203, New York University, New York, NY 10003.
Barcode analysis by sequencing (Bar-seq) was first developed to analyze libraries of thousands of Saccharomyces cerevisiae gene deletion mutants (Smith et al. 2009) and has subsequently been used to analyze a library of deletion mutants in Schizosaccharomyces pombe (Han et al. 2010). The use of Bar-seq enables efficient, accurate, and comprehensive genetic screens for addressing a variety of questions, such as defining the genetic requirements for initiation and maintenance of cell quiescence in response to distinct starvation signals (Gresham et al. 2011). In organisms for which barcoded mutant libraries are not available, high-throughput DNA sequencing of pools of transposon insertion mutants (Tn-seq) enables multiplexed mutant analysis. Tn-seq was initially applied in studies of Streptococcus pneumoniae (van Opijnen et al. 2009) and Haemophilus influenzae (Gawronski et al. 2009) and has subsequently been adapted for use in diverse organisms (Brutinel and Gralnick 2012; Gallagher et al. 2011). Similarly, PhiTSeq facilitates simultaneous analysis of thousands of transposon-mutagenized haploid human cells (Carette et al. 2011). The widespread adoption of pooled mutant screens using high-throughput quantitative DNA sequencing attests to the power of these methods for efficient genetic analysis.

Volume 4 | January 2014 | 11

In contrast to the rapid technological advances in pooled mutant analysis, there has not yet been a statistical treatment of the experimental design and analysis of data generated by high-throughput DNA sequence analysis of these complex libraries. Thus, major methodological and analytical questions remain unanswered. What is the appropriate statistical framework for analyzing DNA sequence count data? What are the sources of variation? What is the appropriate study design for maximizing the power and accuracy to detect differences in mutant abundances?
What sequence read depth maximizes the precision of these methods while minimizing the cost and resources required?

We undertook a study that aimed to address these questions with the goal of providing guidance for the design and analysis of pooled mutant screens using high-throughput DNA sequencing. Using experimental analysis of the S. cerevisiae gene deletion collection in two different conditions, we studied the contribution of treatment and biological and technical variation to Bar-seq data (Figure 1). We demonstrated that the negative binomial models used to analyze RNA-seq data are also directly applicable to Bar-seq data. Using computational subsampling of our experimental data, we studied the effect of different experimental designs on the results from Bar-seq analysis. We found that biological replicates substantially improved statistical power, whereas technical replicates provided only moderate additional statistical power. We also found that increasing sequencing depth beyond 6 million reads per condition provided limited improvement in the experimental results, regardless of experimental design. Our results provide information directly relevant to designing future hig (...truncated)
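
To make the idea of computational subsampling of sequence reads concrete, the following is a minimal sketch of one way to thin a barcode count table to a lower read depth. It is an illustration under stated assumptions, not the authors' procedure: this section does not specify how the subsampling was performed, so sampling reads uniformly without replacement is an assumption, and the function name `subsample_counts`, the barcode labels, and the toy counts are all hypothetical.

```python
import random
from collections import Counter


def subsample_counts(counts, depth, seed=0):
    """Draw `depth` reads uniformly without replacement from a count table.

    `counts` maps a barcode label to its observed read count. This is an
    assumed subsampling scheme for illustration, not the paper's own code.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("requested depth exceeds the number of reads available")
    rng = random.Random(seed)
    # Expand the table into one entry per read, then sample reads uniformly.
    pool = [barcode for barcode, n in counts.items() for _ in range(n)]
    drawn = Counter(rng.sample(pool, depth))
    # Report every barcode, including any that dropped to zero reads.
    return {barcode: drawn.get(barcode, 0) for barcode in counts}


# Hypothetical toy counts for a single sample (not data from the paper).
toy_counts = {"mutant_A": 500, "mutant_B": 120, "mutant_C": 30, "mutant_D": 5}

for depth in (400, 200, 50):
    print(depth, subsample_counts(toy_counts, depth, seed=1))
```

Expanding the table into one entry per read is only practical for small examples; at millions of reads, a multivariate hypergeometric draw from a numerical library would be the more realistic choice. Repeating the draw at several depths and rerunning the downstream test on each thinned table is one way to see how power changes with depth.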