Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples
Andrew M. Smith (0,1,2), Lawrence E. Heisler (0,6), Robert P. St.Onge (4,5), Eveline Farias-Hesson (3), Iain M. Wallace (0,1), John Bodeau (8), Adam N. Harris (7), Kathleen M. Perry (8), Guri Giaever (0,2,6), Nader Pourmand (3,4), Corey Nislow (0,1,2)
0 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1
1 Banting and Best Department of Medical Research, University of Toronto, 112 College Street, Toronto, Ontario M5G 1L6
2 Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, Ontario M5S 1A8
3 Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064
4 Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304
5 Department of Biochemistry, Stanford University, Stanford, CA 94305
6 Department of Pharmaceutical Sciences, University of Toronto, 144 College Street, Toronto, Ontario M5S 3M2, Canada
7 Life Technologies Corporation, 5791 Van Allen Way, Carlsbad, CA 92009, USA
8 Life Technologies Corporation, 850 Lincoln Centre Drive, Foster City, CA 94404
*To whom correspondence should be addressed. Tel: +1 416 946 8351; Fax: +1 416 978 8287; Email:
Correspondence may also be addressed to Nader Pourmand. Tel: +1 831 502 7315; Fax: +1 831 459 2891; Email:
The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Next-generation sequencing has proven an
extremely effective technology for molecular
counting applications where the number of
sequence reads provides a digital readout for
RNA-seq, ChIP-seq, Tn-seq and other applications.
The extremely large number of sequence reads that
can be obtained per run permits the analysis of
increasingly complex samples. For lower complexity
samples, however, a point of diminishing returns is
reached when the number of counts per sequence
results in oversampling with no increase in data
quality. A solution to making next-generation
sequencing as efficient and affordable as possible
involves assaying multiple samples in a single run.
Here, we report the successful 96-plexing of
complex pools of DNA barcoded yeast mutants
and show that such Bar-seq assessment of these
samples is comparable with data provided by
barcode microarrays, the current benchmark for
this application. The cost reduction and increased
throughput permitted by highly multiplexed
sequencing will greatly expand the scope of
chemogenomics assays and, equally importantly,
the approach is suitable for other sequence
counting applications that could benefit from
massive parallelization.
Next-generation sequencing (NGS) technologies can
generate up to several hundred million reads of DNA
sequence per lane or slide, and this capacity continues to
increase at a rapid pace. This massive capacity has allowed
exploration of diverse biological questions (1–4).
Although pooled chemogenomic screens of compound–gene
interactions in yeast (5–16) and mammalian cells
(17,18) are typically assessed using barcode microarrays,
individual strains could also be counted by
barcode sequencing. We recently developed such an
assay (Bar-seq) to monitor thousands of gene–chemical
interactions (19). We now expand upon this
proof-of-principle to interrogate 96 samples in parallel,
developing the methodology and analytical tools to use NGS
to simultaneously monitor several hundred thousand
gene–environment interactions using a method
that should be readily adaptable to an automated
workflow.
Here, we demonstrate successful multiplexing of
samples obtained from 96 distinct pooled yeast growth
assays, with each sample comprising 6200 uniquely
barcoded yeast mutants. This 96-plex experiment
represents a 150-fold increase in unique observations over our
proof-of-principle assessment, and provides a substantial
cost reduction per experiment over microarrays.
Furthermore, while many aspects of microarray assay
costs are fixed, the cost of multiplex barcode sequencing
continues to decline as the number of reads per experiment
increases. Indeed, this increase in sequencing rate has
recently been shown to outpace the rate of Moore's law
(20). To assess the data quality at this level of
multiplexing, all 96 samples were also assessed by microarray, and
we then compared the ability of both platforms to detect
specific compound–gene interactions. It is expected that
the principle of this 96-fold multiplexing application,
with its ability to discriminate many sample types per slide
or flow cell, can be applied, with modification, to other
molecular counting methods such as RNA-seq (21),
ChIP-seq (22), promoter assays (23), histone occupancy
(24) and Tn-seq (25). To systematically test highly
multiplexed Bar-seq, we required a large pool of distinct
sequences whose relative abundances could be varied
and whose quantities could also be assessed by an
orthogonal method. The Yeast Knock Out collection of
6200 Saccharomyces cerevisiae mutants, although
designed for testing gene function, provides a suitable
test bed for new sequencing methods (19). Each yeast
deletion mutant contains three salient features: a
dominant drug resistance marker replacing the deleted
gene; two unique 20-base molecular barcodes; and
universal primers that flank each barcode to allow amplification
of all barcodes in a pooled manner using a single set
of primers. Pooled competitive growth assays are typically
carried out on 6200 mutants, and their relative
abundances are inferred from the signal on a barcode
microarray (5–16). The rapid pace of advance in
sequencing depth has led us and others to explore
diverse strategies for multiplexing samples for NGS
(19,25–34).
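The counting principle described above (a 20-base barcode flanked by universal primers, with relative abundance inferred from per-strain counts) can be illustrated with a minimal Python sketch. It is an illustration only, not the analysis pipeline used in this work: the primer sequences U1 and U2 below are hypothetical placeholders rather than the actual universal primers, reads are assumed to be already in base space, matching is exact with no error tolerance, and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical placeholder sequences, NOT the real universal primers.
U1 = "GATGTCCACG"
U2 = "CGTACGCTGC"
BARCODE_LEN = 20  # barcode length stated in the text


def extract_barcode(read, left=U1, right=U2, length=BARCODE_LEN):
    """Return the barcode found between the two flanking primers, or None."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = start + length
    # Require the downstream primer immediately after the barcode.
    if read[end:end + len(right)] != right:
        return None
    return read[start:end]


def count_barcodes(reads, known_barcodes):
    """Count reads per strain; known_barcodes maps barcode -> strain name."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read)
        if barcode in known_barcodes:
            counts[known_barcodes[barcode]] += 1
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {strain: n / total for strain, n in counts.items()}
```

Calling count_barcodes on a list of reads and a barcode-to-strain dictionary, then passing the result to relative_abundance, gives each strain's fraction of the assigned reads. Reads whose barcode is not in the dictionary are silently ignored in this sketch; a real analysis would likely need to tolerate sequencing errors in the barcode.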
One essential element for multiplexing prior to
sequencing is the incorporation (in this instance using
modified primers during PCR) of a unique experimental
indexing tag (see Supplementary Figure S1 for structure
of PCR amplicon). Following PCR, the amplified DNA is
purified and quantified, then pooled with amplicons
derived from other samples with different indexing tags.
The pooled PCR products are then purified from a single
lane of a polyacrylamide gel, reducing costs and sample
preparation time. Further, combining samples prior to
purification reduces potential liquid transfer errors,
providing greater uniformity, and also reduces the
number of emulsion PCR reactions required prior to
di-nucleotide sequencing on the SOLiD V3 instrument.
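Because samples are pooled after each receives its own indexing tag, reads must be assigned back to their sample of origin before strains are counted. The following Python sketch shows one way this demultiplexing step could look. It assumes that each sequenced amplicon has already been reduced to an (indexing tag, strain barcode) pair in base space, that tags are matched exactly, and that the tag-to-sample table is supplied by the user; the function name and data layout are hypothetical and are not taken from the software used in this study.

```python
from collections import Counter, defaultdict


def demultiplex(read_pairs, tag_to_sample):
    """Group strain-barcode counts by sample using the indexing tag.

    read_pairs:    iterable of (index_tag, strain_barcode) tuples (assumed format)
    tag_to_sample: dict mapping each indexing tag to a sample name
    Returns (per_sample_counts, number_of_unassigned_reads).
    """
    per_sample = defaultdict(Counter)
    unassigned = 0
    for tag, barcode in read_pairs:
        sample = tag_to_sample.get(tag)
        if sample is None:
            # The tag matches no known sample; track it rather than guess.
            unassigned += 1
            continue
        per_sample[sample][barcode] += 1
    return dict(per_sample), unassigned
```

Counting unassigned reads separately is a deliberate choice in this sketch: the fraction of reads with an unrecognized tag is a simple quality indicator for a multiplexed run.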
In our 20- and 96-plex sequencing runs, two independent
reads were obtained for each feature (Supplementary
Figure S1): the first sequence read was primed from the
P1 adapter sequence, capturing the sequence of the first
common primer (U1) and the yeast barcode. The second
sequencing read, pr (...truncated)