SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis
Stéphanie Le Hellard [2]
Stéphane J. Ballereau [2]
Peter M. Visscher [1]
Helen S. Torrance [2]
Jeni Pinson [2]
Stewart W. Morris [2]
Marian L. Thomson [2]
Colin A. M. Semple [0,2]
Walter J. Muir [3]
Douglas H. R. Blackwood [3]
David J. Porteous [2]
Kathryn L. Evans [2]
[0] MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
[1] Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK
[2] Medical Genetics Section, Molecular Medicine Centre, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
[3] Department of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK
We have compared the accuracy, efficiency and robustness of three methods of genotyping single nucleotide polymorphisms on pooled DNAs. We conclude that (i) the frequencies of the two alleles in pools should be corrected with a factor for unequal allelic amplification, which should be estimated from the mean ratio of a set of heterozygotes (k); (ii) the repeatability of an assay is more important than pinpoint accuracy when estimating allele frequencies, and assays should therefore be optimised to increase the repeatability; and (iii) the size of a pool has a relatively small effect on the accuracy of allele frequency estimation. We therefore recommend that large pools are genotyped and replicated a minimum of four times. In addition, we describe statistical approaches to allow rigorous comparison of DNA pool results. Finally, we describe an extension to our ACeDB database that facilitates management and analysis of the data generated by association studies.
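The correction in point (i) can be illustrated with a minimal sketch. It assumes the commonly used form in which k is the mean ratio of the allele A signal to the allele B signal in heterozygotes, and the pool frequency of allele A is estimated as A / (A + k·B); the exact formula is not given in this summary, so the function names and all peak heights below are hypothetical.

```python
from statistics import mean


def estimate_k(het_peaks):
    """Mean A/B signal ratio across heterozygous individuals (assumed definition of k)."""
    return mean(a / b for a, b in het_peaks)


def corrected_freq_a(peak_a, peak_b, k):
    """Frequency of allele A in a pool, scaling the allele B signal by k."""
    return peak_a / (peak_a + k * peak_b)


# Hypothetical (allele A, allele B) peak heights for three heterozygotes.
heterozygotes = [(1200.0, 1000.0), (1150.0, 1000.0), (1250.0, 1000.0)]
k = estimate_k(heterozygotes)  # about 1.2: allele A gives the stronger signal

# Hypothetical pool measurement.
raw = 600.0 / (600.0 + 400.0)                  # 0.6 before correction
corrected = corrected_freq_a(600.0, 400.0, k)  # about 0.556 after correction
print(f"k={k:.3f} raw={raw:.3f} corrected={corrected:.3f}")
```

Under this assumed model a heterozygote, whose true allele ratio is 1:1, yields a corrected frequency of 0.5 when k is estimated without error, which is the behaviour the correction is meant to achieve.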
Single nucleotide polymorphisms (SNPs) are the most
common type of polymorphism in the human genome, with
an approximate frequency of one per kilobase (1). These
biallelic variants are relatively easy to genotype compared
with VNTRs and microsatellites. For these reasons SNPs are
thought to have a promising future in a wide range of human
genetics applications including pharmacogenomics, the study
of population evolution, analysis of forensic samples and the
identification of susceptibility genes involved in complex
diseases. Hence, a large proportion of the effort of genome
centres is now focused on the identification and the mapping
of a large collection of SNPs: to date about 1 260 000 have
been mapped onto the human draft sequence (http://snp.cshl.org/).
The study of complex common diseases and quantitative
traits is confounded by the effects of disease heterogeneity,
gene-gene and gene-environment interactions. This means
that large numbers of SNPs must be surveyed in large numbers
of individuals in order to detect single gene variants with a
small to moderate effect size (2,3). The use of pooled samples,
comprised of equal amounts of genomic DNA from up to 1000
individuals, has been proposed as a means of reducing the
number of genotyping reactions required. The method used to
genotype SNPs in pooled DNAs must provide accurate
estimates of allele frequencies, and must be time and cost
effective. The range of methods currently available for
genotyping SNPs in individual samples [for an extensive
review of SNP genotyping methods see Syvanen (4)] can be
divided into three classes. The first comprises methods, such as
SSCP or dHPLC, that are based on the physical-chemical properties
of the alleles. The second comprises methods based on hybridisation,
amplification or ligation of an allele-specific probe: TAQMAN
(Applied Biosystems); oligo-ligation assay; Invader assay (Third
Wave Technologies Inc.); allele-specific amplification;
and padlock probes. The third comprises methods
based on allele-specific extension or minisequencing from a
primer adjacent to the site of the SNP: SNaPshot
(Applied Biosystems); primer extension read by dHPLC or
by mass spectrometry; primer extension performed on
microarrays; fluorescence polarisation; bioluminometric
assay coupled with modified primer extension reactions
(BAMPER); and Pyrosequencing (Pyrosequencing).
Previous studies have shown that allelic frequencies can be
accurately estimated from pools using primer extension
followed by dHPLC (5); TAQMAN and RFLP analysis
(6); allele-specific amplification with real-time PCR (7); SSCP
(8); BAMPER (9) and MassARRAY (10). In common with
many other groups, we wish to screen a large candidate region
for evidence of genetic association. The preferred strategy is to
assay small numbers of pooled DNA samples with large
numbers of SNPs. Consequently, methods such as
Pyrosequencing, TAQMAN or BAMPER that use modified
primers are too expensive. Methods based on hybridisation or
on physical-chemical properties are ruled out as each assay
must be optimised. We therefore chose to compare the
robustness, accuracy and cost of three methods based on
minisequencing: SNaPshot (Applied Biosystems) and
primer extension followed either by dHPLC, or mass
spectrometry (MassARRAY system by Sequenom).
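The saving that motivates this strategy can be made concrete with a rough count of genotyping reactions. The sketch below is illustrative only: the study size and number of SNPs are hypothetical, the four replicates per pool follow the recommendation in the summary, and the reactions on heterozygous individuals needed to estimate k are ignored.

```python
def individual_reactions(n_individuals, n_snps):
    """One reaction per individual per SNP."""
    return n_individuals * n_snps


def pooled_reactions(n_pools, n_replicates, n_snps):
    """One reaction per pool per replicate per SNP."""
    return n_pools * n_replicates * n_snps


# Hypothetical design: 1000 cases and 1000 controls, 100 SNPs.
n_snps = 100
ind = individual_reactions(2000, n_snps)  # 200000 reactions
pool = pooled_reactions(2, 4, n_snps)     # 800 reactions (two pools, four replicates)
print(ind, pool, ind // pool)             # pooling needs 250-fold fewer reactions here
```

Because the per-SNP cost of a pooled design is dominated by assay set-up rather than by sample number, methods that need expensive modified primers or per-assay optimisation weigh more heavily than they would in individual genotyping.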
We have a (...truncated)