A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease
Citation: Ionita-Laza I, Buxbaum JD, Laird NM, Lange C (2011) A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on
Disease. PLoS Genet 7(2): e1001289. doi:10.1371/journal.pgen.1001289
Iuliana Ionita-Laza, Joseph D. Buxbaum, Nan M. Laird, Christoph Lange
Editor: Suzanne M. Leal, Baylor College of Medicine, United States of America
1 Department of Biostatistics, Columbia University, New York, New York, United States of America
2 Department of Psychiatry, Mount Sinai School of Medicine, New York, New York, United States of America
3 Department of Biostatistics, Harvard University, Boston, Massachusetts, United States of America
4 Institute for Genomic Mathematics, University of Bonn, Bonn, Germany
5 German Center for Neurodegenerative Diseases, Bonn, Germany
Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present. Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in gene IFIH1 to be protective against Type-1 Diabetes.
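The central idea in the abstract is to score variants that are more frequent in cases separately from those more frequent in controls, so that risk and protective effects do not cancel. A deliberately simplified sketch of that idea follows. It is not the authors' weighted statistic: the per-variant score here is an unweighted allele-frequency difference, the two directional sums are combined by taking the larger one, and significance comes from permuting case/control labels. The function names (`directional_excess`, `permutation_pvalue`) and the toy data are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical illustration only: an unweighted, two-directional aggregate
# statistic, NOT the weighted-sum statistic proposed in the paper.


def directional_excess(genotypes: Sequence[Sequence[int]],
                       is_case: Sequence[bool]) -> float:
    """Return max(S_plus, S_minus) for one genetic region.

    genotypes[i][j] is the number of minor alleles (0, 1 or 2) carried by
    individual i at rare variant j. S_plus sums case-minus-control
    allele-frequency differences over variants more frequent in cases;
    S_minus sums the absolute differences over variants more frequent
    in controls.
    """
    n_case = sum(1 for c in is_case if c)
    n_ctrl = len(is_case) - n_case
    s_plus = s_minus = 0.0
    for j in range(len(genotypes[0])):
        case_alleles = sum(g[j] for g, c in zip(genotypes, is_case) if c)
        ctrl_alleles = sum(g[j] for g, c in zip(genotypes, is_case) if not c)
        diff = case_alleles / (2 * n_case) - ctrl_alleles / (2 * n_ctrl)
        if diff > 0:
            s_plus += diff    # candidate risk variant
        elif diff < 0:
            s_minus -= diff   # candidate protective variant
    return max(s_plus, s_minus)


def permutation_pvalue(genotypes: Sequence[Sequence[int]],
                       is_case: Sequence[bool],
                       n_perm: int = 1000, seed: int = 0) -> float:
    """Shuffle case/control labels and count how often the permuted
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = directional_excess(genotypes, is_case)
    labels: List[bool] = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if directional_excess(genotypes, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 cases, then 20 controls, 3 rare variants.
# Variant 0 is carried only by cases (risk-like), variant 1 only by
# controls (protective-like), variant 2 once in each group (neutral).
n = 20
genotypes = [[1 if i < 15 else 0, 0, 1 if i == 15 else 0] for i in range(n)]
genotypes += [[0, 1 if i < 15 else 0, 1 if i == 15 else 0] for i in range(n)]
is_case = [True] * n + [False] * n

print(directional_excess(genotypes, is_case))  # 0.375
print(permutation_pvalue(genotypes, is_case))
```

For contrast, a signed burden count (case minus control frequency, summed over variants) would be exactly zero on this toy data, because the risk-like and protective-like variants offset each other; scoring each direction separately keeps both signals visible.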
Funding: This work was supported by NIH grants 1R03HG005908, R01MH087590, and R01MH081862. The funders had no role in study design, data collection
and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
These authors contributed equally to this work.
Common diseases such as diabetes, heart disease,
schizophrenia, etc., are likely caused by a complex interplay among many
genes and environmental factors. At any single disease locus allelic
heterogeneity is expected, i.e., there may be multiple, different
susceptibility mutations at the locus conferring risk in different
individuals [1].
Common and rare variants could both be important
contributors to disease risk. Thus far, in a first attempt to find disease
susceptibility loci, most research has focused on the discovery of
common susceptibility variants. This effort has been helped by the
widespread availability of genome-wide arrays providing almost
complete genomic coverage for common variants. The
genome-wide association studies performed so far have led to the discovery
of many common variants reproducibly associated with various
complex traits, showing that common variants can indeed affect
risk to common diseases [2,3]. However, the estimated effect sizes
for these variants are small (most odds ratios are below 1.5), with
only a small fraction of trait heritability explained by these variants
[4]. For example, at least 40 loci have been identified for height,
but these loci together explain only 5% of the 80% estimated
heritability for this trait [5]. One possible explanation for this
missing heritability is that, in addition to common variants, rare
variants are also important.
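The height figures above can be turned into a rough share of heritability explained. Assuming the 5% refers to the proportion of phenotypic variance explained by the 40 loci (a reading the sentence does not state explicitly), the ratio is:

```latex
\frac{\text{variance explained by the 40 loci}}{\text{estimated heritability}}
  \approx \frac{0.05}{0.80} = 0.0625
```

On this reading, roughly 6% of the estimated heritability is accounted for, leaving about 94% unexplained.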
Evidence to support a potential role for rare variants in complex
traits comes from both empirical and theoretical studies. There is
an increasing number of recent studies on obesity, autism,
schizophrenia, epilepsy, hypertension, HDL cholesterol, some
cancers, and Type-1 diabetes, among others [6–15], that implicate rare variants
(both single position variants and structural variants) in these traits.
From a theoretical point of view, population genetics theory
predicts that most disease loci do not have susceptibility alleles at
intermediate frequencies [16,17].
Author Summary
Risk to common diseases, such as diabetes and heart disease,
is influenced by a complex interaction among genetic
and environmental factors. Most of the disease-association
studies conducted so far have focused on common
variants, which are widely available on genotyping platforms.
However, recent advances in sequencing technologies pave the
way for large-scale medical sequencing studies with the
goal of elucidating the role rare variants may play in
affecting susceptibility to complex traits. The large number
of rare variants and their low frequencies pose great
challenges for the analysis of these data. We present here a
novel testing strategy, based on a weighted-sum statistic,
that is less sensitive than existing methods to the presence
of both risk and protective variants in the genetic region
under investigation. We show applications to simulated
data and to a real dataset on Type-1 Diabetes.
With rapid advances in next-generation sequencing
technologies, it is becoming increasingly feasible to efficiently sequence
large numbers of individuals genome-wide, allowing for the first time a
systematic assessment of the role rare variants may play in
influencing risk to complex diseases [18–21]. The analysis of the
resulting rare genetic variation poses many statistical challenges.
Because of the low frequencies of rare disease variants (as low as 0.001,
and possibly lower) and the large number of rare variants in the
genome, studies with realistic sample sizes will have low power to
detect such loci one variant at a time, as has been done for
common susceptibility variants [5,22]. It is therefore necessary to
perform an overall test for all rare variants in a gene or, more
generally, a candidate region, under the expectation that cases
differ from controls with respect to rare variants.
Several methods along these lines have already
been proposed. One of the first statistical methods proposed for
the analysis of rare variants [23] is based on testing whether the
proportion of carriers of rare variants is significantly different
between cases and controls. A subsequent paper by Madsen and
Browning [24] introduced the concept of weighting variants
according to their est (...truncated)