A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease



Citation: Ionita-Laza I, Buxbaum JD, Laird NM, Lange C (2011) A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease. PLoS Genet 7(2): e1001289. doi:10.1371/journal.pgen.1001289

Iuliana Ionita-Laza, Joseph D. Buxbaum, Nan M. Laird, Christoph Lange

1 Department of Biostatistics, Columbia University, New York, New York, United States of America, 2 Department of Psychiatry, Mount Sinai School of Medicine, New York, New York, United States of America, 3 Department of Biostatistics, Harvard University, Boston, Massachusetts, United States of America, 4 Institute for Genomic Mathematics, University of Bonn, Bonn, Germany, 5 German Center for Neurodegenerative Diseases, Bonn, Germany

Editor: Suzanne M. Leal, Baylor College of Medicine, United States of America

Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present.
Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in the gene IFIH1 to be protective against Type-1 Diabetes.

Funding: This work was supported by NIH grants 1R03HG005908, R01MH087590, and R01MH081862. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

These authors contributed equally to this work.

Common diseases such as diabetes, heart disease, and schizophrenia are likely caused by a complex interplay among many genes and environmental factors. At any single disease locus allelic heterogeneity is expected, i.e., there may be multiple, different susceptibility mutations at the locus conferring risk in different individuals [1]. Common and rare variants could both be important contributors to disease risk. Thus far, in a first attempt to find disease susceptibility loci, most research has focused on the discovery of common susceptibility variants. This effort has been helped by the widespread availability of genome-wide arrays providing almost complete genomic coverage for common variants. The genome-wide association studies performed so far have led to the discovery of many common variants reproducibly associated with various complex traits, showing that common variants can indeed affect risk to common diseases [2,3]. However, the estimated effect sizes for these variants are small (most odds ratios are below 1.5), with only a small fraction of trait heritability explained by these variants [4].
For example, at least 40 loci have been identified for height, but these loci together explain only 5% of the 80% estimated heritability for this trait [5]. One possible explanation for this missing heritability is that, in addition to common variants, rare variants are also important.

Evidence to support a potential role for rare variants in complex traits comes from both empirical and theoretical studies. An increasing number of recent studies on obesity, autism, schizophrenia, epilepsy, hypertension, HDL cholesterol, some cancers, Type-1 diabetes, etc. [6-15] implicate rare variants (both single position variants and structural variants) in these traits. From a theoretical point of view, population genetics theory predicts that most disease loci do not have susceptibility alleles at intermediate frequencies [16,17].

With rapid advances in next-generation sequencing technologies, it is becoming increasingly feasible to efficiently sequence large numbers of individuals genome-wide, allowing for the first time a systematic assessment of the role rare variants may play in influencing risk to complex diseases [18-21]. The analysis of the resulting rare genetic variation poses many statistical challenges. Due to the low frequencies of rare disease variants (as low as 0.001, and maybe lower) and the large number of rare variants in the genome, studies with realistic sample sizes will have low power to detect such loci one at a time, as has been done to find common susceptibility variants [5,22]. It is then necessary to perform an overall test for all rare variants in a gene or, more generally, a candidate region, under the expectation that cases with disease differ from control individuals with respect to rare variants. Several methods along these lines have already been proposed.
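The low power of one-at-a-time testing can be illustrated with a small numerical sketch. The counts below are hypothetical and not taken from the study: 1000 cases and 1000 controls (2000 chromosomes per group), and 20 rare variants that each appear 4 times in cases and 2 times in controls. The helper `fisher_one_sided` is an illustrative name, and pooling allele counts across variants is a simplifying assumption that treats all variants as risk variants; it is not the weighted-sum strategy proposed in this paper.

```python
from math import comb


def fisher_one_sided(case_hits, control_hits, n_case, n_control):
    """One-sided Fisher exact p-value for an excess of variant alleles in cases.

    n_case and n_control are the numbers of chromosomes sampled per group.
    """
    total_hits = case_hits + control_hits
    total = n_case + n_control
    upper = min(total_hits, n_case)
    # Hypergeometric tail: probability of at least `case_hits` variant
    # alleles falling in the case group when labels are exchangeable.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_case - k)
        for k in range(case_hits, upper + 1)
    )
    return tail / comb(total, n_case)


# Hypothetical design: 1000 cases and 1000 controls, i.e. 2000 chromosomes each.
N_CASE = N_CONTROL = 2000

# A single rare variant: 4 copies in cases versus 2 in controls.
p_single = fisher_one_sided(4, 2, N_CASE, N_CONTROL)

# Twenty such variants pooled into one regional count: 80 versus 40 copies.
p_collapsed = fisher_one_sided(20 * 4, 20 * 2, N_CASE, N_CONTROL)

print(f"single variant p = {p_single:.3g}")
print(f"pooled region  p = {p_collapsed:.3g}")
```

Under these assumed counts the single-variant p-value is about 0.34, while the pooled count gives a p-value below 0.001, even though the case-to-control ratio is the same in both tables. If some of the pooled variants were protective, their counts would offset the risk variants and the pooled signal would shrink, which is the situation the strategy in this paper is designed to handle.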
Author Summary: Risk to common diseases, such as diabetes, heart disease, etc., is influenced by a complex interaction among genetic and environmental factors. Most of the disease-association studies conducted so far have focused on common variants, widely available on genotyping platforms. However, recent advances in sequencing technologies pave the way for large-scale medical sequencing studies with the goal of elucidating the role rare variants may play in affecting susceptibility to complex traits. The large number of rare variants and their low frequencies pose great challenges for the analysis of these data. We present here a novel testing strategy, based on a weighted-sum statistic, that is less sensitive than existing methods to the presence of both risk and protective variants in the genetic region under investigation. We show applications to simulated data and to a real dataset on Type-1 Diabetes.

One of the first statistical methods proposed for the analysis of rare variants [23] is based on testing whether the proportion of carriers of rare variants is significantly different between cases and controls. A subsequent paper by Madsen and Browning [24] introduced the concept of weighting variants according to their est (...truncated)