Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers

PLoS Genetics, Feb 2009

Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, encourage positive lifestyle changes, and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10^-13, 10^-13, and 10^-3, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
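The two arithmetic steps behind the abstract's argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis: the single binary marker, its control carrier frequency (0.20), and the sensitivity and specificity of the high-risk classification (both 0.70) are hypothetical values chosen for the example; only the three prevalences come from the text. The function names are likewise made up for this sketch.

```python
def case_carrier_frequency(odds_ratio, control_freq):
    """Carrier frequency among cases implied by an odds ratio and the
    carrier frequency among controls (single binary marker)."""
    odds_controls = control_freq / (1.0 - control_freq)
    odds_cases = odds_ratio * odds_controls
    return odds_cases / (1.0 + odds_cases)


def binary_marker_auc(odds_ratio, control_freq):
    """AUC of a single binary marker used as a classifier.

    The ROC curve has one interior point (FPR, sensitivity), so the
    area under it is (sensitivity + specificity) / 2.
    """
    sensitivity = case_carrier_frequency(odds_ratio, control_freq)
    specificity = 1.0 - control_freq
    return (sensitivity + specificity) / 2.0


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of individuals classified as high risk who are cases."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical marker: odds ratio 3.0, carried by 20% of controls.
print("AUC:", round(binary_marker_auc(3.0, 0.20), 3))

# Hypothetical classifier (sensitivity = specificity = 0.70) evaluated
# at the three prevalences quoted in the abstract.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(0.70, 0.70, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.3f}")
```

Under these assumed inputs, an odds ratio of 3 corresponds to an AUC well below 0.7, and the share of true cases in the high-risk group falls steeply as prevalence drops, which is the qualitative pattern the abstract reports.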


Citation: Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE (2009) Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genet 5(2): e1000337. doi:10.1371/journal.pgen.1000337

Authors: Johanna Jakobsdottir, Michael B. Gorin, Yvette P. Conley, Robert E. Ferrell, Daniel E. Weeks

Affiliations: 1 Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America; 2 Department of Ophthalmology and Jules Stein Eye Institute, The David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America; 3 Department of Health Promotion and Development, School of Nursing, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America; 4 Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America

Editor: Goncalo R. Abecasis, University of Michigan, United States of America

Funding: This work was supported by NEI grant R01EY009859; The Steinbach Foundation, New York; Research to Prevent Blindness, New York; The Eye and Ear Foundation of Pittsburgh; the American Health Assistance Foundation, Clarksburg, Maryland; and the Jules Stein Eye Institute, Los Angeles, California (all to MBG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors are listed as the inventors in a patent filed by the University of Pittsburgh for the LOC387715/ARMS2 locus.
Recent successes in the discovery of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. A number of companies now offer, for relatively modest fees, personalized genomics services that provide individualized disease-risk estimates based on genome-wide SNP genotyping. Most companies offering such profiling make it clear that they are not a clinical service and that their calculations are not intended for diagnostic or prognostic purposes. They typically advise their clients to consult their health care provider for more information. In most cases, people would turn to their general physician [1]. However, as noted by others [2,3], few doctors currently have enough genetics training to make sense of the risk calculations now commercially offered. Many physicians seem to feel the same way. In surveys in five European countries, physicians ranked the disciplines in which they felt they needed more training to overcome future challenges [4,5]. In all countries, the top-ranked area was genetics of common disease, and the second-ranked area was approaching genetic risk assessment in clinical practice.

Not only are risk results likely to be poorly understood by the tested individuals and their physicians, but these results are also often based on risk models, such as logistic regression models, that may not be good classification models [6]. Therefore, the disclaimer made by the companies that their services are not intended as medical advice cannot be overemphasized.

Current knowledge of the role of most genes in complex diseases is at the group level: correlations of disease status with SNPs. Most of these SNPs were discovered via genetic association studies aimed at finding variants correlated with disease risk. It is hoped that these discoveries will provide insights into pathogenesis and etiology, and ultimately lead to the development of new treatments or preventive therapies. On the assumption that these SNPs will also be effective classifiers, they are now being used in individual-level risk estimation, classification, and clinical decision-making. However, for many complex diseases, such as the ones discussed here (age-related macular degeneration [AMD], type 2 diabetes, inflammatory bowel disease [Crohn's disease], and cardiovascular disease), it is arguable whether the era of genomics in personalized medicine is here yet.

In this article, we discuss and explore how useful highly associated SNPs might be for individual-level risk estimation and prediction. Our focus will be on the classification accuracy of genetic testing, with an emphasis on two popular statistical methods for evaluating biomarkers. We give realistic real-data examples that illustrate that, currently, the genetic information is of limited value for personalized medicine. We also discuss and apply risk-based and classification-based analysis approaches to our AMD data.

Two Statistical Methods

There are two basic statistical approaches for evaluating markers. The risk-based approach models the risk as a function of marker(s), often with adjustment for covariates, and is commonly appl (...truncated)



Johanna Jakobsdottir, Michael B. Gorin, Yvette P. Conley, Robert E. Ferrell, Daniel E. Weeks. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers, PLoS Genetics, 2009, Volume 5, Issue 2, DOI: 10.1371/journal.pgen.1000337