A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis

BMC Genomics, Jul 2008

Background: The difficulty in elucidating the genetic basis of complex diseases stems from the many factors that can affect the development of a disease. Some of these genetic effects may interact in complex ways, proving undetectable by current single-locus methodology. Results: We have developed an analysis tool called Hypothesis Free Clinical Cloning (HFCC) to search for genome-wide epistasis in a case-control design. HFCC combines a relatively fast computing algorithm for genome-wide epistasis detection with the flexibility to test a variety of different epistatic models in multi-locus combinations. HFCC has good power to detect multi-locus interactions simulated under a variety of genetic models and noise conditions. Most importantly, HFCC can accomplish an exhaustive genome-wide epistasis search with large datasets, as demonstrated with a 400,000 SNP set typed on a cohort of Parkinson's disease patients and controls. Conclusion: With the current availability of genetic studies with large numbers of individuals and genetic markers, HFCC can have a great impact on the identification of epistatic effects that escape the standard single-locus association analyses.

Javier Gayán (0,1), Antonio González-Pérez (1), Fernando Bermudo (1), María Eugenia Sáez (1), Jose Luis Royo (1), Antonio Quintas (1), Jose Jorge Galan (1), Francisco Jesús Morón (1), Reposo Ramirez-Lorca (1), Luis Miguel Real (1), Agustín Ruiz (1)

(0) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
(1) Neocodex, Avda. Charles Darwin 6, Acc. A, 41092 Sevilla, Spain

Background

Most common diseases have an etiology so complex that years of research have yielded scarce results towards the elucidation of their causes. Technology and methodology are improving quickly but results have been arriving slowly. Nonetheless, optimism is in the air, because large studies of many individuals and genetic markers seem to finally be revealing some of the genetic factors behind these common diseases [1].
The difficulty of elucidating the genetic basis of complex diseases stems from the many factors that can affect the development of a disease. Many factors, both genetic and environmental, each with possibly only a small effect, may be necessary for the expression of a particular disease phenotype. For example, most associations reported in the recent wave of genome-wide association studies of different common diseases exhibited small (1.1-1.4) to moderate (1.5-2) odds ratios [2]. These small effects may only be detectable by means of genetic association analysis in very large samples, or in smaller sub-samples in which sample selection enlarges the effect: a sub-sample where the allele frequency of a particular risk gene is increased, or a sub-sample where a combination of other alleles or environmental factors acts to increase the observable effect of a particular gene [3]. Many genes may contribute to the expression of complex diseases, and it is quite reasonable to expect that the effects of some of these genes do not sum up in a simple fashion. Epistasis generally refers to an interaction between the effects of genes at different loci, although the term has been used in different contexts by different disciplines [4]. Some of these genetic effects may interact with one another, such that the presence of two or more particular genes may increase the risk of a disease more than expected from their independent effects, the expectation being derived from a pre-defined model, such as additive or multiplicative. For example, the odds ratio for an epistatic effect of two genes may be larger, even much larger, than the combined effect (sum or product) of each of the two single genes [5,6]. Moreover, there are biological models of epistasis where genes only have epistatic effects [7], such as a two-locus mutation masking a known phenotype. Some of these genetic effects may prove undetectable by current single-locus methodology [8].
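To make the comparison between a joint odds ratio and its model-based expectation concrete, the following minimal Python sketch computes both for one pair of risk genotypes. All counts are hypothetical and chosen only for illustration. The function name `odds_ratio` and the genotype-class labels are assumptions, and the additive expectation is taken here as OR_A + OR_B - 1, which is one common convention (the text itself only says "sum").

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed genotype class relative to a reference class."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)

# Hypothetical (cases, controls) counts per two-locus genotype class.
counts = {
    "neither": (100, 100),   # reference class: no risk genotype at A or B
    "A_only": (120, 100),
    "B_only": (130, 100),
    "A_and_B": (300, 100),
}

ref_cases, ref_controls = counts["neither"]
or_a = odds_ratio(*counts["A_only"], ref_cases, ref_controls)
or_b = odds_ratio(*counts["B_only"], ref_cases, ref_controls)
or_ab = odds_ratio(*counts["A_and_B"], ref_cases, ref_controls)

# Expected joint odds ratio if the two loci acted independently.
expected_multiplicative = or_a * or_b
expected_additive = or_a + or_b - 1  # assumed convention for the "sum" model

# Ratios above 1 indicate a joint effect larger than the model predicts.
ratio_multiplicative = or_ab / expected_multiplicative
ratio_additive = or_ab / expected_additive

print(f"OR_A={or_a:.2f} OR_B={or_b:.2f} OR_AB={or_ab:.2f}")
print(f"observed/expected (multiplicative): {ratio_multiplicative:.2f}")
print(f"observed/expected (additive): {ratio_additive:.2f}")
```

In this made-up example each locus alone has a small odds ratio, yet the joint class shows an odds ratio roughly twice what either model predicts, which is the kind of pattern the paragraph above describes.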
There have been some early attempts to search for epistatic effects [5,9-11], but there is currently a need for methods to study this important genetic phenomenon, which is perhaps key for complex diseases. A wealth of current research in molecular genetics has discovered millions of genetic markers which provide good coverage of common genetic variation across the entire human genome [12]. At the same time, advances in genotyping technology have greatly increased the quantity and quality of genotypes. Current genotyping platforms can generate millions of genotypes in short periods of time. These developments have made possible the genetic association analysis of a trait across the entire genome. Although the arrival of genome-wide association testing is great news for the genetic dissection of complex traits, the large number of statistical tests involved raises the issue of statistical significance. For example, to maintain a Type I error of 5 percent when testing 100,000 markers for genetic association may require a test statistic with a probability value of $5 \times 10^{-7}$, if a Bonferroni correction is applied. Because many of these markers are correlated, this correction would be too strict, but in any case the required p-value would be very small. This problem of multiple testing is even more extreme for the test of epistasis. For example, for 100,000 markers, there are a total of $5 \times 10^{9}$ two-locus combinations, which would require a Bonferroni-corrected p-value of $1 \times 10^{-11}$ for a genome-wide significance level of 0.05, which again would be overly conservative due to the correlated nature of many of these tests. To achieve these significance levels it is necessary to study large samples and to expect large epistatic effects. Replication of findings in independent samples is sought to increase confidence in statistical results.
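The Bonferroni arithmetic quoted above can be checked with a short Python sketch. The helper name `bonferroni_threshold` is an assumption introduced for illustration; the marker count and significance level are the ones given in the text.

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value needed to keep the overall Type I error at alpha."""
    return alpha / n_tests

alpha = 0.05
n_markers = 100_000

# Single-locus scan: one test per marker.
single_locus_threshold = bonferroni_threshold(alpha, n_markers)

# Two-locus scan: one test per unordered pair of markers.
n_pairs = comb(n_markers, 2)
two_locus_threshold = bonferroni_threshold(alpha, n_pairs)

print(f"single-locus tests: {n_markers:,}  threshold: {single_locus_threshold:.1e}")
print(f"two-locus tests:    {n_pairs:,}  threshold: {two_locus_threshold:.1e}")
```

The exact number of pairs is slightly below the rounded figure in the text, so the two-locus threshold comes out marginally above $1 \times 10^{-11}$; the difference is negligible at this precision.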
The lack of replication of association results may be due to different causes, some technical (low power due to small samples, poor quality of phenotypic or genotypic data, uncorrected noise or covariates) and some biological (heterogeneity of effects or population-specific risks). One approach to tackling the multiple testing issue is to divide the available sample into independent groups, carry out the analysis in each group, and look for consistent results across the groups. Some true genetic effects will be missed due to lack of power (because of the reduced sample in each group) and to heterogeneity, but this approach may allow the identification of moderate to large epistatic effects that are frequent and consistent. In this scenario, we have developed an analysis tool to search for genome-wide epistasis in a case-control design. Hypothesis Free Clinical Cloning (HFCC) is a standalone software tool which allows for single-locus genetic association testing, as well as epistasis testing for multilocus combinations of markers. Due to the intense computational burden, it is programmed to take advantage of computer clusters by dividing the tasks into processes which ca (...truncated)
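The split-sample strategy described in this paragraph (divide the sample into independent groups, test each group, keep only consistent results) can be illustrated with a minimal Python sketch. This is not HFCC's actual implementation: the function names, the use of a Pearson chi-square test on a 2x2 carrier-by-status table, the threshold of 0.01, and the simulated counts are all assumptions made for illustration.

```python
import math
import random

def chi2_pvalue_2x2(a, b, c, d):
    """Pearson chi-square p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: treat as no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def split_into_groups(individuals, n_groups, seed=0):
    """Shuffle individuals and deal them into n_groups disjoint groups."""
    shuffled = list(individuals)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def consistent_across_groups(individuals, n_groups, p_threshold, seed=0):
    """True if the association passes p_threshold in every independent group.

    individuals: list of (is_case, carries_combination) boolean pairs.
    """
    for group in split_into_groups(individuals, n_groups, seed):
        a = sum(1 for case, carrier in group if case and carrier)
        b = sum(1 for case, carrier in group if case and not carrier)
        c = sum(1 for case, carrier in group if not case and carrier)
        d = sum(1 for case, carrier in group if not case and not carrier)
        if chi2_pvalue_2x2(a, b, c, d) >= p_threshold:
            return False
    return True

# Hypothetical cohort: carriers of a two-locus genotype combination are
# strongly enriched among cases.
associated = (
    [(True, True)] * 400 + [(True, False)] * 200
    + [(False, True)] * 100 + [(False, False)] * 500
)
# Hypothetical cohort with no association at all.
null = (
    [(True, True)] * 300 + [(True, False)] * 300
    + [(False, True)] * 300 + [(False, False)] * 300
)

print(consistent_across_groups(associated, n_groups=3, p_threshold=0.01))
print(consistent_across_groups(null, n_groups=1, p_threshold=0.01))
```

Requiring the signal in every group trades power for robustness, as the paragraph notes: each group is smaller, so weak effects are missed, but a combination that passes in all groups is less likely to be a chance finding.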


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2164-9-360.pdf
Article home page: http://www.biomedcentral.com/1471-2164/9/360

Javier Gayán, Antonio González-Pérez, Fernando Bermudo, María Sáez, Jose Royo, Antonio Quintas, Jose Galan, Francisco Morón, Reposo Ramirez-Lorca, Luis Real, Agustín Ruiz. A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis. BMC Genomics, 2008, 9:360. DOI: 10.1186/1471-2164-9-360