A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis
Javier Gayán (0, 1), Antonio González-Pérez (1), Fernando Bermudo (1), María Eugenia Sáez (1), Jose Luis Royo (1), Antonio Quintas (1), Jose Jorge Galan (1), Francisco Jesús Morón (1), Reposo Ramirez-Lorca (1), Luis Miguel Real (1), Agustín Ruiz (1)
0 Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
1 Neocodex, Avda. Charles Darwin 6, Acc. A, 41092 Sevilla, Spain
Background: The difficulty in elucidating the genetic basis of complex diseases is rooted in the many factors that can affect the development of a disease. Some of these genetic effects may interact in complex ways, proving undetectable by current single-locus methodology.
Results: We have developed an analysis tool called Hypothesis Free Clinical Cloning (HFCC) to search for genome-wide epistasis in a case-control design. HFCC combines a relatively fast computing algorithm for genome-wide epistasis detection with the flexibility to test a variety of different epistatic models in multi-locus combinations. HFCC has good power to detect multi-locus interactions simulated under a variety of genetic models and noise conditions. Most importantly, HFCC can accomplish an exhaustive genome-wide epistasis search with large datasets, as demonstrated with a 400,000 SNP set typed on a cohort of Parkinson's disease patients and controls.
Conclusion: With the current availability of genetic studies with large numbers of individuals and genetic markers, HFCC can have a great impact on the identification of epistatic effects that escape the standard single-locus association analyses.
Background
Most common diseases have an etiology so complex that
years of research have yielded scarce results towards the
elucidation of their causes. Technology and methodology
are improving quickly but results have been arriving
slowly. Nonetheless, optimism is in the air, because large
studies of many individuals and genetic markers seem to
finally be revealing some of the genetic factors behind
these common diseases [1].
The difficulty of elucidating the genetic basis of complex
diseases is rooted in the many factors that can affect the
development of a disease. Many factors, both genetic and
environmental, each with possibly only a small effect, may be
necessary for the expression of a particular disease
phenotype. For example, most associations reported in the
recent wave of genome-wide association studies of
different common diseases exhibited small (1.1-1.4) to
moderate (1.5-2) odds ratios [2].
These small effects may only be detectable by means of
genetic association analysis in very large samples, or in
smaller sub-samples in which, by sample selection, this
effect is enlarged: a sub-sample where the allele frequency
of a particular risk gene is increased; or a sub-sample
where a combination of other alleles or environmental
factors act to increase the observable effect of a particular
gene [3].
Many genes may contribute to the expression of complex
diseases. It is quite reasonable to expect that the effects of
some of these genes do not sum up in a simple fashion.
Epistasis generally refers to an interaction between the
effects of genes at different loci, although the term has
been used in different contexts by different disciplines [4].
Some of these genetic effects may interact with one another,
such that the presence of two or more particular genes
may increase the risk of a disease more than expected from
their independent effects, the expectation being derived
from a pre-defined model, such as additive or
multiplicative. For example, the odds ratio for an epistatic effect of
two genes may be larger, even much larger, than the
combined effect (sum or product) of each of the two single
genes [5,6]. Moreover, there are biological models of
epistasis where genes only have epistatic effects [7], such
as a two-locus mutation masking a known phenotype.
Some of these genetic effects may prove undetectable by
current single-locus methodology [8]. There have been
some early attempts to search for epistatic effects [5,9-11],
but there is currently a need for methods to study this
important genetic phenomenon, perhaps key for complex
diseases.
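The comparison between an observed joint effect and its expectation under a pre-defined model can be illustrated with a minimal sketch. The function names are hypothetical, and the additive expectation is taken here as OR_A + OR_B - 1, a common convention on the odds ratio scale that is assumed for illustration rather than taken from the text. The numeric values are invented.

```python
def expected_joint_or(or_a, or_b, model="multiplicative"):
    """Expected two-locus odds ratio when the loci do not interact.

    Assumed conventions (not from the paper):
      multiplicative: OR_A * OR_B
      additive:       OR_A + OR_B - 1
    """
    if model == "multiplicative":
        return or_a * or_b
    if model == "additive":
        return or_a + or_b - 1.0
    raise ValueError(f"unknown model: {model!r}")


def interaction_ratio(or_joint, or_a, or_b, model="multiplicative"):
    """Ratio of observed to expected joint odds ratio.

    A value above 1 means the joint effect exceeds the expectation
    under the chosen no-interaction model.
    """
    return or_joint / expected_joint_or(or_a, or_b, model)


# Illustrative values only: two loci with small marginal effects
# and a much larger joint effect.
or_a, or_b, or_joint = 1.2, 1.3, 3.12
for model in ("multiplicative", "additive"):
    ratio = interaction_ratio(or_joint, or_a, or_b, model)
    print(f"{model}: observed/expected = {ratio:.2f}")
```

Whether a given pair counts as epistatic therefore depends on the reference model chosen, which is why the expectation has to be fixed in advance.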
A wealth of current research in molecular genetics has
discovered millions of genetic markers which provide a good
coverage of common genetic variation across the entire
human genome [12]. At the same time, advances in
genotyping technology have greatly increased the quantity and
quality of genotypes. Current genotyping platforms can
generate millions of genotypes in short periods of time.
These events have made possible the genetic association
analysis of a trait across the entire genome.
Although the arrival of genome-wide association testing is
great news for the genetic dissection of complex traits, the
large number of statistical tests involved raises the issue of
statistical significance. For example, maintaining a Type I
error of 5 percent when testing 100,000 markers for
genetic association requires a test statistic with a
p-value of 5 × 10^-7 if a Bonferroni correction is
applied. Because many of these markers are correlated,
this correction would be too strict, but in any case
the required p-value would be very small.
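The arithmetic behind this threshold is simply the desired family-wise error rate divided by the number of tests. A minimal sketch follows; the function name is an illustrative choice.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise
    Type I error at or below alpha across n_tests tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 100,000 single-locus tests at a genome-wide level of 0.05
threshold = bonferroni_threshold(0.05, 100_000)
print(f"per-test threshold: {threshold:.1e}")
```

The correction treats all tests as independent, so with correlated markers the effective number of tests is smaller and the true threshold is less stringent than this value.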
This problem of multiple testing is even more extreme for
the test of epistasis. For example, for 100,000 markers,
there are a total of 5 × 10^9 two-locus combinations,
which would require a Bonferroni-corrected p-value of 1
× 10^-11 for a genome-wide significance level of 0.05, which
again would be overly conservative due to the correlated
nature of many of these tests. To achieve these significance
levels it is necessary to study large samples and expect to
find large epistatic effects.
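The growth of the search space with the number of loci per combination can be checked directly. The sketch below counts unordered marker combinations with Python's `math.comb` and derives the corresponding Bonferroni threshold; the function names are illustrative.

```python
from math import comb


def n_combinations(n_markers, order):
    """Number of unordered combinations of `order` distinct markers."""
    return comb(n_markers, order)


def epistasis_threshold(alpha, n_markers, order):
    """Bonferroni-corrected per-test p-value for an exhaustive
    search over all combinations of the given order."""
    return alpha / n_combinations(n_markers, order)


n_markers = 100_000
for order in (1, 2, 3):
    tests = n_combinations(n_markers, order)
    threshold = epistasis_threshold(0.05, n_markers, order)
    print(f"{order}-locus: {tests:.3e} tests, threshold {threshold:.1e}")
```

For two loci the exact count is 4,999,950,000, which rounds to the 5 × 10^9 quoted above. Each additional locus multiplies the number of tests by several orders of magnitude.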
Replication of findings in independent samples is sought
to increase confidence in statistical results. The lack of
replication of association results may be due to different
causes, some technical (low power due to small samples,
bad quality of phenotypic or genotypic data, uncorrected
noise or covariates) and some biological (heterogeneity of
effects or population-specific risks). An approach to tackle
the multiple testing issue is to divide the available sample
into independent groups and to carry out the analysis in
these independent groups to look for consistent results
across the groups. Some true genetic effects will be missed
due to lack of power (due to the reduced sample in each
group) and to heterogeneity, but this approach may allow
the identification of moderate/large-sized epistatic effects
that are frequent and consistent.
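The split-sample idea can be sketched as follows. This is an illustration under stated assumptions and not a description of how HFCC is implemented: the function names, the marker identifiers and the representation of a result as a (p-value, effect direction) pair are all hypothetical.

```python
import random


def split_into_groups(sample_ids, n_groups, seed=0):
    """Randomly partition sample identifiers into n_groups disjoint groups."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Striding over the shuffled list gives groups of near-equal size.
    return [ids[i::n_groups] for i in range(n_groups)]


def consistent_hits(results_by_group, p_threshold):
    """Return combinations that pass the threshold in every group
    with the same direction of effect.

    results_by_group: list of dicts, one per group, mapping a marker
    combination to a (p_value, effect_sign) tuple.
    """
    if not results_by_group:
        return set()
    shared = set(results_by_group[0])
    for results in results_by_group[1:]:
        shared &= set(results)
    hits = set()
    for combo in shared:
        p_values = [results[combo][0] for results in results_by_group]
        signs = {results[combo][1] for results in results_by_group}
        if all(p <= p_threshold for p in p_values) and len(signs) == 1:
            hits.add(combo)
    return hits


# Hypothetical per-group results for two marker pairs.
group_results = [
    {("rs1", "rs2"): (1e-5, 1), ("rs3", "rs4"): (1e-6, 1)},
    {("rs1", "rs2"): (1e-4, 1), ("rs3", "rs4"): (1e-6, -1)},
]
print(consistent_hits(group_results, p_threshold=1e-3))
```

In a case-control design, cases and controls would normally be partitioned separately so that each group keeps a similar case-to-control ratio. The second pair in the example is discarded because the direction of its effect differs between groups, even though both p-values are small.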
In this scenario, we have developed an analysis tool to
search for genome-wide epistasis in a case-control design.
Hypothesis Free Clinical Cloning (HFCC) is a
standalone software tool that allows for single-locus genetic
association testing, as well as epistasis testing for
multilocus combinations of markers. Due to the intense
computational burden, it is programmed to take advantage of
computer clusters by dividing the tasks into processes
which ca (...truncated)