Using gene expression to investigate the genetic basis of complex disorders
Alexandra C. Nica and Emmanouil T. Dermitzakis
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1HH, UK
The identification of complex disease susceptibility loci through genome-wide association studies (GWAS) has recently become possible and is now a method of choice for investigating the genetic basis of complex traits. The number of results from such studies is constantly increasing, but the challenge lying ahead is to identify the biological context in which these statistically significant candidate variants act. Regulatory variation plays an important role in shaping phenotypic differences among individuals and is thus very likely to influence disease susceptibility as well. As such, integrating gene expression data and other disease-relevant intermediate phenotypes with GWAS results could help prioritize fine-mapping efforts and provide a shortcut to disease biology. Combining these different levels of information in a meaningful way is, however, not trivial. In the present review, we outline the approaches that have been explored to this end so far and what they have achieved. We also discuss the limitations of these methods and how upcoming technological developments could help circumvent them. Overall, such efforts will be very helpful in understanding first the regulatory effects on disease and, ultimately, disease etiology in general.
The ability of genome-wide association studies (GWAS) to
help understand the genetic basis of complex disorders
has recently become apparent. Well-documented common
human genetic variation maps (e.g. HapMap project) (1),
large patient samples with accurately recorded phenotypic
information as well as appropriate statistical methods to
assess significance (2) and account for potential biases,
have all contributed to the current outburst of successful
GWAS. Numerous susceptibility variants for a large number
of complex diseases have been reported and effectively
replicated. A present catalog of published GWAS
(http://www.genome.gov/26525384) includes single nucleotide
polymorphisms (SNPs) associated not only with major common
disorders [Crohn's disease (3), type 2 diabetes (4), lung
cancer (5) etc.] but also with disease-relevant or
anthropometric quantitative traits [e.g. body mass index (6) or
height (7)].
What has not kept pace, however, with the capacity
to design and perform successful GWAS is our ability to
understand how variants discovered via this hypothesis-free
approach influence complex traits. In fact, few of the
association studies go beyond reporting the most statistically
significant hits and if they do, the suggested functionality is
typically speculative, based on available annotation of genes
in the vicinity of the variants. Since many of the discovered
susceptibility polymorphisms fall in non-coding regions and
with an increasing number of regulatory variants already
implicated in a series of common disorders (8), one
conventional approach has been to interrogate disease-associated
SNPs for associations with differential gene expression.
Moffatt et al. (9) found that the same most significant SNPs
associated with childhood asthma risk also explain 29.5%
of the variance in ORMDL3 transcript levels, measured in
lymphoblastoid cell lines. While an interesting observation,
this still cannot be regarded as convincing evidence for a
causal relationship between ORMDL3 and asthma onset.
The concurrent progress towards uncovering the genetic
basis of regulatory variation (10) has revealed an abundance
of expression quantitative trait loci (eQTLs) in the human
genome, making an accidental overlap between these and
disease signals very likely. Thus, while gene expression is a
very informative and immediate DNA phenotype, integrating
expression data with disease genetics studies for an ultimate
understanding of disease etiology is not straightforward.
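To make concrete what it means for a SNP to "explain" a fraction of transcript variance, the following minimal Python sketch regresses simulated expression values on allele dosage (0, 1 or 2 copies of the minor allele) and reports the coefficient of determination. The function names and all parameter values (allele frequency, effect size, noise level, sample size) are hypothetical choices made for illustration; they are not taken from Moffatt et al. (9) or any other cited study.

```python
import random


def variance_explained(genotypes, expression):
    """R^2 of a simple linear regression of expression on allele dosage."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    # With a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


def simulate_cis_eqtl(n, maf, beta, noise_sd, seed=0):
    """Simulate additive allele dosages and expression levels (hypothetical model)."""
    rng = random.Random(seed)
    # Each individual carries two alleles, each minor with probability `maf`.
    genotypes = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    # Expression = additive genetic effect + Gaussian noise.
    expression = [beta * g + rng.gauss(0.0, noise_sd) for g in genotypes]
    return genotypes, expression


genotypes, expression = simulate_cis_eqtl(n=400, maf=0.3, beta=0.5, noise_sd=0.5)
r2 = variance_explained(genotypes, expression)
print(f"fraction of expression variance explained by the SNP: {r2:.3f}")
```

A high R² of this kind shows only that genotype predicts expression in the sampled tissue; it does not by itself establish that the transcript mediates disease risk.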
ADVANCES AND CURRENT ISSUES IN EXPRESSION AND DISEASE STUDIES
Power of current eQTL studies
Natural variation in human gene expression has recently been
quantified on a genome-wide scale using microarray
technologies. Linkage and association studies coupling expression
with genetic variability data have started to reveal the genetics
underlying part of this variation, including complex
allele-specific interactions (11) and its relatively high level
of heritability (12–15). Most of the variants discovered with
these approaches (a field also called Genetical Genomics)
explain variance in transcript levels of nearby genes (so-called
cis eQTLs) but a few distally acting regulators have
also been reported (trans associations). The sample sizes of
genome-wide expression association studies have been fairly
small though, meaning that the discoveries made so far
generally represent large genetic effects [Stranger et al. (12)
report an R² (coefficient of determination) ranging from 0.27
to almost 1 for the SNP-gene associations detected in the
270 HapMap individuals]. The magnitude of the discovered
effects drops when pooling populations together with
appropriate corrections, a direct consequence of the increased
statistical power due to the larger sample sizes. The
importance of appropriate statistical power has been extensively
demonstrated in complex disease GWAS, where samples of
a few thousand paired cases and controls have become a
prerequisite (16). The main reason for this requirement is the fact
that the individual contribution of genetic variants towards
complex trait determination is known to be small. In fact, all
susceptibility alleles discovered so far explain only a small fraction
of disease risk, with odds ratios typically in the range of 1.2–1.5
(16,17). Given the marked difference between the magnitudes of
detected genetic effects on expression variation and disease
predisposition, respectively, it is not surprising that only a few
instances of overlapping signals have been observed, even
when expression in a disease-relevant tissue was considered.
Small genetic effects on expression variation or complex
interactions between regulatory variants with moderate or large
effects could become decisive on a permissive environmental
background. Current expression analyses are underpowered
with respect to these kinds of discoveries; hence whole-genome
expression association studies on larger samples would be very
desirable. Such efforts are under way, including the
quantification of expression levels in blood cells of 820 HapMap III
individuals from eight populations (Barbara Stranger, Stephen
Montgomery and Emmanouil Dermitzakis, personal
communication). Combined with SNP genotyping data, this resource
will give insight into the level of expression differences
among populations and generate many additional eQTLs with
more subtle effects, some of them potentially related to disease.
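The link between sample size and detectable effect size discussed in this section can be illustrated with a rough power calculation. The sketch below uses the standard Fisher z approximation for testing a correlation to estimate how many individuals are needed to detect a SNP-expression association of a given R². The function name, the default significance level and the target power are assumptions chosen for illustration; genome-wide scans apply far stricter thresholds to account for multiple testing, which is why `alpha` is left as a parameter.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def samples_needed(r2, alpha=0.05, power=0.8):
    """Approximate sample size to detect an association explaining `r2` of variance.

    Uses the Fisher z approximation for a two-sided test of a correlation:
    n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3, with r = sqrt(r2).
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if not 0.0 < alpha < 1.0 or not 0.0 < power < 1.0:
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(sqrt(r2))) ** 2 + 3)


# Compare a large effect (the lower bound of R^2 quoted above) with subtler ones.
for r2 in (0.27, 0.05, 0.01):
    for alpha in (0.05, 1e-6):  # nominal vs. a hypothetical stringent threshold
        n = samples_needed(r2, alpha=alpha)
        print(f"R^2 = {r2:<5} alpha = {alpha:<7} -> about {n} individuals")
```

Under these assumptions the required sample size grows steeply as R² shrinks, which is consistent with the observation above that small expression studies mainly recover large genetic effects.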
Tissue-specific phenotypes
Confined by the availability of human tissue samples,
expression experiments have been initially performed in
lympho (...truncated)