Pathway-based analysis using reduced gene subsets in genome-wide association studies
BMC Bioinformatics
Jingyuan Zhao 0
Simone Gupta 2
Mark Seielstad 1
Jianjun Liu 0
Anbupalam Thalamuthu 0
0 Human Genetics , 60 Biopolis Street 02-01 , Genome Institute of Singapore , 138672 , Singapore
1 Institute for Human Genetics , 513 Parnassus Avenue , University of California , San Francisco , CA 94143 and Blood Systems Research Institute , 270 Masonic Avenue, San Francisco, CA 94118 , USA
2 McKusick - Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University , 733 N. Broadway St. Baltimore, MD 21205 , USA
Background: Single Nucleotide Polymorphism (SNP) analysis captures only a small proportion of associated genetic variants in Genome-Wide Association Studies (GWAS), partly due to small marginal effects. Pathway-level analysis incorporating prior biological information offers another way to analyze GWAS of complex diseases, and promises to reveal the mechanisms leading to complex diseases. Biologically defined pathways typically comprise numerous genes. If only a subset of genes in a pathway is associated with disease, then a joint analysis including all individual genes would result in a loss of power. To address this issue, we propose a pathway-based method that allows us to test for joint effects by using a pre-selected gene subset. In the proposed approach, each gene is considered as the basic unit, which reduces the number of genetic variants considered and hence reduces the degrees of freedom in the joint analysis. The proposed approach can also be used to investigate the joint effect of several genes in a candidate gene study.
Results: We applied this new method to a published GWAS of psoriasis and identified 6 biologically plausible pathways, after adjustment for multiple testing. The pathways identified in our analysis overlap with those reported in previous studies. Further, using simulations across a range of gene numbers and effect sizes, we demonstrate that the proposed approach has higher power than several other approaches to detect associated pathways.
Conclusions: The proposed method could increase the power to discover susceptibility pathways and to identify associated genes using GWAS. In our analysis of genome-wide psoriasis data, we have identified a number of relevant pathways for psoriasis.
Background
Genetic association studies aim to detect associations
between disease phenotypes and genetic variants. A
common way to establish association between a
SNP and a disease is to test each individual SNP marker
for association. A multiple
testing correction can then be applied to control the overall
type I error. However, such an approach typically
captures only a small proportion of the contributing genetic
variants. One likely reason is that common and complex
diseases result from the joint effects of multiple loci and
environmental factors, each of which has a small
individual contribution [1,2]. A variety of tests have been
proposed to establish the joint association of multiple SNPs
with the phenotype.
For such joint association analyses, the first category of
statistical methods uses single-SNP p-values
or test statistics to construct a new joint test statistic. The
Most Significant SNP (MSS) method uses the smallest
p-value to declare significance. Fisher's method combines
p-values using minus twice the logarithm of their
product. Another group of test statistics pools
SNPs with relatively strong signals from univariate tests;
these include the sum of the K largest test statistics [3],
the product of all the tests declared significant at
some level α [4], the product of the K most significant
p-values (Ranked Truncated Product; RTP) [5], and the
weighted and truncated sum of logarithms of p-values [6].
Another category consists of strategies that modify the
standard multivariate test statistics in order to reduce the
effective degrees of freedom and hence improve the
power [7-9]. Evaluation of some of the methods is
presented in [10]. Some other methods use genetic similarity
between individuals to establish multi-marker association
in a genomic region with the phenotypes [2,11,12]. A
comparison of the methods using the similarity measures
and some additional references on multi-marker
association tests can be found in [12].
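As an illustration of the p-value combination statistics described above, the following minimal Python sketch computes Fisher's statistic and the product of the K smallest p-values. The function names are hypothetical and are not taken from any of the cited methods. The closed-form p-value for Fisher's statistic assumes independent tests, which generally does not hold for SNPs in linkage disequilibrium; in that setting significance would typically be assessed by permutation instead.

```python
import math


def fisher_statistic(pvalues):
    """Minus twice the logarithm of the product of the p-values."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def fisher_pvalue(pvalues):
    """Combined p-value, ASSUMING independent tests.

    Under independence the statistic is chi-square with 2k degrees of
    freedom; for even degrees of freedom the survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = fisher_statistic(pvalues) / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def rtp_statistic(pvalues, k):
    """Product of the k smallest (most significant) p-values."""
    return math.prod(sorted(pvalues)[:k])
```

With a single p-value, `fisher_pvalue` returns that p-value unchanged, which gives a quick sanity check of the closed form. The truncated product has no equally simple null distribution, so the sketch returns only the statistic.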
The above multi-marker association tests are useful to
establish the joint association of a set of SNPs with the
trait. However, brute-force searches to identify the subsets of
associated SNPs contained on high-density SNP arrays
are inefficient. Gene annotation databases group
functionally relevant genes into biological pathways. Some of
these pathways are likely to be involved in the etiology of
complex diseases [13] and hence testing a few hundred
such pathways to identify the subsets of genes involved in
the diseases avoids a huge multiple testing burden. Thus
pathway-based analyses can offer an attractive alternative
to improve the power of GWAS, and may help us to
identify relevant subsets of genes in meaningful biological
pathways underlying complex diseases.
In the era of post-GWAS analysis there is considerable
interest in pathway analysis, and several approaches for
testing associations with pathways have been proposed.
The Kolmogorov-Smirnov procedure is used to detect
pathways containing a relatively high proportion of
significant SNPs [14] or significant genes [15]. The SNP
Ratio Test (SRT) assesses association by comparing the
proportion of significant SNPs within a pathway with
those ratios in permuted data sets [16]. The Prioritizing
Risk Pathways (PRP) method defines a risk score to
identify risk pathways by integrating genetic factors and
biological networks [17]. Combinations of univariate test
statistics or univariate p-values have also been considered
for pathway association analysis. Using the
Adaptive Ranked Truncated Product (ARTP) method
[18] for gene-level p-values, a pathway association
method has been proposed [19]. The sum of the
Armitage trend test statistics of all of the SNPs within the
pathway is used in [20]. Three approaches for combining
univariate test statistics that account for the correlation
among the SNPs within the pathway have recently been
proposed [21]. Important considerations in
pathway analysis and a review of statistical
methods for testing pathway associations are given in
[22]; some interesting insights into pathway analysis
using existing databases are summarized in [23].
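The permutation logic behind a ratio-based test such as the SRT described above can be sketched as follows. This is only an illustrative outline under stated assumptions, not the implementation of [16]: the function names, the significance threshold `alpha`, and the add-one correction in the empirical p-value are assumptions made for the example.

```python
def snp_ratio(pvalues, alpha=0.05):
    """Proportion of SNPs in a pathway with p-value below alpha (assumed threshold)."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def empirical_pvalue(observed, permuted):
    """Share of permuted ratios at least as large as the observed ratio.

    The +1 in numerator and denominator is a common convention that
    avoids an empirical p-value of exactly zero; it is an assumption here.
    """
    exceed = sum(r >= observed for r in permuted)
    return (exceed + 1) / (len(permuted) + 1)
```

In use, `snp_ratio` would be evaluated once on the observed single-SNP p-values of a pathway and once per phenotype-permuted data set, and the resulting ratios passed to `empirical_pvalue`.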
Most of the pathway association methods consider all
the genetic variants (e.g. SNPs) within a pathway as
possible risk factors [17,20,21]. If only a subset of the genetic
variants within a pathway contributes to the
disease, these methods may lose power. To
address this problem we propose a pathway-based
analysis using a model selection criterion to id (...truncated)