Pathway-based analysis using reduced gene subsets in genome-wide association studies

BMC Bioinformatics, Jan 2011

Background: Single Nucleotide Polymorphism (SNP) analysis captures only a small proportion of associated genetic variants in Genome-Wide Association Studies (GWAS), partly because individual variants have small marginal effects. Pathway-level analysis that incorporates prior biological information offers another way to analyze GWASs of complex diseases and promises to reveal the mechanisms leading to these diseases. Biologically defined pathways typically comprise numerous genes. If only a subset of the genes in a pathway is associated with disease, a joint analysis that includes all of the individual genes loses power. To address this issue, we propose a pathway-based method that tests for joint effects using a pre-selected gene subset. In the proposed approach, each gene is treated as the basic unit, which reduces the number of genetic variants considered and hence the degrees of freedom of the joint analysis. The approach can also be used to investigate the joint effect of several genes in a candidate gene study.

Results: We applied the new method to a published GWAS of psoriasis and, after adjustment for multiple testing, identified 6 biologically plausible pathways. The pathways identified in our analysis overlap with those reported in previous studies. Further, simulations across a range of gene numbers and effect sizes show that the proposed approach has higher power to detect associated pathways than several other approaches.

Conclusions: The proposed method could increase the power of GWAS to discover susceptibility pathways and to identify associated genes. In our analysis of genome-wide psoriasis data, we identified a number of pathways relevant to psoriasis.


Jingyuan Zhao (1), Simone Gupta (3), Mark Seielstad (2), Jianjun Liu (1), Anbupalam Thalamuthu (1)

(1) Human Genetics, Genome Institute of Singapore, 60 Biopolis Street, 02-01, Singapore 138672
(2) Institute for Human Genetics, University of California, San Francisco, 513 Parnassus Avenue, San Francisco, CA 94143, USA; and Blood Systems Research Institute, 270 Masonic Avenue, San Francisco, CA 94118, USA
(3) McKusick-Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University, 733 N. Broadway St., Baltimore, MD 21205, USA

Background

Genetic association studies aim to detect associations between disease phenotypes and genetic variants. A common approach is to test each SNP marker individually for association with the disease and then apply a multiple testing correction to control the overall type I error. However, such an approach typically captures only a small proportion of the contributing genetic variants. One likely reason is that common and complex diseases result from the joint effects of multiple loci and environmental factors, each of which makes a small individual contribution [1,2]. A variety of tests have been proposed to establish the joint association of multiple SNPs with the phenotype. The first category of such joint methods uses single-SNP p-values or test statistics to construct a new joint test statistic. The Most Significant SNP (MSS) method uses the smallest p-value to declare significance. Fisher's method combines K p-values through the statistic -2 log(p_1 p_2 ... p_K), that is, minus twice the logarithm of their product. Another group of statistics pools SNPs with relatively strong signals in univariate tests; examples include the sum of the K largest test statistics [3], the product of the p-values of all tests that are significant at some level α [4], the product of the K smallest p-values (Ranked Truncated Product; RTP) [5], and the weighted and truncated sum of the logarithms of p-values [6].
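To make these combination statistics concrete, here is a minimal Python sketch of the MSS, Fisher, truncated product and RTP statistics computed from a list of single-SNP p-values. The function names are hypothetical, and the closed-form Fisher p-value assumes independent tests; for SNPs in linkage disequilibrium the null distributions of all of these statistics would normally be obtained by permutation instead.

```python
import math
from typing import Sequence

# Each function takes single-SNP p-values, assumed to lie in (0, 1].

def mss(pvalues: Sequence[float]) -> float:
    """Most Significant SNP (MSS) statistic: the smallest p-value."""
    return min(pvalues)

def fisher_statistic(pvalues: Sequence[float]) -> float:
    """Fisher's statistic: -2 * log(p_1 * ... * p_K) = -2 * sum(log p_i)."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def fisher_pvalue(pvalues: Sequence[float]) -> float:
    """P-value of Fisher's statistic, ASSUMING independent tests, in which
    case the statistic is chi-square with 2K degrees of freedom."""
    half = fisher_statistic(pvalues) / 2.0
    # Chi-square survival function for even df 2K, in closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{K-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half / i
        total += term
    return math.exp(-half) * total

def truncated_product(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Product of the p-values that are <= alpha (1.0 if there are none)."""
    return math.prod(p for p in pvalues if p <= alpha)

def rank_truncated_product(pvalues: Sequence[float], k: int) -> float:
    """Ranked Truncated Product (RTP): product of the K smallest p-values."""
    return math.prod(sorted(pvalues)[:k])

p = [0.001, 0.04, 0.3, 0.7, 0.9]
print(mss(p), fisher_pvalue(p), truncated_product(p), rank_truncated_product(p, 2))
```

For the truncated and ranked truncated products, smaller values indicate stronger joint evidence; with a single p-value, fisher_pvalue simply returns that p-value.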
Another category consists of strategies that modify standard multivariate test statistics to reduce the effective degrees of freedom and hence improve power [7-9]; an evaluation of some of these methods is presented in [10]. Other methods use the genetic similarity between individuals to test for multi-marker association between a genomic region and the phenotype [2,11,12]. A comparison of the similarity-based methods, with additional references on multi-marker association tests, can be found in [12]. These multi-marker tests are useful for establishing the joint association of a set of SNPs with the trait, but brute-force searches for associated subsets among the SNPs on high-density arrays are inefficient, since the number of candidate subsets grows exponentially with the number of SNPs. Gene annotation databases group functionally related genes into biological pathways. Some of these pathways are likely to be involved in the etiology of complex diseases [13], and testing a few hundred such pathways to identify the subsets of genes involved in disease avoids a huge multiple testing burden. Pathway-based analyses therefore offer an attractive way to improve the power of GWAS and may help to identify relevant subsets of genes in meaningful biological pathways underlying complex diseases. In the post-GWAS era there is considerable interest in pathway analysis, and several approaches for testing pathway associations have been proposed. A Kolmogorov-Smirnov procedure has been used to detect pathways containing a relatively high proportion of significant SNPs [14] or significant genes [15]. The SNP Ratio Test (SRT) assesses association by comparing the proportion of significant SNPs within a pathway with the corresponding proportions in permuted data sets [16]. The Prioritizing Risk Pathways (PRP) method defines a risk score that integrates genetic factors and biological networks to identify risk pathways [17].
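The two enrichment ideas in the preceding paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementations of [14-16]: the 0.05 significance threshold, the (1 + exceedances)/(1 + B) permutation p-value, and the unweighted running-sum form of the Kolmogorov-Smirnov statistic are choices made here, and all function names are hypothetical. The permuted p-value lists are assumed to come from re-analysing phenotype-permuted data sets.

```python
from typing import Dict, List, Sequence, Set, Tuple

def snp_ratio(pvalues: Sequence[float], threshold: float = 0.05) -> float:
    """Proportion of the pathway's SNPs with p-value below the threshold."""
    return sum(p < threshold for p in pvalues) / len(pvalues)

def snp_ratio_test(observed: Sequence[float],
                   permuted: List[Sequence[float]],
                   threshold: float = 0.05) -> Tuple[float, float]:
    """SRT-style test: compare the observed ratio with the ratios obtained
    from B permuted data sets; returns (observed ratio, empirical p-value)."""
    obs = snp_ratio(observed, threshold)
    exceed = sum(snp_ratio(perm, threshold) >= obs for perm in permuted)
    return obs, (1 + exceed) / (1 + len(permuted))

def ks_enrichment_score(scores: Dict[str, float], pathway: Set[str]) -> float:
    """Unweighted Kolmogorov-Smirnov running sum over genes ranked by their
    association score (larger = stronger); returns the maximum deviation."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(g in pathway for g in ranked)
    misses = len(ranked) - hits
    if hits == 0 or misses == 0:
        raise ValueError("pathway must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for g in ranked:
        # Step up for pathway genes, down for the others.
        running += 1.0 / hits if g in pathway else -1.0 / misses
        if abs(running) > abs(best):
            best = running
    return best
```

A large positive enrichment score means the pathway's genes cluster near the top of the ranking; its significance would again be judged against scores from permuted data.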
Combining univariate test statistics or univariate p-values has also been considered for pathway association analysis. A pathway association method has been proposed [19] that applies the Adaptive Ranked Truncated Product (ARTP) method [18] to gene-level p-values. The sum of the Armitage trend test statistics of all SNPs within the pathway is used in [20]. Three approaches that combine univariate test statistics while accounting for the correlation among the SNPs within a pathway have recently been proposed [21]. Important factors to consider in pathway analysis, together with a review of statistical methods for testing pathway associations, are given in [22]; some interesting insights into pathway analysis using existing databases are summarized in [23]. Most pathway association methods consider all the genetic variants (e.g. SNPs) within a pathway as possible risk factors [17,20,21]. If only a subset of the genetic variants within a pathway contributes to the disease, these methods may lose power. To address this problem we propose a pathway-based analysis using a model selection criterion to id (...truncated)
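As an illustration of the sum-of-trend-statistics approach attributed to [20] above, the following Python sketch computes the Armitage trend statistic for each SNP as N r^2 (N individuals, r the Pearson correlation between allele count and case status, which equals the usual trend chi-square), sums it over the SNPs of a pathway, and assesses the sum by permuting case-control labels, which preserves the linkage disequilibrium among SNPs. The function names, the individual-level data layout and the permutation p-value convention are assumptions, not taken from [20].

```python
import random
from typing import List, Sequence, Tuple

def trend_chi2(genotypes: Sequence[int], status: Sequence[int]) -> float:
    """Armitage trend statistic N * r^2 for one SNP (genotypes coded 0/1/2,
    status coded 0 = control, 1 = case)."""
    n = len(status)
    mg = sum(genotypes) / n
    ms = sum(status) / n
    sxy = sum((g - mg) * (y - ms) for g, y in zip(genotypes, status))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((y - ms) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or only one phenotype class
    return n * sxy * sxy / (sxx * syy)

def pathway_sum_trend(snps: List[Sequence[int]], status: Sequence[int]) -> float:
    """Pathway statistic: sum of the trend statistics of all its SNPs."""
    return sum(trend_chi2(g, status) for g in snps)

def permutation_pvalue(snps: List[Sequence[int]], status: Sequence[int],
                       n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare the observed sum with sums under shuffled case-control labels."""
    rng = random.Random(seed)
    observed = pathway_sum_trend(snps, status)
    perm = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pathway_sum_trend(snps, perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

status = [0, 0, 0, 1, 1, 1]
snps = [[0, 1, 0, 2, 1, 2], [1, 0, 1, 1, 2, 2]]
print(permutation_pvalue(snps, status, n_perm=500))
```

Because every SNP in the pathway contributes to the sum, SNPs with no effect add noise to the statistic, which is the loss of power that the paragraph above describes.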


PDF: http://www.biomedcentral.com/content/pdf/1471-2105-12-17.pdf
Article home page: http://www.biomedcentral.com/1471-2105/12/17

Jingyuan Zhao, Simone Gupta, Mark Seielstad, Jianjun Liu, Anbupalam Thalamuthu. Pathway-based analysis using reduced gene subsets in genome-wide association studies. BMC Bioinformatics, 2011, 12:17. DOI: 10.1186/1471-2105-12-17