SNPsyn: detection and exploration of SNP–SNP interactions

Nucleic Acids Research, Jul 2011

SNPsyn (http://snpsyn.biolab.si) is an interactive software tool for the discovery of synergistic pairs of single nucleotide polymorphisms (SNPs) from large genome-wide case-control association studies (GWAS) data on complex diseases. Synergy among SNPs is estimated using an information-theoretic approach called interaction analysis. SNPsyn is both a stand-alone C++/Flash application and a web server. The computationally intensive part is implemented in C++ and can run in parallel on a dedicated cluster or grid. The graphical user interface is written in Adobe Flash Builder 4 and can run in most web browsers or as a stand-alone application. The SNPsyn web server hosts the Flash application, receives GWAS data submissions, invokes the interaction analysis and serves result files. The user can explore details on identified synergistic pairs of SNPs, perform gene set enrichment analysis and interact with the constructed SNP synergy network.

https://nar.oxfordjournals.org/content/39/suppl_2/W444.full.pdf

Nucleic Acids Research, 2011, Vol. 39, Web Server issue, W444–W449
doi:10.1093/nar/gkr321
Published online 16 May 2011

Tomaz Curk1,*, Gregor Rot1 and Blaz Zupan1,2,*

1Faculty of Computer and Information Science, University of Ljubljana, Trzaska cesta 25, SI-1000 Ljubljana, Slovenia and 2Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

Received March 5, 2011; Revised April 15, 2011; Accepted April 20, 2011

*To whom correspondence should be addressed. Tel: +386 1 4768 267; Fax: +386 1 4264 647. Correspondence may also be addressed to Blaz Zupan. Tel: +386 1 4768 402; Fax: +386 1 4264 647.

© The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Current genome-wide case-control association studies (GWAS) focus on identifying a set of single nucleotide polymorphisms (SNPs) that are most associated with the disease under study. While individual SNPs are important indicators of the main genetic components of complex diseases, they explain only a fraction of the genetic risk (1). Because the information content of individual SNPs is low or at best modest, it has been suggested (2) that uncovering synergy among genes may improve the predictive accuracy of models. A recent report by Gerke et al. (3) also suggests that synergistic combinations may carry information about the phenotype that cannot be discovered from observations of individual SNPs alone.
Unequivocal proof that SNP synergy exists would shift modeling efforts away from adding up the effects of the most informative individual SNPs and towards models that include non-additive SNP interactions, thereby providing important insight into complex diseases and their underlying molecular mechanisms. Various approaches to detecting synergy have been proposed; the phenomenon is commonly referred to as positive interaction (4), k-way interaction information (5), epistasis (6,7) or SNP synergy (8). In this article, we use the term ‘synergy’ and present a software tool that implements an information-theoretic approach to synergistic interaction analysis (4,5,8). Unlike other approaches, interaction analysis does not require the user to specify which gene interaction models to test; instead, it discovers them from the data. It assumes an additive model, in which the expected amount of information on the phenotype for a combination of SNPs equals the sum of the information of the individual SNPs.
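The additive baseline described above can be written compactly. The following is only a sketch: I(·; P) denotes mutual information with the phenotype P, M_1 and M_2 are two SNPs, and I(M_1, M_2; P) stands for the information carried by the pair considered jointly. The name I_expected and the exact form of the score are assumptions reconstructed from the prose, not formulas quoted from the article.

```latex
\begin{align}
I_{\text{expected}}(M_1, M_2; P) &= I(M_1; P) + I(M_2; P) \\
\mathrm{Syn}(M_1, M_2; P) &= I(M_1, M_2; P) - \bigl[\, I(M_1; P) + I(M_2; P) \,\bigr]
\end{align}
```

Under this reading, a positive score means the pair tells us more about the phenotype than the additive model expects, and a negative score means the two SNPs overlap in what they tell us.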
Synergy is said to occur when a combination of SNPs carries more information than the sum of the information provided by the individual SNPs (4,8). This difference between the ‘whole’ and the ‘sum of parts’ cannot be obtained from observations of individual SNPs alone, but only by observing a combination of SNPs simultaneously. Different SNP pair models are associated with different degrees of synergy (9). An extreme case is an outcome that is an XOR function of two SNPs: neither SNP alone carries any information on the phenotype, yet considering the two SNPs together yields a perfect association with the disease. This extreme case illustrates that, by definition, it is not possible to predict which SNPs will form a synergistic combination by observing individual SNPs alone. Two SNPs must first be combined into a new feature, and only then can the total information content of that particular combination be computed. Consequently, to discover a set of best-interacting SNPs we need to test all possible combinations exhaustively. The number of SNP combinations grows exponentially (...truncated)

Mutual information I(M; P), also called information gain, is based on calculations of entropy and corresponds to the level of association (i.e. shared information) between marker M and phenotype P. Given the value of marker M, mutual information estimates how well we can predict the value of phenotype P. The new feature f(M1, M2) may be derived as the Cartesian product of the values of SNPs M1 and M2 or by other feature-construction methods, e.g. Kramer's method (11) or constructive induction by feature decomposition (12). For simplicity and speed, SNPsyn uses the Cartesian product. Pairs of SNPs with positive synergy (Syn > 0) are called synergistic. Negative synergy (Syn < 0) indicates that the two SNPs carry redundant information, an effect typically observed among highly correlated SNPs.
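The XOR case and the role of the Cartesian-product feature can be checked on toy data. The sketch below is illustrative Python, not SNPsyn's C++ library: the function names (`entropy`, `mutual_information`, `synergy`) are hypothetical, the markers are binary rather than three-valued genotypes, and the probabilities are simple plug-in frequency estimates. Synergy is taken here to be the information of the combined feature minus the sum of the individual informations, which is an assumption based on the prose definition.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(marker, phenotype):
    """Plug-in estimate of I(M; P) = H(M) + H(P) - H(M, P)."""
    joint = list(zip(marker, phenotype))
    return entropy(marker) + entropy(phenotype) - entropy(joint)


def synergy(m1, m2, phenotype):
    """Information of the combined feature minus the additive expectation (assumed form)."""
    combined = list(zip(m1, m2))  # Cartesian-product feature f(M1, M2)
    return (mutual_information(combined, phenotype)
            - mutual_information(m1, phenotype)
            - mutual_information(m2, phenotype))


# Toy XOR data: every combination of the two markers occurs equally often.
m1 = [0, 0, 1, 1] * 25
m2 = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(m1, m2)]

xor_synergy = synergy(m1, m2, phenotype)   # positive: the pair is synergistic
redundant_synergy = synergy(m1, m1, m1)    # negative: two identical markers are redundant

print(mutual_information(m1, phenotype), xor_synergy, redundant_synergy)
```

On this data each marker alone has zero mutual information with the phenotype, while the combined feature determines it completely, so the synergy is one bit. Pairing a marker with an exact copy of itself gives a synergy of minus one bit, matching the description of redundancy among highly correlated SNPs.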
For further details on interaction analysis, see Jakulin and Bratko (4) and the review by Anastassiou (8).

METHODS AND IMPLEMENTATION

SNPsyn aims to minimize computation time while providing an interaction-rich graphical user interface. The computationally intensive data analysis is implemented in a C++ library, which provides functions for calculating the mutual information (information gain) of individual SNPs and of SNP pairs, and the synergy of SNP pairs. The library also includes functions for random data sampling and shuffling, estimation of probability distributions, calculation of the false discovery rate [FDR, (10)] and subdivision of the analysis into independent subtasks that can run in parallel. Example scripts for running the analysis in parallel on a cluster or grid are included in the distribution package. SNPsyn’s C++ library can be used to build custom applications for interaction analysis. A command-line interface to the library is provided; SNPsyn’s web server itself uses it to perform interaction analysis. Results of interaction analysis are presented to the user through an interactive web application with a graphical user interface (GUI). The (...truncated)
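The idea of subdividing an exhaustive pairwise scan into independent subtasks can be sketched as follows. This is an illustrative Python example, not the scheme used by SNPsyn's C++ library or its cluster scripts: the function name `pair_chunks` and the equal-sized chunking strategy are assumptions. Each chunk lists SNP index pairs (i, j) with i < j, and the chunks together cover every pair exactly once, so each could be handed to a separate worker.

```python
from itertools import combinations, islice
from math import comb


def pair_chunks(n_snps, n_tasks):
    """Yield n_tasks lists of index pairs (i, j), i < j, covering each pair once."""
    total = comb(n_snps, 2)              # number of unordered SNP pairs
    size = -(-total // n_tasks)          # ceiling division: pairs per subtask
    pairs = combinations(range(n_snps), 2)
    for _ in range(n_tasks):
        # The last chunks may be shorter (or empty) when the pairs run out.
        yield list(islice(pairs, size))


# Hypothetical toy setting: 6 SNPs split across 4 workers.
chunks = list(pair_chunks(6, 4))
for task_id, chunk in enumerate(chunks):
    print(task_id, chunk)
```

Materializing the pairs as lists is only practical for small inputs. For genome-wide data a subtask would more plausibly be described by a range of indices, with the pairs generated inside each worker.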


Article home page: http://nar.oxfordjournals.org/content/39/suppl_2/W444.abstract

Tomaz Curk, Gregor Rot, Blaz Zupan. SNPsyn: detection and exploration of SNP–SNP interactions, Nucleic Acids Research, 2011, pp. W444-W449, 39/suppl 2, DOI: 10.1093/nar/gkr321