"Does replication groups scoring reduce false positive rate in SNP interaction discovery?: Response" (pdf)

http://www.biomedcentral.com/content/pdf/1471-2164-11-403.pdf

BMC Genomics: Correspondence

Javier Gayán, Antonio González-Pérez, Agustín Ruiz

Neocodex, Avda. Charles Darwin 6, 41092 Sevilla, Spain

A response to Toplak et al: Does replication groups scoring reduce false positive rate in SNP interaction discovery? BMC Genomics 2010, 11:58.

Background: The genomewide evaluation of genetic epistasis is a computationally demanding task, and a current challenge in Genetics. HFCC (Hypothesis-Free Clinical Cloning) is one of the methods that have been suggested for genomewide epistasis analysis. In order to perform an exhaustive search of epistasis, HFCC has implemented several tools and data filters, such as the use of multiple replication groups, and direction of effect and control filters. A recent article has claimed that the use of multiple replication groups (as implemented in HFCC) does not reduce the false positive rate, and we hereby try to clarify these issues.

Results/Discussion: HFCC uses, as an analysis strategy, the possibility of replicating findings in multiple replication groups, in order to select a liberal subset of preliminary results that are above a statistical criterion and consistent in direction of effect. We show that the use of replication groups and the direction filter reduces the false positive rate of a study, although at the expense of lowering the overall power of the study. A post-hoc analysis of these selected signals in the combined sample could then be performed to select the most promising results.

Conclusion: Replication of results in independent samples is generally used in scientific studies to establish credibility in a finding. Nonetheless, the combined analysis of several datasets is known to be a preferable and more powerful strategy for the selection of top signals.
HFCC is a flexible and complete analysis tool, and one of its analysis options combines these two strategies: a preliminary multiple replication group analysis to eliminate inconsistent false positive results, and a post-hoc combined-group analysis to select the top signals.

Background

Epistasis, the interaction among genetic loci, is a frequent phenomenon in nature [1]. However, the detection of epistatic effects in observational data has not been an easy task because of the lack of appropriate samples and methodologies [2,3]. Thanks to the recent collection of large genetic datasets, we are now in a position where the study of epistasis in humans is becoming possible. Nonetheless, the genomewide evaluation of genetic epistasis is a computationally and statistically demanding task, due to the large number of possible combinations of loci that can be formed. For example, for a genomewide analysis with 100,000 SNPs, there are 5 × 10^9 two-locus combinations and 1.7 × 10^14 three-locus combinations. For 1 million SNPs, there are 5 × 10^11 two-locus and 1.7 × 10^17 three-locus combinations. The exhaustive search for epistasis across this large data space is a challenge for today's gene hunters.

In this context, a variety of software has been released to tackle this issue (see [4] for a review). HFCC (Hypothesis-Free Clinical Cloning) [4] is one of the tools that have made genomewide epistasis analysis possible. This software uses case-control samples to test for single-locus or multi-locus genetic association. Multi-locus combinations that are significantly associated with a trait are then subjected to a variety of post-hoc tests to determine the degree of non-additivity of the marker combination, that is, to separate additive multi-marker combinations from more epistatic interactions.
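The combination counts quoted in the Background can be checked with a few lines of Python. This is only an arithmetic sketch: the helper name `count_combinations` is hypothetical and is not part of HFCC; it simply evaluates the binomial coefficient "n choose k" for the SNP panel sizes mentioned in the text.

```python
from math import comb


def count_combinations(n_snps: int, order: int) -> int:
    """Number of unordered combinations of `order` loci among `n_snps` SNPs."""
    return comb(n_snps, order)


if __name__ == "__main__":
    # Panel sizes taken from the text: 100,000 and 1 million SNPs.
    for n_snps in (100_000, 1_000_000):
        pairs = count_combinations(n_snps, 2)
        triples = count_combinations(n_snps, 3)
        print(f"{n_snps:>9,} SNPs: {pairs:.1e} two-locus, {triples:.1e} three-locus")
```

The jump from two-locus to three-locus searches multiplies the number of tests by roughly n/3, which is why an exhaustive search needs filtering strategies to remain tractable.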
Those genetic effects that are due to epistatic interactions are one of the priorities in our analysis, because they complement those effects detectable by single-locus analysis.

Because HFCC performs an exhaustive search of the entire data space, several optional tools have been implemented to overcome the resulting multiple testing problem, such as multiple replication groups, the direction filter, the control filter, the tracking filter, etc. For example, the case-control sample can be simultaneously analyzed in replication groups, to select only those results that are significant in each group. There is also a complementary direction filter, which selects only those results that are consistent across groups, that is, significant and with the same direction of effect in each group.

A recent article by Toplak et al. [5] was inspired by the following statement in HFCC's article [4, page 3]: "... a multi-group analysis strategy ... allows the replication of consistent results, and it also aids the elimination of false positive results, a very attractive quality for genome-wide analysis of large number of genetic markers." These authors have interpreted the above statement as claiming that using replication groups, by itself, reduces "the false positive rate" [5, page 2], and can therefore "... improve ... any type of feature ranking and selection procedure ..." [5, page 1].

Our approach to detecting multi-locus effects uses a two-stage analysis strategy. In the first step, a large subset of preliminary results that are associated with the disease is selected. Then, this liberal subset of results is subjected to a post-hoc analysis to select the most promising results [4, page 7].
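How the replication-group and direction filters thin out null signals in the first stage can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not HFCC's implementation: each unassociated marker combination is assumed to yield an independent standard-normal test statistic in every replication group, significance is two-sided at level `alpha`, and all function names are hypothetical. Under these assumptions the expected fraction of null signals passing both filters in k groups is alpha^k × (1/2)^(k-1).

```python
import random
from statistics import NormalDist


def passes_filters(z_scores, z_threshold):
    """True if every group is significant (two-sided) and all effects share a sign."""
    all_significant = all(abs(z) > z_threshold for z in z_scores)
    same_direction = all(z > 0 for z in z_scores) or all(z < 0 for z in z_scores)
    return all_significant and same_direction


def expected_null_rate(alpha, n_groups):
    """Analytic pass rate for null signals, assuming independent groups."""
    return alpha ** n_groups * 0.5 ** (n_groups - 1)


def simulate_null_rate(alpha, n_groups, n_tests, seed=0):
    """Fraction of simulated null signals that pass both filters."""
    rng = random.Random(seed)
    z_threshold = NormalDist().inv_cdf(1 - alpha / 2)
    hits = sum(
        passes_filters([rng.gauss(0.0, 1.0) for _ in range(n_groups)], z_threshold)
        for _ in range(n_tests)
    )
    return hits / n_tests


if __name__ == "__main__":
    alpha = 0.2  # deliberately liberal threshold, chosen only for illustration
    for n_groups in (1, 2, 3):
        simulated = simulate_null_rate(alpha, n_groups, n_tests=200_000)
        expected = expected_null_rate(alpha, n_groups)
        print(f"{n_groups} group(s): simulated {simulated:.4f}, expected {expected:.4f}")
```

The sketch covers only null signals. It says nothing about power, which the same filters also reduce, and which is why the second stage returns to the combined sample.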
Using replication groups is only one of the possible analysis strategies of HFCC. It aims to reduce the number of selected signals, that is, it eliminates a larger part of the tail of the distribution of the results, which consists mostly of false positives, together with some true effects that are indistinguishable from unassociated variants [10]. Our original statement claims that using multiple replication groups should reduce the number of signals selected with a liberal statistical threshold (mostly false positives), but does not claim that this strategy should be used to select the top results of the study. Indeed, to select the top findings, we analyze the combined sample [4, page 6], which, as we state repeatedly across our article [4, pages 2,3,4], is known to be the most powerful analysis strategy [6,7].

Therefore, it seems that Toplak et al. interpreted our article incorrectly, and applied this misinterpretation to test their own hypothesis (that replication groups aid the prioritization of signals), which was ultimately rejected by their simulations. In their paper, these authors provide evidence that the analysis of a combined sample is less prone to false positives than the separate analysis of replication samples. However, it is not clear from their article whether they have selected signals consistent in strength and direction, as suggested in our paper and in the guidelines for replication of association results [8], which may compromise to some exte (...truncated)