A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics


James et al. Genome Medicine (2016) 8:13
DOI 10.1186/s13073-016-0261-8

RESEARCH (Open Access)

Regis A. James¹, Ian M. Campbell², Edward S. Chen², Philip M. Boone², Mitchell A. Rao², Matthew N. Bainbridge²,³, James R. Lupski²,³,⁴,⁵, Yaping Yang²,⁶, Christine M. Eng²,⁶, Jennifer E. Posey² and Chad A. Shaw¹,²,⁷*

Abstract

Background: Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives.

Methods: Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map.
The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries, in essence generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes.

Results: We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. Our tool assigned to clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants.

Conclusions: Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.

Keywords: Disease gene discovery, Exome, Semantic similarity, Variant prioritization

* Correspondence:
¹ Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
² Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
Full list of author information is available at the end of the article

© 2016 James et al.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Genome-wide technologies, including next-generation sequencing, have become increasingly affordable, rapid, and clinically utilized, particularly in comparison to single gene screening. These revolutionary advances in data acquisition have made large-scale genotyping an essential tool for genetic diagnostics and the identification of novel deleterious variants potentially contributing to disease. They hold great promise for the future of molecular diagnosis and management of patients with genetic disease [1–6]. Such technologies also provide particular opportunity for the identification of causes of rare and orphan diseases, which until recently have suffered from a lack of computational tools to help bridge clinical genomics and medical phenotyping and to facilitate diagnostics [7–10].

Despite the promise of available data, the scale of variation presents an interpretive challenge: an individual patient’s genome can have hundreds of rare and putatively deleterious candidate causal variants [11].
Although in some instances diagnostic conclusions can be made without extensive interpretation (e.g., aneuploidies or nonsense variants in disease genes), the presence of numerous potentially deleterious variants typically requires substantial curation to identify the candidate deleterious variant(s) that best matches the clinical phenotypes of the patient in question [1–6, 12, 13]. The goal of integrated diagnostic approaches is to bring together variant knowledge with clinically ascertained patient phenotype characteristics to reach the best-informed diagnostic conclusions (Fig. 1a).

Coincident with the rise of genome-wide data for diagnostics has been the development of standards and catalogs for clinical sign-out [14–16]. Much focus has addressed distinguishing clearly deleterious variants from other variants with less clear contribution to disease. Central to these efforts has been the development of compendia for matching observed variation to well-vetted disease information [11, 17]. Some variants cataloged as “deleterious” can also appear in unaffected individuals, and therefore additional tools have become necessary to identify from among the many candidate variants in affected individuals the specific variants or variant combinations (such as variant pairs for recessive disease) that may explain observed phenotypes [18].

Parallel to the development of catalogs and standards for variant analysis has been the development of systematic tools for representing patient information. The Human Phenotype Ontology (HPO), initially constructed in 2008, is a representation of the features of human disease and the hierarchical relationships that exist among them [19]. A key application of this work is The Phenomizer, a software tool for making comparisons of known diseases to patient phenotypes [20]. This tool uses semantic similarity methods to match patient characteristics, as represen (...truncated)