A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics
James et al. Genome Medicine (2016) 8:13
DOI 10.1186/s13073-016-0261-8
RESEARCH
Open Access
Regis A. James1, Ian M. Campbell2, Edward S. Chen2, Philip M. Boone2, Mitchell A. Rao2, Matthew N. Bainbridge2,3,
James R. Lupski2,3,4,5, Yaping Yang2,6, Christine M. Eng2,6, Jennifer E. Posey2 and Chad A. Shaw1,2,7*
Abstract
Background: Genome-wide data are increasingly important in the clinical evaluation of human disease. However,
the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic
review. Recent work has shown that systematic integration of clinical phenotype data with genotype information
can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive,
analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian
Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype
and variant data into ranked diagnostic alternatives.
Methods: Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of
semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface
for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute
phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling
to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that
contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and
algorithmically suggests potential alternatives for phenotype queries—in essence, generating a computationally
assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to
filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive
approach for disease gene discovery based on patient phenotypes.
Results: We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating
performance of the tool and workflow in the re-analysis of clinical exomes. Our tool assigned to clinically reported
variants a median rank of 2, placing causal variants in the top 1 % of filtered candidates across the 47 cohort cases
with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen,
eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants.
Conclusions: Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by
more effectively utilizing available phenotype information, catalog data, and genomic knowledge.
Keywords: Disease gene discovery, Exome, Semantic similarity, Variant prioritization
* Correspondence:
1 Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
2 Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
Full list of author information is available at the end of the article
© 2016 James et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Genome-wide technologies, including next-generation
sequencing, have become increasingly affordable, rapid,
and clinically utilized, particularly in comparison to single-gene screening. These revolutionary advances in data
acquisition have made large-scale genotyping an essential tool for genetic diagnostics and the identification of
novel deleterious variants potentially contributing to disease. They hold great promise for the future of molecular diagnosis and management of patients with genetic
disease [1–6]. Such technologies also provide particular
opportunity for the identification of causes of rare and
orphan diseases, which until recently have suffered from
a lack of computational tools to help bridge clinical genomics and medical phenotyping and to facilitate diagnostics [7–10]. Despite the promise of available data, the
scale of variation presents an interpretive challenge: an
individual patient’s genome can have hundreds of rare
and putatively deleterious candidate causal variants [11].
Although in some instances diagnostic conclusions can be
made without extensive interpretation (e.g., aneuploidies
or nonsense variants in disease genes), the presence of numerous potentially deleterious variants typically requires
substantial curation to identify the candidate deleterious
variant(s) that best matches the clinical phenotypes of
the patient in question [1–6, 12, 13]. The goal of integrated diagnostic approaches is to bring together variant knowledge with clinically ascertained patient
phenotype characteristics to reach the best-informed
diagnostic conclusions (Fig. 1a).
Coincident with the rise of genome-wide data for diagnostics has been the development of standards and catalogs for clinical sign-out [14–16]. Much focus has
addressed distinguishing clearly deleterious variants
from other variants with less clear contribution to disease. Central to these efforts has been the development
of compendia for matching observed variation to well-vetted disease information [11, 17]. Some variants cataloged as “deleterious” can also appear in unaffected individuals, and therefore additional tools have become
necessary to identify from among the many candidate
variants in affected individuals the specific variants or
variant combinations—such as variant pairs for recessive
disease—that may explain observed phenotypes [18].
Parallel to the development of catalogs and standards
for variant analysis has been the development of systematic tools for representing patient information. The Human Phenotype Ontology (HPO), initially constructed in
2008, is a representation of the features of human disease and the hierarchical relationships that exist among
them [19]. A key application of this work is The Phenomizer, a software tool for making comparisons of known
diseases to patient phenotypes [20]. This tool uses semantic similarity methods to match patient characteristics, as
represen (...truncated)