GenomeCAT: a versatile tool for the analysis and integrative visualization of DNA copy number variants

BMC Bioinformatics, Jan 2017

Background: The analysis of DNA copy number variants (CNVs) has an increasing impact on genetic diagnostics and research. However, the interpretation of CNV data derived from high-resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and the comparison of patient cohorts are needed to help discriminate clinically relevant CNVs from others.

Results: We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated, respectively, to the inspection of single cases, the comparative analysis of multidimensional data, and group comparisons aimed at identifying recurrent aberrations in patients sharing the same phenotype. Its flexible import options ease the comparison of a user's own results, derived from microarray or NGS platforms, with data from the literature or public repositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable joint visualization and analysis. All results are stored in the integrated MySQL database but can also be exported as tab-delimited files for further statistical calculations in external programs.

Conclusions: GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. Using GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCAT's graphical user interface, and the installation process is supported by a wizard. The flexibility of data import and export, combined with the ability to create a common data matrix, also makes the program well suited as an interface between genomic data from heterogeneous sources and external software tools. Owing to its modular architecture, the functionality of GenomeCAT can easily be extended with further R packages or customized plug-ins to meet future requirements.
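
The abstract does not say how GenomeCAT builds its common data matrix, so the following is only a minimal sketch of one plausible approach: segments from different experiment types are averaged into fixed-width genomic bins, and the result is written as tab-delimited text. The bin width, the function names (bin_segments, common_matrix, to_tsv), the track names and the half-open coordinate convention are all assumptions made for illustration and are not taken from GenomeCAT.

```python
import csv
import io
from collections import defaultdict

BIN_SIZE = 1_000_000  # assumed bin width in base pairs (illustrative choice)


def bin_segments(segments, bin_size=BIN_SIZE):
    """Average segment values per (chrom, bin index), weighted by overlap length.

    segments: iterable of (chrom, start, end, value), half-open [start, end).
    """
    weighted = defaultdict(float)  # sum of value * overlapping bases
    covered = defaultdict(int)     # number of bases covered in each bin
    for chrom, start, end, value in segments:
        first = start // bin_size
        last = (end - 1) // bin_size
        for b in range(first, last + 1):
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            weighted[(chrom, b)] += value * (hi - lo)
            covered[(chrom, b)] += hi - lo
    return {key: weighted[key] / covered[key] for key in weighted}


def common_matrix(tracks, bin_size=BIN_SIZE):
    """Merge named tracks into one matrix; bins a track does not cover get None."""
    names = list(tracks)
    binned = {name: bin_segments(tracks[name], bin_size) for name in names}
    # Plain tuple sort: chromosomes are ordered as strings, not karyotypically.
    keys = sorted({key for per_track in binned.values() for key in per_track})
    rows = []
    for chrom, idx in keys:
        values = [binned[name].get((chrom, idx)) for name in names]
        rows.append([chrom, idx * bin_size, (idx + 1) * bin_size] + values)
    return ["chrom", "start", "end"] + names, rows


def to_tsv(header, rows):
    """Serialize the matrix as tab-delimited text, writing NA for missing values."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(["NA" if v is None else v for v in row])
    return buf.getvalue()


# Hypothetical input: log2 ratios from an array CGH run and an NGS run.
tracks = {
    "array_cgh_log2": [("chr1", 0, 1_500_000, -0.5)],
    "ngs_log2": [("chr1", 500_000, 2_000_000, 0.25), ("chr2", 0, 1_000_000, 1.0)],
}
header, rows = common_matrix(tracks)
print(to_tsv(header, rows), end="")
```

Fixed-width binning is chosen here only because it gives every data source the same row index regardless of probe spacing or read depth; a real tool might instead align data to probe positions, genes or called segments.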


Tebel et al. BMC Bioinformatics (2017) 18:19
DOI 10.1186/s12859-016-1430-x

SOFTWARE, Open Access

Katrin Tebel1, Vivien Boldt1,2, Anne Steininger1,2, Matthias Port3, Grit Ebert1,2 and Reinhard Ullmann1,3*

* Correspondence:
1 Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
3 Institut für Radiobiologie der Bundeswehr in Verb. mit der Universität Ulm, 80937 Munich, Germany
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Keywords: DNA copy number variants, Integrative visualization, Microarray, NGS

Background

DNA copy number variants (CNVs) represent the greatest source of genetic variability in humans [1] and are the underlying cause of many human diseases. Array CGH is recognized as a first-tier test for CNVs [2], and accordingly many laboratories have already established pipelines for the pre-processing of array CGH data and CNV calling. In many cases these pipelines are based on software packages provided by the companies selling DNA microarrays or scanners, such as BlueFuse [3], CytoSure [4] or CytoGenomics [5]. Yet these tools focus on the identification of CNVs and their evaluation in the context of gene content and the frequency of a given variant in the healthy population. Comparative analyses that integrate data from multiple patients or from other experiment types are hardly supported, in particular when the data are based on different array platforms or NGS technology.

This kind of meta-analysis requires additional commercial or free software. Each of the existing software solutions has its particular strength and focus. Some are particularly useful for the identification of genomic regions significantly associated with a given phenotype [4, 6–12] or implement algorithms specifically designed to detect and query copy number changes in SNP data sets [8, 13]. Others provide a gene-centered view of copy number aberrations [14, 15] or examine CNVs in a clinical context [16]. Only a few free software packages offer a comprehensive spectrum of visualization and analysis tools for multidimensional array data that can be operated via a graphical user interface [12–17]. What these tools have in common is that they were designed to analyze microarray data. NGS data are usually displayed in alternative data browsers such as the Integrative Genomics Viewer (IGV) [18] or the Integrated Genome Browser (IGB) [19]. These browsers also support the visualization of array data when it is present in the appropriate format. However, as in the case of the IGV, analysis of array data that goes beyond visualization requires export to the GenePattern software [20], which provides several web-based features for DNA copy number analysis.
In light of the increasing relevance of multidimensional data analysis, several commercial software packages have been brought to market, including Partek [21], GenomicWorkbench [22], Genedata Expressionist for Genomic Profiling [23], Array Studio [24], GenomeStudio [25], CGH Fusion [26], Nexus Expression [27], CLC Workbench [28] and Subio [29]. Yet in most instances these programs are neither open source nor free. Thus, considerable licensing fees have to be paid, and further development of the software depends solely on the company.

Building on our experience with our previous analysis software CGHPRO [30], we aimed to create a versatile tool that facilitates the meta-analysis of array CGH results and corresponding data from other experiment types and platforms. We designed GenomeCAT to be easy to install and use and to offer a broad spectrum of flexible visualization and analysis options, without the need for specialized computer skills or the obligation to upload sensitive patient data to web servers.

Implementation

Software architecture

GenomeCAT is a desktop application developed in Java using the NetBeans Platform. It is open source software and is provided as a free download. The program has a modular structure, which supports the program-aided (...truncated)


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/s12859-016-1430-x.pdf
Article home page: http://www.biomedcentral.com/1471-2105/18/19

Katrin Tebel, Vivien Boldt, Anne Steininger, Matthias Port, Grit Ebert, Reinhard Ullmann. GenomeCAT: a versatile tool for the analysis and integrative visualization of DNA copy number variants. BMC Bioinformatics (2017) 18:19. DOI: 10.1186/s12859-016-1430-x