GenomeCAT: a versatile tool for the analysis and integrative visualization of DNA copy number variants
Tebel et al. BMC Bioinformatics (2017) 18:19
DOI 10.1186/s12859-016-1430-x
SOFTWARE
Open Access
Katrin Tebel1, Vivien Boldt1,2, Anne Steininger1,2, Matthias Port3, Grit Ebert1,2 and Reinhard Ullmann1,3*
Abstract
Background: The analysis of DNA copy number variants (CNV) has an increasing impact in the field of genetic
diagnostics and research. However, the interpretation of CNV data derived from high resolution array CGH or NGS
platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional
data analysis and comparison of patient cohorts are needed to assist in the discrimination of clinically relevant CNVs
from others.
Results: We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of
CNVs. GenomeCAT is composed of three modules dedicated to the inspection of single cases, comparative analysis of
multidimensional data and group comparisons aiming at the identification of recurrent aberrations in patients sharing
the same phenotype, respectively. Its flexible import options ease the comparison of a user's own results, derived from
microarray or NGS platforms, with data from the literature or public repositories. Multidimensional data obtained from
different experiment types can be merged into a common data matrix to enable common visualization and analysis.
All results are stored in the integrated MySQL database, but can also be exported as tab delimited files for further
statistical calculations in external programs.
Conclusions: GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of
CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized
computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCAT's graphical
user interface and the installation process is supported by a wizard. The flexibility in terms of data import and export in
combination with the ability to create a common data matrix makes the program also well suited as an interface
between genomic data from heterogeneous sources and external software tools. Owing to the modular architecture, the
functionality of GenomeCAT can easily be extended by further R packages or customized plug-ins to meet future
requirements.
Keywords: DNA copy number variants, Integrative visualization, Microarray, NGS
Background
DNA copy number variants represent the greatest source
of genetic variability in humans [1] and are the underlying
cause of many human diseases. Array CGH is recognized
as a first-tier test for DNA copy number variants (CNV)
[2] and accordingly, many laboratories have already established their pipelines for pre-processing of array CGH data
and CNV calling. In many cases these pipelines are based
on software packages provided by the companies selling
DNA microarrays or scanners such as BlueFuse [3], CytoSure [4] or CytoGenomics [5]. Yet, the scope of these
tools is focused on the identification of CNVs and their
evaluation in the context of gene content and frequency
of a given variant in the healthy population. Comparative
analysis that integrates data obtained from multiple
patients or from other experiment types is hardly supported,
in particular when the data are based on different array platforms or NGS technology.
* Correspondence:
1 Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
3 Institut für Radiobiologie der Bundeswehr in Verb. mit der Universität Ulm, 80937 Munich, Germany
Full list of author information is available at the end of the article
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
This kind of meta-analysis requires
additional commercial or free software. Each of the currently existing software solutions has its particular
strength and focus. Some are particularly useful for the
identification of genomic regions significantly associated
with a given phenotype [4, 6–12] or have implemented algorithms specifically designed to detect and query copy
number changes in SNP data sets [8, 13]. Others provide a
gene centered view on copy number aberrations [14, 15]
or examine CNVs in a clinical context [16]. Only a few
free software packages offer a comprehensive spectrum of
visualization and analysis tools for multidimensional array
data operable via a graphical user interface [12–17]. What
these tools have in common is that they were designed to analyze microarray data. NGS
data are usually displayed in alternative data browsers
such as the Integrative Genomics Viewer (IGV) [18] or the
Integrated Genome Browser (IGB) [19]. These browsers
also support visualization of array data when present in
the appropriate format. However, as in the case of
IGV, analysis of array data that goes beyond visualization
requires export to the GenePattern software [20],
where several web-based features for DNA copy number
analysis are provided.
In light of the increasing relevance of multidimensional
data analysis, several commercial software packages have been
brought to market, including Partek [21], GenomicWorkbench [22], Genedata Expressionist for Genomic Profiling
[23], Array Studio [24], GenomeStudio [25], CGH Fusion
[26], Nexus Expression [27], CLC Workbench [28] and
Subio [29]. Yet, in most instances these programs are neither open source
nor free. Thus, considerable licensing
fees have to be paid, and further development of the software
depends solely on the vendor.
Building on our experience with our previous analysis software, CGHPRO [30], we aimed to create a versatile tool that facilitates the meta-analysis of array CGH
results and corresponding data from other experiment
types and platforms. We designed GenomeCAT to be
easy to install and use and to offer a broad
spectrum of flexible visualization and analysis options
without the need for specialized computer skills or the obligation to upload sensitive patient data to web servers.
Implementation
Software architecture
GenomeCAT is a desktop application developed in Java
using the NetBeans Platform. It is open source software and is provided as a free download. The program
has a modular structure, which supports the programaided (...truncated)