ContigScape: a Cytoscape plugin facilitating microbial genome gap closing
BMC Genomics
Biao Tang 1 2
Qi Wang 0
Minjun Yang 1
Feng Xie 0
Yongqiang Zhu 1
Ying Zhuo 0
Shengyue Wang 1
Hong Gao 0
Xiaoming Ding 2
Lixin Zhang 0
Guoping Zhao 1 2
Huajun Zheng 1
0 CAS Key Laboratory of Pathogenic Microbiology & Immunology, Institute of Microbiology, Chinese Academy of Sciences , Beijing 100190 , China
1 Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai , Shanghai 201203 , China
2 State Key Laboratory of Genetic Engineering, Department of Microbiology, School of Life Sciences, Fudan University , Shanghai 200433 , China
Background: With the emergence of next-generation sequencing, the availability of prokaryotic genome sequences is expanding rapidly. A total of 5,276 genomes have been released since 2008, yet only 1,692 of them are complete. The final phase of microbial genome sequencing, particularly gap closing, is frequently the rate-limiting step, either because complex genomic structures cause sequence bias even at high genomic coverage or because repeat sequences cause gaps in the assembly.
Results: We have developed a Cytoscape plugin to facilitate gap closing for high-throughput sequencing data from microbial genomes. This plugin is capable of interactively displaying the relationships among genomic contigs derived from various sequencing formats. The sequence contigs of plasmids and special repeats (IS elements, ribosomal RNAs, terminal repeats, etc.) can be displayed as well.
Conclusions: Displaying relationships between contigs as graphs in Cytoscape rather than as tables provides a more straightforward visual representation. This will facilitate a faster and more precise determination of the linkages among contigs and greatly improve the efficiency of gap closing.
Keywords: ContigScape; Repeat contig; Microbial; Visualization; Linkage; Gap closing
Background
The emergence of next-generation sequencing (NGS)
technology has greatly facilitated genome sequencing. The
long reads produced by Roche 454 or PacBio SMRT
make de novo assembly easier to complete. Despite the
symmetrical representation of sequences produced by 454
or other NGS methods, tens to hundreds of contigs still
exist due to repeat sequences or GC/AT-rich regions in
the genomes. Therefore, determining the order of contigs
and filling in the gaps among them using PCR are two
essential and rate-limiting steps in the final phase of
whole-genome sequencing. The Newbler Assembler developed
by Roche 454 uses strict parameters to avoid mis-assembly,
which causes some contigs to be broken apart. For
example, a single read may be split and placed into
two contigs because of base-calling variation among
reads, and in some extreme cases no gap truly exists
between two such contigs. Several existing scaffolders for
high throughput sequencing (HTS) genome assemblies,
such as GRASS [1], SSPACE [2], OPERA [3] and MIP
Scaffolder [4], may provide effective scaffolding; however, they
lack global visualization and must trade off
scaffold length against accuracy. Most visualization
tools, such as Consed [5], DNASTAR Lasergene [6] and
Gap [7], which are often used for genome completion and
enable users to verify the assembly of contigs, can only
display a linear relationship of contigs [8]. To provide a
genome-level overview, ABySS-Explorer [9] and TGNet
[10] were developed. TGNet incorporates several scripts
for converting transcripts to facilitate assembly and
represents contigs graphically as points, whereas ABySS-Explorer [9]
is a global viewer of contig assemblies. However,
neither program was designed to handle repeat contigs or to
display the reads that link contigs and imply the locations of
gaps and repeat contigs [8,10] (Table 1). These programs
also lack special functions for microbial genome analysis.
Therefore, we developed ContigScape, a Cytoscape [11]
plugin that can be used to display all relationships of
contigs, including each contig and linked reads in a
microbial genome; the gaps and repetitive sequences can then be
confirmed by users. Our goal is to display the original
relationships of all contigs instead of a manually trimmed
result, as the real association of contigs should be depicted as
a network rather than a linear linkage. Furthermore, repeat
contigs, gaps and even plasmids can be highlighted, filtered,
and customized.
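
The network view described above can be illustrated with a small sketch. It is not ContigScape's implementation: the function names, the tuple layout of a link, and the toy data are all hypothetical, and the rule used here (a contig end linked to more than one other contig end is flagged as a candidate repeat) is only one simple heuristic assumed for illustration.

```python
from collections import defaultdict


def build_contig_graph(links):
    """Build an undirected graph whose nodes are (contig, end) pairs.

    `links` is an iterable of hypothetical records
    (contig_a, end_a, contig_b, end_b, n_reads), where n_reads is the
    number of reads supporting the link between the two contig ends.
    """
    graph = defaultdict(dict)
    for a, end_a, b, end_b, n_reads in links:
        graph[(a, end_a)][(b, end_b)] = n_reads
        graph[(b, end_b)][(a, end_a)] = n_reads
    return graph


def candidate_repeats(graph):
    """Return contigs with at least one end linked to several contig ends."""
    return sorted(
        {contig for (contig, _end), nbrs in graph.items() if len(nbrs) > 1}
    )


# Toy data (invented): contig r1 sits between two pairs of unique contigs.
links = [
    ("c1", "3'", "r1", "5'", 12),
    ("c2", "3'", "r1", "5'", 10),
    ("r1", "3'", "c3", "5'", 11),
    ("r1", "3'", "c4", "5'", 9),
]
graph = build_contig_graph(links)
repeats = candidate_repeats(graph)
print(repeats)
```

In this toy graph only r1 has ends with more than one neighbour, which is why a linear ordering cannot represent it and a network can.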
ContigScape is a convenient Java plugin based on
Cytoscape [11], an established, free, and
open-source software platform for the visualization and
analysis of molecular interaction networks that runs
on Windows, Linux and Mac platforms. ContigScape is
a simple plugin that makes gap closing
during microbial genome sequencing more efficient.
Implementation
Sequencing of samples, de novo assembly of the
genomes, and scaffolding
All genome sequences used in Table 2 had been released
in GenBank and were (...truncated)