ContigScape: a Cytoscape plugin facilitating microbial genome gap closing

BMC Genomics, Apr 2013

Background: With the emergence of next-generation sequencing, the availability of prokaryotic genome sequences is expanding rapidly. A total of 5,276 genomes have been released since 2008, yet only 1,692 genomes were complete. The final phase of microbial genome sequencing, particularly gap closing, is frequently the rate-limiting step either because of complex genomic structures that cause sequence bias even with high genomic coverage, or the presence of repeat sequences that may cause gaps in assembly.

Results: We have developed a Cytoscape plugin to facilitate gap closing for high-throughput sequencing data from microbial genomes. This plugin is capable of interactively displaying the relationships among genomic contigs derived from various sequencing formats. The sequence contigs of plasmids and special repeats (IS elements, ribosomal RNAs, terminal repeats, etc.) can be displayed as well.

Conclusions: Displaying relationships between contigs using graphs in Cytoscape rather than tables provides a more straightforward visual representation. This will facilitate a faster and more precise determination of the linkages among contigs and greatly improve the efficiency of gap closing.

http://www.biomedcentral.com/content/pdf/1471-2164-14-289.pdf

Authors: Biao Tang (1, 2), Qi Wang (0), Minjun Yang (1), Feng Xie (0), Yongqiang Zhu (1), Ying Zhuo (0), Shengyue Wang (1), Hong Gao (0), Xiaoming Ding (2), Lixin Zhang (0), Guoping Zhao (1, 2), Huajun Zheng (1)

Affiliations:
0 CAS Key Laboratory of Pathogenic Microbiology & Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100190, China
1 Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai 201203, China
2 State Key Laboratory of Genetic Engineering, Department of Microbiology, School of Life Sciences, Fudan University, Shanghai 200433, China
Keywords: ContigScape; Repeat contig; Microbial; Visualization; Linkage; Gap closing

Background

The emergence of next-generation sequencing (NGS) technology has greatly facilitated genome sequencing. The long reads produced by Roche 454 or PacBio SMRT make de novo assembly easier to complete. Despite the symmetrical representation of sequences produced by 454 or other NGS methods, tens to hundreds of contigs still remain because of repeat sequences or GC/AT-rich regions in the genomes. Therefore, determining the order of contigs and filling in the gaps among them using PCR are two essential and rate-limiting steps in the final phase of whole-genome sequencing.

The Newbler Assembler developed by Roche 454 uses strict parameters to avoid mis-assembly, and as a result some contigs are broken apart. For example, a single read can be split and placed into two contigs because of base-calling variation among reads, and in some extreme cases no gap truly exists between two such contigs.

Several existing scaffolders for high-throughput sequencing (HTS) genome assemblies, such as GRASS [1], SSPACE [2], OPERA [3] and MIP Scaffolder [4], may provide effective scaffolding; however, they lack global visualization and must trade off scaffold length against accuracy. Most visualization tools, such as Consed [5], DNASTAR Lasergene [6] and Gap [7], which are often used for genome completion and enable users to verify the assembly of contigs, can only display a linear relationship of contigs [8].

To provide a genome-level overview, ABySS-Explorer [9] and TGNet [10] were developed. TGNet incorporates several scripts for converting transcripts to facilitate assembly and represents contigs graphically using points. ABySS-Explorer [9] is another global viewer of contig assembly. However, neither program was designed to treat repeat contigs or to display the reads that link contigs and imply the location of gaps and repeat contigs [8,10] (Table 1).
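The idea that reads linking two contigs hint at gaps and repeat contigs can be illustrated with a small graph. The following Python sketch is not ContigScape's implementation: the input format (a read ID plus the two contigs it touches), the function names, and the rule that a contig with more than two neighbours is a candidate repeat are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_contig_graph(read_links):
    """Count the reads supporting each undirected contig-contig link.

    read_links: iterable of (read_id, contig_a, contig_b) tuples
    (a hypothetical input format, not an assembler's native output).
    """
    edges = Counter()
    for _read_id, contig_a, contig_b in read_links:
        if contig_a == contig_b:
            continue  # a read inside a single contig links nothing
        edges[frozenset((contig_a, contig_b))] += 1
    return edges

def neighbours(edges, min_reads=1):
    """Adjacency sets, keeping only links with at least min_reads reads."""
    adj = defaultdict(set)
    for pair, n_reads in edges.items():
        if n_reads >= min_reads:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    return adj

def candidate_repeats(adj):
    """Assumed heuristic: a contig in a linear path has at most two
    neighbours, so more than two suggests a repeat contig."""
    return sorted(c for c, nbrs in adj.items() if len(nbrs) > 2)

# Toy data: contig c3 is linked to four different contigs.
read_links = [
    ("r1", "c1", "c3"), ("r2", "c1", "c3"),
    ("r3", "c3", "c2"), ("r4", "c2", "c3"),
    ("r5", "c3", "c4"), ("r6", "c3", "c5"),
    ("r7", "c5", "c6"),
]
edges = build_contig_graph(read_links)
print(candidate_repeats(neighbours(edges)))  # ['c3']
```

Raising `min_reads` drops weakly supported links before the neighbour count is taken, which is one simple way to filter a crowded network; a suitable threshold would depend on coverage and is left open here.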
ABySS-Explorer and TGNet also lack special functions for microbial genome analysis. Therefore, we developed ContigScape, a Cytoscape [11] plugin that displays all relationships among contigs in a microbial genome, including each contig and the reads that link contigs; users can then confirm the gaps and repetitive sequences. Our goal is to display the original relationships of all contigs instead of a manually trimmed result, because the real association of contigs should be depicted as a network rather than a linear linkage. Furthermore, repeat contigs, gaps and even plasmids can be highlighted, filtered, and customized.

ContigScape is a Java plugin based on Cytoscape [11], an established, free, and open-source software platform for the visualization and analysis of molecular interaction networks that runs on Windows, Linux and Mac platforms. ContigScape is simple to use and makes gap closing during microbial genome sequencing more efficient.

Implementation

Sequencing of samples, de novo assembly of the genomes, and scaffolding

All genome sequences used in Table 2 had been released in GenBank and were (...truncated)



Biao Tang, Qi Wang, Minjun Yang, Feng Xie, Yongqiang Zhu, Ying Zhuo, Shengyue Wang, Hong Gao, Xiaoming Ding, Lixin Zhang, Guoping Zhao, Huajun Zheng. ContigScape: a Cytoscape plugin facilitating microbial genome gap closing. BMC Genomics, 2013, 14:289. DOI: 10.1186/1471-2164-14-289