A de novo assembly of the newt transcriptome combined with proteomic validation identifies new protein families expressed during tissue regeneration

Genome Biology, Feb 2013

Background: Notophthalmus viridescens, a urodelian amphibian, is an excellent model organism for studying regenerative processes, but mechanistic insights into the molecular processes driving regeneration have been hindered by the paucity and poor annotation of coding nucleotide sequences. The enormous genome size and the lack of a closely related reference genome have so far prevented assembly of the urodelian genome. Results: We describe the de novo assembly of the transcriptome of the newt Notophthalmus viridescens and its experimental validation. RNA pools covering embryonic and larval development, different stages of heart, appendage and lens regeneration, as well as a collection of different undamaged tissues, were used to generate sequencing datasets on Sanger, Illumina and 454 platforms. Through a sequential de novo assembly strategy, the hybrid datasets were merged into one comprehensive transcriptome comprising 120,922 non-redundant transcripts with an N50 of 975. Of these, 38,384 putative transcripts were annotated, and around 15,000 transcripts were experimentally validated as protein coding by mass spectrometry-based proteomics. Bioinformatic analysis of coding transcripts identified 826 proteins specific to urodeles. Several newly identified proteins establish novel protein families based on new sequence motifs without counterparts in public databases, while others that contain known protein domains either extend existing families or constitute new ones. Conclusions: We demonstrate that our multistep assembly approach allows de novo assembly of the newt transcriptome with an annotation grade comparable to that of well-characterized organisms. Our data provide the groundwork for mechanistic experiments to address whether urodeles use urodele-specific sets of genes for tissue regeneration.
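The abstract summarizes assembly contiguity with an N50 of 975. N50 is a standard statistic: sort the assembled sequences by length, longest first, and accumulate their lengths; the N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function below is a minimal sketch of that standard definition; the function name and toy inputs are illustrative, and it is not the tool the authors used to compute their value.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Standard N50 (illustrative sketch): the length of the sequence at which
    the cumulative length of sequences, sorted longest-first, first reaches
    at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For the toy lengths [1, 2, 3, 4, 5] (total 15), the running sum reaches half the total at the second-longest sequence, so `n50` returns 4. Read this way, an N50 of 975 means that transcripts of at least 975 bases together account for at least half of all assembled bases.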



Mario Looso (2), Jens Preussner (2), Konstantinos Sousounis (1), Marc Bruckskotten (2), Christian S Michel (2), Ettore Lignelli (2), Richard Reinhardt (3), Sabrina Höffner (2), Marcus Krüger (2), Panagiotis A Tsonis (1), Thilo Borchardt (2), Thomas Braun (2)

(1) Department of Biology and Center for Tissue Regeneration and Engineering at Dayton, University of Dayton, OH 45469-2320, USA
(2) Max-Planck-Institute for Heart and Lung Research, Ludwigstrasse 43, 61231 Bad Nauheim, Germany
(3) Max-Planck Genome Centre Cologne, Carl-von-Linné-Weg 10, 50829 Köln, Germany

Background

The regenerative potential of adult urodele amphibians, and especially newts, has been known for more than 200 years. The complete regeneration of entire appendages [1] is one of the landmark abilities of newts, accompanied by their ability to regenerate parts of the central nervous system [2,3], the lens [4] and the heart (reviewed in [5,6]). Compared with other animal models [7,8], the regenerative potential of the adult red-spotted newt is remarkable. Newts do not lose the capacity to regenerate the lens even after repeated tissue damage over several years, and regenerated lenses remain indistinguishable in molecular signature and morphology even after repeated rounds of regeneration [9]. In sharp contrast, the regenerative ability of mammals declines rapidly during postnatal life, suggesting that regenerative capacity in mammals is inversely related to age. At present, it is still unclear whether regeneration in mammals is a mere extension of embryonic development or an independent process. A thorough analysis of the molecular mechanisms of newt tissue regeneration is likely to aid our understanding of regenerative processes and help to develop new therapeutic strategies.
Although the regenerative capability of the newt is extraordinary, it has attracted less attention than other model organisms in recent decades. This is partly due to the comparatively long reproductive cycle of newts and their enormous genome size, estimated at c. 10^10 bases, about 10 times the size of the human genome. Consequently, no genome sequencing project has been initiated so far, and only about 140 annotated transcript and protein sequences are available in public databases (NCBI, as of September 2011). To overcome these obstacles, several initiatives were launched to obtain more detailed omics data. A set of 11,000 EST sequences [10] was deposited in public databases, and a mass spectrometry-driven proteomics approach identified peptides for more than 1,000 newt proteins [11]. Furthermore, we developed a comprehensive newt data repository that allows sequences, proteins and expression data to be stored, retrieved, linked and visualized [12]. This repository can incorporate comprehensive datasets derived from next-generation sequencing experiments and high-throughput proteomics. Sequencing technologies have progressed rapidly in recent years in terms of throughput and cost. Despite these advances and dramatic price cuts, the large size of the newt genome still makes de novo genome projects hardly affordable. An obvious alternative is the analysis of transcriptomic data, but a detailed analysis of such data is difficult in the absence of a comprehensive reference dataset. A detailed reference transcriptome of the newt Notophthalmus viridescens would also yield functional insights and allow identification of new and known proteins that might be instrumental in tissue regeneration of urodelian amphibians.
Here, we present the de novo assembly of the newt transcriptome, based on hybrid sequencing datasets derived from Sanger, Roche 454 and Illumina platforms. Our approach, which generated over 38,000 unique transcripts with high-quality annotations, covers embryonic and larval development, different stages of heart, appendage and lens regeneration, and a comprehensive collection of tissue-specific transcripts. To exclude sequencing artifacts and verify coding sequences, transcriptome data were matched to a large mass spectrometry-derived proteomics dataset, resulting in the identification of 14,471 newt transcripts with confirmed protein-coding capacity. Further bioinformatic analysis revealed several new protein families exclusive to urodelian amphibians; some contain domains known from public databases, while others form entirely new clusters of proteins sharing sequence motifs not found in other species. We reason that some of these newt-specific proteins might play important roles in regeneration processes unique to urodeles.

Results

Library construction and de novo assembly strategy

To achieve a broad coverage of the newt transcriptome, we used 48,537 EST clones of a normalized cDNA library derived from regenerating newt hearts (...truncated)
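The introduction describes validating transcripts by matching them to mass spectrometry-derived peptides. This excerpt does not describe the search pipeline, so the following is only a minimal sketch under stated assumptions: each transcript is translated in all six reading frames with the standard genetic code, and a transcript counts as supported if any identified peptide occurs verbatim in one of those translations. Real workflows instead search spectra against a protein database with dedicated search engines and control false discovery rates; all function names here (`six_frame`, `supported_transcripts`) are hypothetical. Isoleucine and leucine are collapsed to L because they have identical masses and are not distinguished by standard MS.

```python
# Hypothetical sketch: exact peptide matching against six-frame translations.
# This is NOT the authors' pipeline; names and approach are illustrative.
from typing import Dict, Iterable, List, Set

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    # Bases outside ACGT (e.g. N) are left unchanged.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int) -> str:
    # Translate complete codons starting at `frame`; unknown codons become "X".
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> List[str]:
    # Three forward frames followed by three reverse-complement frames.
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    return [translate(s, f) for s in (seq, rc) for f in range(3)]


def supported_transcripts(
    transcripts: Dict[str, str], peptides: Iterable[str]
) -> Set[str]:
    # I and L are isobaric, so collapse both to L before matching.
    peps = [p.upper().replace("I", "L") for p in peptides if p]
    hits = set()
    for tid, seq in transcripts.items():
        frames = [f.replace("I", "L") for f in six_frame(seq)]
        if any(p in f for p in peps for f in frames):
            hits.add(tid)
    return hits
```

For example, with the toy transcripts `{"t1": "ATGGCTAAAGGC", "t2": "GCCTTTAGCCAT", "t3": "CCCCCCCCC"}` and the peptide `"MAKG"`, the function returns `{"t1", "t2"}`: t1 encodes MAKG in forward frame 0, and t2 is its reverse complement. Exact substring matching is used here only because it keeps the sketch short; a spectrum-level database search is the usual choice in practice.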


Article PDF: http://genomebiology.com/content/pdf/gb-2013-14-2-r16.pdf
Article home page: http://genomebiology.com/content/14/2/R16

Mario Looso, Jens Preussner, Konstantinos Sousounis, Marc Bruckskotten, Christian S Michel, Ettore Lignelli, Richard Reinhardt, Sabrina Höffner, Marcus Krüger, Panagiotis A Tsonis, Thilo Borchardt, Thomas Braun. A de novo assembly of the newt transcriptome combined with proteomic validation identifies new protein families expressed during tissue regeneration. Genome Biology 2013, 14(2):R16. DOI: 10.1186/gb-2013-14-2-r16