A de novo assembly of the newt transcriptome combined with proteomic validation identifies new protein families expressed during tissue regeneration
Looso et al. Genome Biology
Mario Looso 1
Jens Preussner 1
Konstantinos Sousounis 0
Marc Bruckskotten 1
Christian S Michel 1
Ettore Lignelli 1
Richard Reinhardt 2
Sabrina Höffner 1
Marcus Krüger 1
Panagiotis A Tsonis 0
Thilo Borchardt 1
Thomas Braun 1
0 Department of Biology and Center for Tissue Regeneration and Engineering at Dayton, University of Dayton, OH 45469-2320, USA
1 Max-Planck-Institute for Heart and Lung Research, Ludwigstrasse 43, 61231 Bad Nauheim, Germany
2 Max-Planck Genome Centre Cologne, Carl-von-Linné-Weg 10, 50829 Köln, Germany
Background: Notophthalmus viridescens, an urodelian amphibian, represents an excellent model organism to study regenerative processes, but mechanistic insights into molecular processes driving regeneration have been hindered by a paucity and poor annotation of coding nucleotide sequences. The enormous genome size and the lack of a closely related reference genome have so far prevented assembly of the urodelian genome.
Results: We describe the de novo assembly of the transcriptome of the newt Notophthalmus viridescens and its experimental validation. RNA pools covering embryonic and larval development, different stages of heart, appendage and lens regeneration, as well as a collection of different undamaged tissues were used to generate sequencing datasets on Sanger, Illumina and 454 platforms. Through a sequential de novo assembly strategy, hybrid datasets were converged into one comprehensive transcriptome comprising 120,922 non-redundant transcripts with an N50 of 975. From this, 38,384 putative transcripts were annotated and around 15,000 transcripts were experimentally validated as protein coding by mass spectrometry-based proteomics. Bioinformatic analysis of coding transcripts identified 826 proteins specific to urodeles. Several newly identified proteins establish novel protein families based on the presence of new sequence motifs without counterparts in public databases, while others containing known protein domains extend already existing families and also constitute new ones.
Conclusions: We demonstrate that our multistep assembly approach allows de novo assembly of the newt transcriptome with an annotation grade comparable to well characterized organisms. Our data provide the groundwork for mechanistic experiments to answer the question of whether urodeles utilize proprietary sets of genes for tissue regeneration.
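For readers unfamiliar with the N50 statistic quoted for the assembly, the following is a minimal sketch of the standard definition: the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of all assembled bases. The function name and the toy lengths are hypothetical illustrations, not the pipeline or data used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence added is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one sequence length")


# Hypothetical toy assembly of five transcripts (lengths in bases).
toy_lengths = [1500, 975, 800, 400, 300]
print(n50(toy_lengths))
```

In the toy example the two longest transcripts already cover more than half of the 3,975 total bases, so the statistic equals the length of the second one.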
Background
The regenerative potential of urodele amphibians and
especially newts as adult individuals has been known for
more than 200 years. The complete regeneration of
entire appendages [1] is one of the landmark abilities of
newts accompanied by their ability to regenerate parts
of the central nervous system [2,3], the lens [4] and the
heart (reviewed in [5,6]). Compared to other animal
models [7,8], the potential of the adult red-spotted newt
for regeneration is remarkable. Newts do not lose the
capacity to regenerate the lens even after repetitive
tissue damage that continues over several years. Lenses
remain indistinguishable in their molecular signature
and morphology even after repetitive rounds of
regeneration [9]. In sharp contrast, the ability of mammalian
species to regenerate declines rapidly during postnatal
life, suggesting that the regenerative capacity in
mammals is inversely proportional to the age of an
individual. At present, it is still unclear whether regeneration
in mammals is a mere extension of embryonic
development or represents an independent process. It seems
likely that a thorough analysis of the molecular
mechanisms of newt tissue regeneration will aid our
understanding of regenerative processes and help to develop
new therapeutic strategies.
Although the regenerative capability of the newt is
extraordinary, it has attracted less attention than other
model organisms in recent decades. This is partly due to
the comparatively long reproductive cycle of newts and
their enormous genome size, estimated to reach c. 10^10
bases, which is about 10 times the size of the human
genome. Therefore, no genome sequencing approach has so
far been initiated and only about 140 annotated
transcript and protein sequences are available in public
databases (NCBI, as of September 2011). To overcome these
obstacles, several initiatives were launched to obtain
more detailed omics data. A set of 11,000 EST
sequences [10] was uploaded to public databases and a
mass spectrometry-driven proteomics approach was able
to identify peptides for more than 1,000 newt proteins
[11]. Furthermore, we devised a comprehensive newt
data repository providing the ability to store, retrieve,
link and visualize sequences, proteins and expression
data [12]. This repository allows integration of
comprehensive datasets derived from next-generation
sequencing experiments and high-throughput proteomics.
Sequencing technologies have seen rapid progress in
recent years with respect to the number of base calls and
price. Despite these advancements and dramatic price
cuts, the large size of the newt genome still hampers de
novo genome projects and makes them barely affordable.
An obvious solution to this problem is the analysis of
transcriptomics data, but a detailed analysis of such data
is difficult in the absence of a comprehensive reference
dataset. The availability of a detailed reference
transcriptome of the newt Notophthalmus viridescens would also
yield functional insights and allow identification of new
and known proteins that might be instrumental in tissue
regeneration of urodelian amphibians.
Here, we present the de novo assembly of the newt
transcriptome, based on hybrid sequencing datasets derived
from Sanger, 454 Roche and Illumina platforms. Our
approach, which generated over 38,000 unique transcripts
with high-quality annotations, covers embryonic and larval
development, different stages of heart, appendage and lens
regeneration and a comprehensive collection of
tissue-specific transcripts. To exclude sequencing artifacts and verify
coding sequences, transcriptome data were matched to a
large mass spectrometry-derived proteomics dataset,
resulting in the identification of 14,471 newt transcripts with
confirmed protein-coding capacity. Further bioinformatic
analysis revealed several new protein families exclusive to
urodelian amphibians, some of which contain known
domains from public databases, as well as entirely new
clusters of proteins sharing sequence motifs not known in
other species. We reason that some of the newt-specific
proteins might play important roles in regeneration
processes unique to urodeles.
Results
Library construction and de novo assembly strategy
To achieve a broad coverage of the newt transcriptome,
we used 48,537 EST clones of a normalized cDNA library
derived from regenerating newt hearts (...truncated)