RNA-Seq Based De Novo Transcriptome Assembly and Gene Discovery of Cistanche deserticola Fleshy Stem
May
Yuli Li 0 1 2
Xiliang Wang 0 1 2
Tingting Chen 0 1 2
Fuwen Yao 0 1 2
Cuiping Li 0 1 2
Qingli Tang 0 1 2
Min Sun 0 1 2
Gaoyuan Sun 0 1 2
Songnian Hu 0 1 2
Jun Yu 0 1 2
Shuhui Song 0 1 2
0 1 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China, 2 Core Genomic Facility, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China, 3 University of Chinese Academy of Sciences, Beijing, China, 4 HongKui CongRong Group, Alashan, Inner Mongolia, China
1 Funding: HongKui CongRong Group provided support in the form of salaries for author QT, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of this author is articulated in the 'author contributions' section
2 Academic Editor: Zhong-Hua Chen, University of Western Sydney, AUSTRALIA
Cistanche deserticola is a completely non-photosynthetic parasitic plant with great medicinal value that is mainly distributed in the deserts of Northwest China. Its dried fleshy stem is an important tonic in traditional Chinese medicine, used mainly to improve male sexual function and strengthen immunity, but few mechanistic studies have been conducted, partly because of the lack of genomic and transcriptomic resources.
Competing Interests: Although one of the authors
(QT) is employed by a commercial company
In this study, we performed deep transcriptome sequencing of the fleshy stem of C. deserticola,
and about 80 million reads were generated using Illumina paired-end sequencing on the
HiSeq2000 platform. Using the Trinity assembler, we obtained 95,787 transcript sequences with
lengths ranging from 200 bp to 15,698 bp, an average length of 950 bases
and an N50 length of 1,519 bases. Of these, 63,957 transcripts were identified as actively expressed
at an FPKM threshold of 0.5, of which 30,098 were annotated with gene descriptions or
gene ontology terms by sequence similarity analyses against several public databases
(UniProt, NR and Nt at NCBI, and KEGG). Furthermore, we identified key enzyme genes
involved in the biosynthesis of lignin and phenylethanoid glycosides (PhGs), which are known to
be the primary active ingredients. Four genes encoding phenylalanine ammonia-lyase (PAL), the first
key enzyme in lignin and PhG biosynthesis, were identified based on sequence
comparison and phylogenetic analysis. Two biosynthesis pathways of PhGs were also proposed for
the first time.
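The two summary quantities quoted above, the N50 length and the FPKM expression value, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `n50`, `fpkm` and `is_expressed` are hypothetical, the FPKM formula is the standard textbook definition (fragments per kilobase of transcript per million mapped fragments), and the use of "greater than or equal to" for the 0.5 cutoff is an assumption, since the comparison operator is not legible in the text.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of transcript lengths.

    N50 is the length L such that transcripts of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one transcript length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches half")


def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Standard FPKM: fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_expressed(value: float, threshold: float = 0.5) -> bool:
    """Assumed filter: keep transcripts whose FPKM is at least the threshold."""
    return value >= threshold


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]  # toy data, not taken from the study
    print("N50:", n50(toy_lengths))
    print("FPKM:", fpkm(100, 1000, 1_000_000))
```

On real data the lengths would be read from the assembled FASTA file and the fragment counts from an alignment-based quantification step. Assembly toolkits typically report N50 themselves, so a hand-rolled version like this is mainly useful as a cross-check.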
In all, we completed a global analysis of the C. deserticola fleshy stem transcriptome using
RNA-seq technology. A collection of enzyme genes related to the biosynthesis of lignin and
phenylethanoid glycosides was identified from the assembled and annotated transcripts, and
the PAL gene family was also predicted. The sequence data from this study will provide a
valuable resource for future research on phenylethanoid glycoside biosynthesis
and for functional genomic studies in this important medicinal plant.
Cistanche is a worldwide genus of perennial desert plants in the Orobanchaceae family,
and C. deserticola is a completely non-photosynthetic holoparasitic species that usually grows
underground [1]. It parasitizes the roots of the psammophyte Haloxylon ammodendron
(Chenopodiaceae) [1, 2], which mainly inhabits deserts and semi-deserts owing to its high tolerance of
drought and salinity [1, 3]. C. deserticola shows strong resistance to harsh environmental
conditions and is mainly distributed in Northwest China [4–6], especially in Inner Mongolia,
Gansu and Xinjiang. In recent years it has come to be considered an endangered wild species because of
increased consumption by humans [5, 6]. C. deserticola, often called desert ginseng and
commonly known as desert-broomrape, has a fleshy stem that, once dried, has been extensively used as
a traditionally important tonic in China and Japan for many years [4, 7–10]. It was initially
recorded in Shen Nong Ben Cao Jing (Dictionary of Chinese Materia Medica, 1977) [11]
approximately 1800 years ago and was regarded as one of the main sources of the Chinese medicinal
Herba Cistanche.
Extracts of C. deserticola have a wide range of medicinal functions, including
improving sexual function, tonifying the kidney, protecting the liver and enhancing
memory, as well as aperient, immunomodulatory, antioxidative, anti-inflammatory and antiviral activities
[7–10, 12–15]. The major bioactive components of C. deserticola are phenylethanoid glycosides
(PheGs, PhGs) [2, 9, 10, 14, 15]. To date, more than 20 phenylethanoid glycosides have been
isolated from the succulent stem of C. deserticola [9, 14, 16]. Among them, acteoside and
echinacoside are the two main components with significant pharmacological activities [2, 16], and
both are documented as quality standards for C. deserticola in the Chinese Pharmacopoeia (2005 and
2010 editions). PhGs consist of three chemical components: an organic acid, a saccharide and a
phenylethanoid. However, the details of the phenylethanoid biosynthetic pathways remain
poorly understood in C. deserticola.
Despite the commercial and medicinal importance of C. deserticola, genomic and
transcriptomic data for this species are very limited. No ESTs are available in the NCBI database,
and complete genome information for this species remains unavailable except for the
chloroplast genome sequence [1]. The limited transcriptomic data hinder the study of PhG
biosynthetic mechanisms. RNA-seq technology can generate sequences of the expressed parts of a
targeted genome [17] and identify genes [18] using NGS platforms (such as
Applied Biosystems SOLiD, Illumina HiSeq and Roche 454). It is becoming increasingly popular
for de novo transcriptome assembly [19–22], since it is a cost-effective and powerful approach
with high resolution and a broad dynamic range [23–25], and it is particularly well suited to
detecting low-abundance transcripts [26]. Because of these advantages, RNA-seq is
especially attractive for non-model organisms with limited genetic resources [27–29]. However,
no detailed RNA-seq study of the C. deserticola transcriptome has been reported.
In this study, we globally sequenced the stem transcriptome of C. deserticola using the Illumina
HiSeq2000 platform and obtained 7.9 G of raw data. Through assembly and annotation, we mined the genes
involved in PhG biosynthesis and the genes responsible for the entire lignin biosynthesis pathway. Our
RNA-seq analysis generated the first C. deserticola consensus transcriptome and provided new
insights into the medicinal value of C. deserticola.
Additionally, the method described here can be widely applied to profile transcriptomes and facilitate the
discovery of genes involved in the biosynthesis pathways of specific medicinal components in other
medicinal plants with very limited genomic resources.
Mate (...truncated)