Long read and single molecule DNA sequencing simplifies genome assembly and TAL effector gene analysis of Xanthomonas translucens
Peng et al. BMC Genomics (2016) 17:21
DOI 10.1186/s12864-015-2348-9
RESEARCH ARTICLE
Open Access
Zhao Peng1, Ying Hu2, Jingzhong Xie1, Neha Potnis3, Alina Akhunova1, Jeffrey Jones3, Zhaohui Liu4,
Frank F. White1,3* and Sanzhen Liu1*
Abstract
Background: The species Xanthomonas translucens encompasses a complex of bacterial strains that cause diseases
and yield loss on grass species including important cereal crops. Three pathovars, X. translucens pv. undulosa, X.
translucens pv. translucens and X. translucens pv. cerealis, have been described as pathogens of wheat, barley, and
oats. However, no complete genome sequence for a strain of this complex is currently available.
Results: A complete genome sequence of X. translucens pv. undulosa strain XT4699 was obtained by using PacBio
long read, single molecule, real time (SMRT) DNA sequences and Illumina sequences. Draft genome sequences of
nineteen additional X. translucens strains, which were collected from wheat or barley in different regions and at
different times, were generated by Illumina sequencing. Phylogenetic relationships among different Xanthomonas
strains indicate that X. translucens strains are members of a clade distinct from the so-called group 2 xanthomonads and that three
pathovars of this species, undulosa, translucens and cerealis, represent distinct subclades in the group 1 clade.
Knockout mutation of the type III secretion system of XT4699 eliminated the ability to cause water-soaking symptoms
on wheat and barley and resulted in a reduction in populations on wheat in comparison to the wild type strain.
Sequence comparison of X. translucens strains revealed genetic variation in type III effector repertoires among
different pathovars and within a single pathovar. The full genome sequence of XT4699 reveals the presence of eight
members of the Transcription Activator-Like (TAL) effector gene family, which are phylogenetically distant from previously
known TAL effector genes of group 2 xanthomonads. Microarray and qRT-PCR analyses revealed TAL effector-specific modulation of wheat gene expression.
Conclusions: PacBio long read sequencing facilitates the assembly of Xanthomonas genomes and of the multiple TAL
effector genes, which are difficult to assemble from short read data. The complete genome sequence of X.
translucens pv. undulosa strain XT4699 and draft genome sequences of nineteen additional X. translucens strains
provide a resource for further genetic analyses of pathogenic diversity and host range of the X. translucens species
complex. TAL effectors of strain XT4699 play roles in modulating host gene expression in wheat.
Keywords: Bacterial leaf streak, X. translucens, PacBio, TAL effectors
1 Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
Full list of author information is available at the end of the article
© 2015 Peng et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Bacterial pathogens of the genus Xanthomonas cause
disease symptoms in a wide range of plant species, including many economically important cereal crops [1].
The species X. translucens represents a complex of
strains that are pathogenic on various members of the
Poaceae, including wheat, barley, oat, rye and other grass
species. Bacterial leaf streak (BLS) and black chaff symptoms in the grain spikes on wheat are caused by X.
translucens pv. undulosa strains. Outbreaks of BLS
occur sporadically in the central Great Plains and are associated with relatively warm and humid conditions, although the disease has been prevalent in recent
years in the northern Great Plains [2]. X.
translucens strains have been classified by pathogenicity
types and DNA fingerprinting technologies [3]. Strains
causing disease symptoms on barley and wheat are
named X. translucens pv. undulosa, while strains
pathogenic only on barley are called X. translucens pv. translucens [4]. Although some strains of X. translucens pv.
cerealis behave similarly to X. translucens pv. undulosa
in pathogenicity type, they are distinguishable by DNA
fingerprinting [3]. Phylogenetic analyses of various X.
translucens strains do not align with the pathovar designations,
and clarifications await genomic analyses on larger strain
collections. In addition, many strains that were isolated
from other species have been reported to cause
disease symptoms on wheat [3]. For example, thirty-three bacterial strains isolated from diseased ornamental
asparagus were identified as X. translucens pv. undulosa
using DNA fingerprinting and cross inoculation [5].
Next-generation sequencing technologies have transformed
genome sequencing, surpassing Sanger sequencing in
throughput and cost [6, 7].
Draft genome sequences provide valuable information on
major genome contents and enable genome comparison
among strains of interest [8, 9]. Currently, draft genome
sequences are available for four strains from the X.
translucens group. Draft genomic data for X. translucens
pv. undulosa strain DAR61454, X. translucens pv. translucens strain DSM18974, X. translucens pv. cerealis
CFBP2541, and X. translucens pv. graminis ART-Xtg29
were generated by using Illumina or Roche 454 short
read sequencing platforms [8–10]. However,
genome assemblies based on Illumina and Roche 454 sequencing are fragmented, and most fail to
assemble complex repetitive sequences, including Transcription Activator-Like (TAL) effector genes, which
occur in multiple copies and contain multiple near-perfect repeats within each gene. TAL effector
genes typically have highly conserved N- and C-terminal
sequences and harbor 12.5–28.5 repeat units of 102 or 105 bp
in the central region [1]. Recently, single-molecule real-time (SMRT) sequencing technology was
developed by Pacific Biosciences (PacBio). It produces
long sequence reads with no obvious sequencing biases
and may allow better resolution of long repetitive DNA
features in a genome. Because PacBio reads have a high error rate,
a high sequencing depth (e.g., 50× or higher) is
usually required for a high-quality de novo assembly
[11–13].
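The two quantities mentioned above, the size of a TAL effector repeat region and the sequencing depth, can be illustrated with a short back-of-the-envelope calculation. The Python sketch below is an illustration only and is not part of the analysis pipeline used in this study. The function names are hypothetical, the half repeat is approximated as half a unit length, and the genome size (5 Mb) and mean read length (10 kb) in the usage lines are assumed round numbers rather than values reported here.

```python
import math


def repeat_region_bp(n_units: float, unit_bp: int) -> float:
    """Approximate length (bp) of a TAL effector central repeat region.

    Assumption: the terminal half repeat counts as half a unit length.
    """
    return n_units * unit_bp


def reads_needed(genome_bp: int, depth: float, mean_read_bp: int) -> int:
    """Reads required for a target mean depth.

    Uses depth = total sequenced bases / genome size, ignoring
    read-length variation and uneven coverage.
    """
    return math.ceil(genome_bp * depth / mean_read_bp)


# Range of repeat-region sizes implied by 12.5-28.5 units of 102 or 105 bp.
shortest = repeat_region_bp(12.5, 102)
longest = repeat_region_bp(28.5, 105)
print(f"repeat region: {shortest:.0f} to {longest:.0f} bp")

# Hypothetical example: 5 Mb genome, 50x depth, 10 kb mean read length.
print("reads needed:", reads_needed(5_000_000, 50, 10_000))
```

Under these assumptions the repeat region alone spans roughly 1.3 to 3.0 kb, well beyond the length of a typical short read, which is consistent with the difficulty of assembling TAL effector genes from short read data.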
In this study, the complete genome sequence of X.
translucens pv. undulosa strain XT4699 was generated
by using high-depth PacBio and Illumina (...truncated)