De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology
Marijke A Luttik (0)
Pascale Daran-Lapujade (0)
Wanwipa Vongsangnak
Jens Nielsen
Wilbert HM Heijne
Paul Klaassen
Chris J Paddon
Darren Platt
Peter Kötter
Roeland C van Ham
Marcel JT Reinders (1)
Jack T Pronk (0)
Dick de Ridder (1)
Jean-Marc Daran (0)
(0) Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands
(1) The Delft Bioinformatics Lab, Department of Intelligent Systems, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Nijkamp et al.
Open Access
Jurgen F Nijkamp1,9, Marcel van den Broek2,9, Erwin Datema3,4,12, Stefan de Kok2,7,9, Lizanne Bosman2,9,
Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in
industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide
variations (SNV), insertions/deletions (indels) and differences in genome organization compared to the reference
strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels
were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes
whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were
caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a
previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant
enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway.
Some phenotypic characteristics of CEN.PK113-7D were explained by the presence of additional specific
metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with the biotin
prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL
loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines
characteristics of laboratory strains and wild-industrial strains.
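The overrepresentation and enrichment statements above compare how often a gene category is hit by variants with what chance alone would predict. A common way to quantify this is a one-sided hypergeometric test. The summary does not say which test was used, so the sketch below illustrates that standard test under this assumption rather than reproducing the authors' pipeline; the function name `hypergeom_upper_tail` and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_sample: int, n_category: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N=n_total, K=n_category, n=n_sample).

    n_total:    genes in the genome (the background set)
    n_category: genes annotated with the category (e.g. cAMP signalling)
    n_sample:   genes carrying at least one variant of interest
    k:          variant-carrying genes that fall in the category
    """
    if not (0 <= k <= n_sample <= n_total and 0 <= n_category <= n_total):
        raise ValueError("inconsistent counts")
    denom = comb(n_total, n_sample)
    upper = min(n_sample, n_category)
    # Sum the probability mass from k up to the largest feasible overlap.
    # math.comb returns 0 when its second argument exceeds the first,
    # so infeasible terms contribute nothing.
    tail = sum(
        comb(n_category, i) * comb(n_total - n_category, n_sample - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only.
p = hypergeom_upper_tail(k=15, n_sample=600, n_category=50, n_total=6000)
print(f"p = {p:.2e}")
```

With these hypothetical counts, about 5 category genes would be expected among 600 variant-carrying genes (600 × 50 / 6000), so observing 15 gives a small tail probability. When many categories are tested genome-wide, the resulting p-values also need multiple-testing correction.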
Background
The 1000-dollar genome, an iconic goal in human
genomics, is already a reality for the yeast Saccharomyces
cerevisiae (based on September 2011 quotes from several
sequencing companies for sequencing a 12 Mb genome
via paired-end short-read sequencing, at over 40-fold
coverage).
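The coverage figure in this quote translates directly into a sequencing volume: mean coverage is approximately the number of sequenced bases divided by the genome size (the Lander-Waterman estimate C = NL/G). The sketch below applies this to the 12 Mb, 40-fold scenario in the text. The 2 × 100 bp read length and the helper name `read_pairs_needed` are assumptions chosen for illustration, since the quote specifies neither.

```python
import math


def read_pairs_needed(genome_size_bp: int, target_coverage: float, read_length_bp: int) -> int:
    """Estimate the paired-end read pairs required for a target mean coverage.

    Uses C = N * L / G, where each read pair contributes
    2 * read_length_bp sequenced bases.
    """
    bases_needed = target_coverage * genome_size_bp
    return math.ceil(bases_needed / (2 * read_length_bp))


genome_bp = 12_000_000  # ~12 Mb S. cerevisiae genome, as in the text
coverage = 40           # 40-fold coverage, as in the text
read_len = 100          # assumption: 2 x 100 bp paired-end reads

pairs = read_pairs_needed(genome_bp, coverage, read_len)
print(f"{coverage * genome_bp / 1e6:.0f} Mb of sequence, {pairs:,} read pairs")
```

Run as shown, it prints `480 Mb of sequence, 2,400,000 read pairs`. In practice, projects usually budget somewhat more to offset reads lost to quality filtering and duplicate removal.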
Although a high quality reference genome of the
laboratory strain S. cerevisiae S288C has been available
since 1996 [1], there are four main reasons to (re)sequence
the genomes of other S. cerevisiae strains. First,
the considerable sequence divergence among S. cerevisiae
strains may cause practical complications, for example, in
the design of oligonucleotide arrays and cassettes for
gene disruption in non-S288C strains. The discovery of
more than 250,000 polymorphisms in 71 S. cerevisiae strains
sequenced at low coverage [2] illustrates that this is not a
trivial problem. Second, although the genomes of S.
cerevisiae strains appear to be much more strongly
conserved than those of other organisms, such as E. coli [3],
S. cerevisiae strains do show physiologically relevant
differences in their gene complement. For example, the
absence of a functional MALx3 gene in S. cerevisiae
S288C leads to a maltose-negative phenotype, while an
atypical ENA gene complement renders the laboratory
strain CEN.PK113-7D more sensitive to lithium ions [4].
The possible importance of strain-specific genes is
illustrated by the identification of a probable horizontal gene
transfer event in the S. cerevisiae wine strain EC1118,
that led to the acquisition of genes from the spoilage
yeast Zygosaccharomyces bailii [5]. Third, in addition to
the presence or absence of coding regions, differences
can occur in non-coding regions, such as promoter
regions. Knowledge of such differences is essential for the
analysis and modelling of regulatory networks in systems
biology [6]. Finally, laboratory evolution is rapidly gaining
popularity as a tool to analyse genome function and to
select for yeast strains with industrially relevant
properties [7-11]. Genome comparisons based on mapping
short-read data to a distant relative may overlook
structural changes. Hence, the availability of a well-annotated,
high-quality reference genome is essential to interpret
the changes that occur during laboratory evolution.
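The first of these reasons, that polymorphisms can undermine oligonucleotides and disruption cassettes designed on S288C, can be checked computationally once a strain's variants are known: a probe or homology arm is suspect if any polymorphism falls inside it. A minimal sketch of that check follows, assuming SNV positions and oligo coordinates on the same reference chromosome, expressed as 0-based, half-open intervals. The function name, oligo names and coordinates are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def oligos_hit_by_variants(
    oligos: Dict[str, Tuple[int, int]],
    variant_positions: List[int],
) -> Dict[str, List[int]]:
    """Return, for each oligo, the sorted variant positions inside it.

    oligos maps a name to a 0-based half-open interval [start, end) on one
    chromosome; variant_positions are 0-based positions on that chromosome.
    Only oligos overlapping at least one variant are reported.
    """
    positions = sorted(variant_positions)
    hits: Dict[str, List[int]] = {}
    for name, (start, end) in oligos.items():
        # Binary search for the first variant >= start and the first >= end;
        # everything between them lies inside [start, end).
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, end)
        if lo < hi:
            hits[name] = positions[lo:hi]
    return hits


# Hypothetical probes, cassette arm and SNV positions, for illustration only.
probes = {"probe_A": (100, 125), "probe_B": (300, 325), "arm_5prime": (1000, 1050)}
snvs = [118, 512, 1049, 2000]
print(oligos_hit_by_variants(probes, snvs))
# → {'probe_A': [118], 'arm_5prime': [1049]}
```

Sorting once and using binary search keeps each lookup logarithmic in the number of variants. Indels span intervals rather than single positions, so they would need an interval-overlap test instead of a point lookup.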
Several wild and domestic yeast strains have been
sequenced. At the time of writing, forty-seven genome projects
for S. cerevisiae have been registered at GenBank from
which twenty-eight contain a de novo assembled (draft)
genome [1,5,12-20].
The isogenic family of CEN.PK strains was developed
by crossing different laboratory strains of S. cerevisiae
in the 1990s by a consortium of German yeast
researchers [21]. A subsequent multi-laboratory study in which
four S. cerevisiae strains were compared confirmed that
the CEN.PK strains combine good accessibility to
classical and molecular genetics techniques with excellent
growth characteristics under controlled, industrially
relevant conditions [22]. These strains, and in particular the
haploid MATa strain CEN.PK113-7D, have since become
extremely popular for studies in systems biology [23,24].
Moreover, the excellent growth characteristics of the
CEN.PK strains have resulted in their broad application
in metabolic and evolutionary engineering studies, for
example, for the fermentation of pentose sugars [25-28],
production of ethanol [29,30] and spirits [31], production
of lactate and pyruvate [32,33], production of
C4-dicarboxylic acids [34], isoprenoids [35,36], and a fungal
polyketide (6-methylsalicylic acid) [37].
Genomic differences between S. cerevisiae
CEN.PK113-7D and S288C have been the subject of several
studies. Daran-Lapujade and co-workers [38] performed
comparative genotyping of the two strains by
hybridization of genomic DNA to oligonucleotide gene-expression
arrays. This work led to the identification of several
genes that were absent in CEN.PK113-7D, but present in
S288C. Schacherer and co-workers [39] employed an
oligonucleotide tiling microarray (Affymetrix S. cerevisiae
Tiling 1.0R array) based on the S288C genome to detect
the locations of single-nucleotide variations (SNVs), thereby
narrowing down the amount of sequencing needed with
traditional approaches, and to identify genes
absent in CEN.PK113-7D, such as RDS1 and EHD3. SNVs
in CEN.PK113-7D compared to S288C have previously
been characterized by mapping next-generation DNA
sequencing data to the S288C reference genome followed
by SNV calling [35]. The use of short-read (35-bp)
sequences and a limi (...truncated)