De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology

Article PDF:

http://www.microbialcellfactories.com/content/pdf/1475-2859-11-36.pdf

Jurgen F Nijkamp, Marcel van den Broek, Erwin Datema, Stefan de Kok, Lizanne Bosman, Marijke A Luttik, Pascale Daran-Lapujade, Wanwipa Vongsangnak, Jens Nielsen, Wilbert HM Heijne, Paul Klaassen, Chris J Paddon, Darren Platt, Peter Kötter, Roeland C van Ham, Marcel JT Reinders, Jack T Pronk, Dick de Ridder, Jean-Marc Daran

Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands
The Delft Bioinformatics Lab, Department of Intelligent Systems, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNV), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway.
Some phenotypic characteristics of the CEN.PK113-7D strain were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with the biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains.

Background

The 1000-dollar genome, an iconic goal in human genomics, is already a reality for the yeast Saccharomyces cerevisiae (based on September 2011 quotes from several sequencing companies for sequencing a 12 Mb genome via paired-end short-read sequencing, at over 40-fold coverage). Although a high-quality reference genome of the laboratory strain S. cerevisiae S288C has been available since 1996 [1], there are four main reasons to (re)sequence the genomes of other S. cerevisiae strains. First, the considerable sequence divergence among S. cerevisiae strains may cause practical complications, for example in the design of oligonucleotide arrays and of cassettes for gene disruption in non-S288C strains. The discovery of > 250,000 polymorphisms in 71 S. cerevisiae strains sequenced at low coverage [2] illustrates that this is not a trivial problem. Second, although the genomes of S. cerevisiae strains appear to be much more strongly conserved than those of other organisms, such as E. coli [3], S. cerevisiae strains do show physiologically relevant differences in their gene complement. For example, the absence of a functional MALx3 gene in S. cerevisiae S288C leads to a maltose-negative phenotype, while an atypical ENA gene complement renders the laboratory strain CEN.PK113-7D more sensitive to lithium ions [4]. The possible importance of strain-specific genes is illustrated by the identification of a probable horizontal gene transfer event in the S. cerevisiae wine strain EC1118, which led to the acquisition of genes from the spoilage yeast Zygosaccharomyces bailii [5]. Third, in addition to the presence or absence of coding regions, differences can occur in non-coding regions, such as promoter regions. Knowledge of such differences is essential for the analysis and modelling of regulatory networks in systems biology [6]. Finally, laboratory evolution is rapidly gaining popularity as a tool to analyse genome function and to select for yeast strains with industrially relevant properties [7-11]. Genome comparisons based on mapping short-read data to a distant relative may overlook structural changes. Hence, the availability of a well-annotated, high-quality reference genome is essential to interpret the changes that occur during laboratory evolution.

Several wild and domestic yeast strains have been sequenced. At the moment, forty-seven genome projects for S. cerevisiae have been registered at GenBank, of which twenty-eight contain a de novo assembled (draft) genome [1,5,12-20].

The isogenic family of CEN.PK strains was developed in the 1990s by a consortium of German yeast researchers, by crossing different laboratory strains of S. cerevisiae [21]. A subsequent multi-laboratory study in which four S. cerevisiae strains were compared confirmed that the CEN.PK strains combine good accessibility to classical and molecular genetics techniques with excellent growth characteristics under controlled, industrially relevant conditions [22]. These strains, and in particular the haploid MATa strain CEN.PK113-7D, have since become extremely popular for studies in systems biology [23,24].
Moreover, the excellent growth characteristics of the CEN.PK strains have resulted in their broad application in metabolic and evolutionary engineering studies, for example for the fermentation of pentose sugars [25-28] and the production of ethanol [29,30], spirits [31], lactate and pyruvate [32,33], C4-dicarboxylic acids [34], isoprenoids [35,36] and a fungal polyketide (6-methylsalicylic acid) [37].

Genomic differences between S. cerevisiae CEN.PK113-7D and the S288C strain have been the subject of several studies. Daran-Lapujade and co-workers [38] performed a comparative genotyping of the two strains by hybridization of genomic DNA to oligonucleotide gene-expression arrays. This work led to the identification of several genes that were absent in CEN.PK113-7D, but present in S288C. Schacherer and co-workers [39] employed an oligonucleotide tiling microarray (Affymetrix S. cerevisiae Tiling 1.0R array) based on the S288C genome to detect locations of single-nucleotide variation (SNV), both to reduce the amount of sequencing needed with traditional sequencing approaches and to find genes absent in CEN.PK113-7D, such as RDS1 and EHD3. SNVs in CEN.PK113-7D compared to S288C have previously been characterized by mapping next-generation DNA sequencing data to the S288C reference genome followed by SNV calling [35]. The use of short read (35-bp) sequences and a limi (...truncated)