GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries

Published online 28 April 2008
Nucleic Acids Research, 2008, Vol. 36, Web Server issue W281–W285
doi:10.1093/nar/gkn226

Andrew E. Firth1 and Wayne M. Patrick2,*

1BioSciences Institute, University College Cork, Cork, Ireland and 2Institute of Molecular Biosciences, Massey University, Auckland 0745, New Zealand

Received January 24, 2008; Revised April 7, 2008; Accepted April 10, 2008

ABSTRACT

There are many methods for introducing random mutations into nucleic acid sequences. Previously, we described a suite of programmes for estimating the completeness and diversity of randomized DNA libraries generated by a number of these protocols. Our programmes suggested some empirical guidelines for library design; however, no information was provided regarding library diversity at the protein (rather than DNA) level. We have now updated our web server, enabling analysis of translated libraries constructed by site-saturation mutagenesis and error-prone PCR (epPCR). We introduce GLUE-Including Translation (GLUE-IT), which finds the expected amino acid completeness of libraries in which up to six codons have been independently varied (according to any user-specified randomization scheme). We provide two tools for assisting with experimental design: CodonCalculator, for assessing amino acids corresponding to given randomized codons; and AA-Calculator, for finding degenerate codons that encode user-specified sets of amino acids. We also present PEDEL-AA, which calculates amino acid statistics for libraries generated by epPCR. Input includes the parent sequence, overall mutation rate, library size, indel rates and a nucleotide mutation matrix. Output includes amino acid completeness and diversity statistics, and the number and length distribution of sequences truncated by premature termination codons. The web interfaces are available at http://guinevere.otago.ac.nz/stats.html.
INTRODUCTION

In the past 15 years, directed evolution has developed into a broadly applicable strategy for generating new biomolecules with desirable properties, for probing protein structure and function, and for addressing fundamental questions in molecular evolution. In this approach, random mutagenesis is used to produce a large and diverse library of nucleic acid sequences, which is subsequently interrogated for rare, improved variants. Myriad protocols have been developed to produce the necessary molecular diversity (1–3). However, our ability to generate and screen randomized libraries is dwarfed by the amount of molecular diversity contained in protein sequence space. Even for a small, 100-residue protein, there are more potential amino acid sequences than there are atoms in the observable Universe (4).

Increasingly, it is recognized that high-quality libraries are critical to the success of directed evolution experiments (5,6). Previously, we argued that the likelihood of finding a variant with a desired function in a randomized library is maximized when the library is maximally diverse (7). To the experimentalist, this corresponds to a library containing as few redundant sequences (including copies of the unmutated parental gene) and as many full-length sequences (lacking premature termination codons) as possible. To aid in the design of maximally diverse libraries, we developed a suite of user-friendly programmes for estimating the completeness and diversity that they contain (4,8). These programmes were limited to estimating library diversity at the nucleic acid level, and provided no explicit information regarding the translated products of the randomized genes. In this article, we describe an expanded web server, which enables the analysis of protein diversity in randomized libraries that have been generated by site-saturation mutagenesis and error-prone PCR (epPCR). The nucleotide programmes GLUE (for randomization techniques where all DNA sequence variants are equally likely), PEDEL (Programme for Estimating Diversity in Error-prone PCR Libraries) and DRIVeR (Diversity Resulting from In Vitro Recombination) are still maintained on the website, and have been described previously (4,8).

*To whom correspondence should be addressed. Tel: +64 9 414 0800, ext. 9694; Fax: +64 9 441 8142; Email:

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

GLUE-IT

One of our previous programmes, GLUE, is broadly applicable to any protocol where all gene variants have an equal probability of occurring in a library. The most commonly used example is site-saturation mutagenesis (also referred to as oligonucleotide-directed randomization), in which randomized bases are incorporated into one or more of the primers in a PCR, allowing the generation of diversity at specific sites in an amplified gene. Other techniques that result in equally probable daughter variants (at the DNA level) include MAX randomization (9) and versions of DNA shuffling that utilize designed oligonucleotides (10–12). GLUE is also a useful estimator of the diversity in libraries generated by incremental truncation strategies, such as Expression of Soluble Proteins by Random Incremental Truncation (ESPRIT) (13), in which variants are close to being equally probable (14).
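As a rough illustration of the equal-probability case, the expected number of distinct variants in a library follows from the classical occupancy calculation: if L clones are drawn independently and uniformly from V possible variants, each variant is missed with probability (1 - 1/V)^L. The Python sketch below assumes exactly that idealized sampling model. It is not the GLUE source code, and the function name `expected_distinct` and the example numbers are hypothetical.

```python
import math


def expected_distinct(num_variants: int, library_size: int) -> float:
    """Expected number of distinct variants seen when `library_size` clones
    are drawn independently and uniformly from `num_variants` possibilities.

    Assumption: idealized equiprobable sampling (classical occupancy model).
    """
    if num_variants <= 0 or library_size < 0:
        raise ValueError("num_variants must be positive, library_size non-negative")
    if num_variants == 1:
        # Only one possible variant: it is present as soon as one clone exists.
        return 1.0 if library_size > 0 else 0.0
    # log of the probability that one given variant is missed by every clone.
    # log1p/expm1 keep precision when num_variants is very large.
    log_miss = library_size * math.log1p(-1.0 / num_variants)
    # V * (1 - (1 - 1/V)^L), written as -V * expm1(log_miss).
    return -num_variants * math.expm1(log_miss)


# Hypothetical example: two fully randomized codons (NNN NNN) give
# 4**6 = 4096 equally likely DNA variants; suppose 10,000 clones are screened.
variants = 4 ** 6
clones = 10_000
distinct = expected_distinct(variants, clones)
completeness = distinct / variants
print(f"expected distinct variants: {distinct:.1f}")
print(f"expected completeness: {completeness:.3f}")
```

Note that this counts DNA-level variants only. Because several codons encode the same amino acid, amino acid variants are not equally probable, so the same formula cannot simply be reused at the protein level.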
We now introduce GLUE-Including Translation (GLUE-IT), which outputs the expected amino acid level diversity in any site-saturation mutagenesis library with up to six variable codons. The user specifies the fully or partly randomized scheme used for each of the variable codons, and the size of the library that they have constructed (or, more often, the number of clones that they plan to screen).

We provide two tools (CodonCalculator and AA-Calculator) to assist in choosing an appropriate randomization scheme for library construction. On specifying a fully or partly randomized codon, XYZ, CodonCalculator will output the possible amino acid variants and the number of times that each is encoded. AA-Calculator performs the opposite function: the user can specify a desired set of amino acids, and AA-Calculator will find the degenerate codon(s) that are optimal for encoding them. Up to 50 degenerate codons are listed, ranked according to the fraction of the XYZ-specified codons that code for the desired amino acids. AA-Calculator therefore offers a user-friendly alternative to downloading and executing the LibDesign algorithm (15), and provides users with a replacement for the Combinatorial Codons programme (16), which (as far as we are aware) is no longer available online.

On entering the randomization scheme and library size, GLUE-IT will output a summar (...truncated)