GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries

Article PDF: https://nar.oxfordjournals.org/content/36/suppl_2/W281.full.pdf

Andrew E. Firth (BioSciences Institute, University College Cork, Cork, Ireland) and Wayne M. Patrick (Institute of Molecular Biosciences, Massey University, Auckland 0745, New Zealand)

There are many methods for introducing random mutations into nucleic acid sequences. Previously, we described a suite of programmes for estimating the completeness and diversity of randomized DNA libraries generated by a number of these protocols. Our programmes suggested some empirical guidelines for library design; however, no information was provided regarding library diversity at the protein (rather than DNA) level. We have now updated our web server, enabling analysis of translated libraries constructed by site-saturation mutagenesis and error-prone PCR (epPCR). We introduce GLUE-Including Translation (GLUE-IT), which finds the expected amino acid completeness of libraries in which up to six codons have been independently varied (according to any user-specified randomization scheme). We provide two tools for assisting with experimental design: CodonCalculator, for assessing amino acids corresponding to given randomized codons; and AA-Calculator, for finding degenerate codons that encode user-specified sets of amino acids. We also present PEDEL-AA, which calculates amino acid statistics for libraries generated by epPCR. Input includes the parent sequence, overall mutation rate, library size, indel rates and a nucleotide mutation matrix. Output includes amino acid completeness and diversity statistics, and the number and length distribution of sequences truncated by premature termination codons. The web interfaces are available at http://guinevere.otago.ac.nz/stats.html.

INTRODUCTION

In the past 15 years, directed evolution has developed into a broadly applicable strategy for generating new biomolecules with desirable properties, for probing protein structure and function, and for addressing fundamental questions in molecular evolution.
In this approach, random mutagenesis is used to produce a large and diverse library of nucleic acid sequences, which is subsequently interrogated for rare, improved variants. Myriad protocols have been developed to produce the necessary molecular diversity (1–3). However, our ability to generate and screen randomized libraries is dwarfed by the amount of molecular diversity contained in protein sequence space. Even for a small, 100-residue protein, there are more potential amino acid sequences than there are atoms in the observable Universe (4).

Increasingly, it is recognized that high-quality libraries are critical to the success of directed evolution experiments (5,6). Previously, we argued that the likelihood of finding a variant with a desired function in a randomized library is maximized when the library is maximally diverse (7). To the experimentalist, this corresponds to a library containing as few redundant sequences (including copies of the unmutated parental gene) and as many full-length sequences (lacking premature termination codons) as possible. To aid in the design of maximally diverse libraries, we developed a suite of user-friendly programmes for estimating the completeness and diversity that they contain (4,8). These programmes were limited to estimating library diversity at the nucleic acid level, and provided no explicit information regarding the translated products of the randomized genes.

In this article, we describe an expanded web server, which enables the analysis of protein diversity in randomized libraries that have been generated by site-saturation mutagenesis and error-prone PCR (epPCR). The nucleotide programmes GLUE (for randomization techniques where all DNA sequence variants are equally likely), PEDEL (Programme for Estimating Diversity in Error-prone PCR Libraries) and DRIVeR (Diversity Resulting from In Vitro Recombination) are still maintained on the website, and have been described previously (4,8).
One of our previous programmes, GLUE, is broadly applicable to any protocol where all gene variants have an equal probability of occurring in a library. The most commonly used example is site-saturation mutagenesis (also referred to as oligonucleotide-directed randomization), in which randomized bases are incorporated into one or more of the primers in a PCR, allowing the generation of diversity at specific sites in an amplified gene. Other techniques that result in equally probable daughter variants (at the DNA level) include MAX randomization (9) and versions of DNA shuffling that utilize designed oligonucleotides (10–12). GLUE is also a useful estimator of the diversity in libraries generated by incremental truncation strategies, such as Expression of Soluble Proteins by Random Incremental Truncation (ESPRIT) (13), in which variants are close to being equally probable (14).

We now introduce GLUE-Including Translation (GLUE-IT), which outputs the expected amino acid-level diversity in any site-saturation mutagenesis library with up to six variable codons. The user specifies the fully or partly randomized scheme used for each of the variable codons, and the size of the library that they have constructed (or, more often, the number of clones that they plan to screen).

We provide two tools (CodonCalculator and AA-Calculator) to assist in choosing an appropriate randomization scheme for library construction. On specifying a fully or partly randomized codon, XYZ, CodonCalculator will output the possible amino acid variants and the number of times that each is encoded. AA-Calculator performs the opposite function: the user can specify a desired set of amino acids, and AA-Calculator will find the degenerate codon(s) that are optimal for encoding them. Up to 50 degenerate codons are listed, ranked according to the fraction of the XYZ-specified codons that code for the desired amino acids.
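The two calculations just described can be illustrated with a minimal Python sketch. This is not the web server's code: the function names `expand_codon` and `rank_codons` are hypothetical, the standard genetic code and IUPAC ambiguity codes are assumed, and the requirement that a candidate codon must encode every desired amino acid is an assumption about how "optimal" might be defined.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed alphabet for the XYZ scheme).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand_codon(degenerate):
    """Count how many codons of a degenerate codon encode each amino acid.

    Stop codons are reported under the key '*'.
    """
    pools = [IUPAC[base] for base in degenerate.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*pools))


def rank_codons(desired, limit=50):
    """Rank degenerate codons by the fraction of their codons that are on target.

    Assumption: only codons that can encode every desired amino acid qualify.
    Returns up to `limit` (fraction, codon) pairs, best first.
    """
    desired = set(desired)
    results = []
    for letters in product(IUPAC, repeat=3):
        codon = "".join(letters)
        counts = expand_codon(codon)
        if not desired <= set(counts):
            continue  # misses at least one desired amino acid
        total = sum(counts.values())
        on_target = sum(n for aa, n in counts.items() if aa in desired)
        results.append((on_target / total, codon))
    # Highest on-target fraction first; ties broken alphabetically.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]
```

For example, `expand_codon("NNK")` covers 32 codons that encode all 20 amino acids plus a single stop codon, and `rank_codons("FL")` lists codons that encode both phenylalanine and leucine, best on-target fraction first.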
AA-Calculator therefore offers a user-friendly alternative to downloading and executing the LibDesign algorithm (15), and provides users with a replacement for the Combinatorial Codons programme (16), which (as far as we are aware) is no longer available online.

On entering the randomization scheme and library size, GLUE-IT will output a summary of statistics, including the number of possible DNA and amino acid variants that are encoded by each randomized codon and the total number of possible amino acid variants in the library. The probability of a particular variant $v_i$ being present in the library is $1 - (1 - p_i)^L$, where $p_i$ is the probability of any particular variant in the library being $v_i$, and $L$ is the library size. In the case of six fully randomized (NNN) codons, there are $20^6 = 6.4 \times 10^7$ possible variants. To quickly calculate the expected number of distinct variants in the library, $C = \sum_{v_i} \left[1 - (1 - p_i)^L\right]$, variants are grouped according to the number of ways in which they can be encoded. Each individual amino acid can be encoded by between one and six equiprobable codons, so for six randomized codons the (...truncated)
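The grouping idea behind the sum over variants can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not GLUE-IT's implementation: the function `expected_distinct` and the `NNK`/`NNN` tuples are hypothetical names, each position is summarized as (total codons, {codons per amino acid: number of such amino acids}) for the standard genetic code, and sequences containing a stop codon are assumed not to count as amino acid variants.

```python
from collections import Counter
from math import expm1, log1p, prod

# Per-position summaries for the standard genetic code (stop codons excluded
# from the amino acid groups but kept in the codon total).
NNK = (32, {3: 3, 2: 5, 1: 12})
NNN = (64, {6: 3, 4: 5, 3: 1, 2: 9, 1: 2})


def prob_present(p, library_size):
    """Return 1 - (1 - p)^L, computed stably for very small p."""
    if p >= 1.0:
        return 1.0
    return -expm1(library_size * log1p(-p))


def expected_distinct(positions, library_size):
    """Return (number of possible variants, expected number of distinct variants).

    Variants are grouped by the number of codon combinations that encode
    them, so the sum runs over a handful of groups rather than over every
    individual variant.
    """
    # Maps "number of encoding codon combinations" -> "number of variants".
    groups = Counter({1: 1})
    for _, multiplicities in positions:
        combined = Counter()
        for ways, n_variants in groups.items():
            for codons, n_amino_acids in multiplicities.items():
                combined[ways * codons] += n_variants * n_amino_acids
        groups = combined
    denominator = prod(total for total, _ in positions)
    n_possible = sum(groups.values())
    expected = sum(
        n * prob_present(ways / denominator, library_size)
        for ways, n in groups.items()
    )
    return n_possible, expected
```

Calling `expected_distinct([NNN] * 6, 10**6)` returns the $20^6$ possible variants together with the expected number of distinct variants among one million clones. Keying the groups on integer counts rather than on floating-point probabilities keeps equal-probability variants in the same group.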