GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries
Published online 28 April 2008
Nucleic Acids Research, 2008, Vol. 36, Web Server issue W281–W285
doi:10.1093/nar/gkn226
Andrew E. Firth1 and Wayne M. Patrick2,*
1BioSciences Institute, University College Cork, Cork, Ireland and 2Institute of Molecular Biosciences,
Massey University, Auckland 0745, New Zealand
Received January 24, 2008; Revised April 7, 2008; Accepted April 10, 2008
ABSTRACT
There are many methods for introducing random
mutations into nucleic acid sequences. Previously,
we described a suite of programmes for estimating
the completeness and diversity of randomized DNA
libraries generated by a number of these protocols.
Our programmes suggested some empirical guidelines for library design; however, no information was
provided regarding library diversity at the protein
(rather than DNA) level. We have now updated our
web server, enabling analysis of translated libraries
constructed by site-saturation mutagenesis and
error-prone PCR (epPCR). We introduce GLUE-Including Translation (GLUE-IT), which finds the
expected amino acid completeness of libraries in
which up to six codons have been independently
varied (according to any user-specified randomization scheme). We provide two tools for assisting
with experimental design: CodonCalculator, for
assessing amino acids corresponding to given
randomized codons; and AA-Calculator, for finding
degenerate codons that encode user-specified
sets of amino acids. We also present PEDEL-AA,
which calculates amino acid statistics for libraries
generated by epPCR. Input includes the parent
sequence, overall mutation rate, library size, indel
rates and a nucleotide mutation matrix. Output
includes amino acid completeness and diversity
statistics, and the number and length distribution of
sequences truncated by premature termination
codons. The web interfaces are available at
http://guinevere.otago.ac.nz/stats.html.
INTRODUCTION
In the past 15 years, directed evolution has developed
into a broadly applicable strategy for generating new
biomolecules with desirable properties, for probing protein
structure and function, and for addressing fundamental
questions in molecular evolution. In this approach,
random mutagenesis is used to produce a large and diverse
library of nucleic acid sequences, which is subsequently
interrogated for rare, improved variants. Myriad protocols
have been developed to produce the necessary molecular
diversity (1–3). However, our ability to generate and screen
randomized libraries is dwarfed by the amount of
molecular diversity contained in protein sequence space.
Even for a small, 100-residue protein, there are more
potential amino acid sequences than there are atoms in the
observable Universe (4).
Increasingly, it is recognized that high-quality libraries
are critical to the success of directed evolution experiments
(5,6). Previously, we argued that the likelihood of finding
a variant with a desired function in a randomized library
is maximized when the library is maximally diverse (7).
To the experimentalist, this corresponds to a library
containing as few redundant sequences (including copies
of the unmutated parental gene) and as many full-length
sequences (lacking premature termination codons) as
possible. To aid in the design of maximally diverse libraries, we developed a suite of user-friendly programmes
for estimating the completeness and diversity that they
contain (4,8). These programmes were limited to estimating library diversity at the nucleic acid level, and provided
no explicit information regarding the translated products
of the randomized genes. In this article, we describe an
expanded web server, which enables the analysis of protein
diversity in randomized libraries that have been generated
by site-saturation mutagenesis and error-prone PCR
(epPCR). The nucleotide programmes GLUE (for randomization techniques where all DNA sequence variants
are equally likely), PEDEL (Programme for Estimating
Diversity in Error-prone PCR Libraries) and DRIVeR
(Diversity Resulting from In Vitro Recombination) are
still maintained on the website, and have been described
previously (4,8).
*To whom correspondence should be addressed. Tel: +64 9 414 0800, ext. 9694; Fax: +64 9 441 8142; Email:
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/)
which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
GLUE-IT
One of our previous programmes, GLUE, is broadly
applicable to any protocol where all gene variants have
an equal probability of occurring in a library. The most
commonly used example is site-saturation mutagenesis
(also referred to as oligonucleotide-directed randomization), in which randomized bases are incorporated into
one or more of the primers in a PCR, allowing the generation of diversity at specific sites in an amplified gene.
Other techniques that result in equally probable daughter
variants (at the DNA level) include MAX randomization
(9) and versions of DNA shuffling that utilize designed
oligonucleotides (10–12). GLUE is also a useful estimator
of the diversity in libraries generated by incremental truncation strategies, such as Expression of Soluble Proteins
by Random Incremental Truncation (ESPRIT) (13), in
which variants are close to being equally probable (14).
We now introduce GLUE-Including Translation
(GLUE-IT), which outputs the expected amino acid
level diversity in any site-saturation mutagenesis library
with up to six variable codons. The user specifies the fully
or partly randomized scheme used for each of the variable
codons, and the size of the library that they have constructed (or, more often, the number of clones that they
plan to screen).
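To make the idea of expected amino acid completeness concrete, the following Python sketch shows one standard way to estimate it. It is an illustration under stated assumptions, not necessarily the calculation performed by the web server: bases at degenerate positions are taken to be equimolar, each clone is treated as an independent draw, the standard genetic code is used, and variants containing a stop codon are excluded. The function names residue_probs and expected_distinct_variants are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def residue_probs(scheme):
    """Probability of each residue (or '*' for stop) at one randomized codon.

    Assumes every base allowed at a degenerate position is equally likely.
    """
    codons = ["".join(c) for c in product(*(IUPAC[b] for b in scheme.upper()))]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


def expected_distinct_variants(schemes, library_size):
    """Return (expected distinct stop-free variants, number possible).

    A variant with probability p is present at least once among L
    independent clones with probability 1 - (1 - p)**L.
    """
    per_site = [list(residue_probs(s).items()) for s in schemes]
    expected, possible = 0.0, 0
    for combo in product(*per_site):
        if any(aa == "*" for aa, _ in combo):
            continue  # assumption: truncated variants are not counted
        p = math.prod(prob for _, prob in combo)
        possible += 1
        if p >= 1.0:
            expected += 1.0
        else:
            # Numerically stable form of 1 - (1 - p)**L for very small p.
            expected += -math.expm1(library_size * math.log1p(-p))
    return expected, possible
```

For example, expected_distinct_variants(["NNK", "NNK"], 1000) returns the expected number of the 400 possible two-residue variants present among 1000 clones. This brute-force enumeration is adequate for two or three codons; for six fully randomized codons, grouping variants by probability would be needed to keep the run time reasonable.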
We provide two tools (CodonCalculator and
AA-Calculator) to assist in choosing an appropriate
randomization scheme for library construction. On specifying a fully or partly randomized codon, XYZ, CodonCalculator will output the possible amino acid variants and
the number of times that each is encoded. AA-Calculator
performs the opposite function: the user can specify a
desired set of amino acids, and AA-Calculator will find the
degenerate codon(s) that are optimal for encoding them.
Up to 50 degenerate codons are listed, ranked according
to the fraction of the XYZ-specified codons that code
for the desired amino acids. AA-Calculator therefore offers
a user-friendly alternative to downloading and executing
the LibDesign algorithm (15), and provides users with a
replacement for the Combinatorial Codons programme
(16), which (as far as we are aware) is no longer
available online.
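The two calculators can be sketched in a few lines of Python. This is a minimal illustration of the behaviour described above, not the server's implementation. It assumes the standard genetic code and IUPAC degenerate bases, and it additionally assumes that a candidate codon must encode every requested amino acid, which the text does not state explicitly. The function names codon_calculator and aa_calculator are hypothetical.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_calculator(xyz):
    """Count how many codons specified by XYZ encode each residue ('*' = stop)."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in xyz.upper())))
    return Counter(CODON_TABLE[c] for c in codons)


def aa_calculator(desired, max_results=50):
    """Rank degenerate codons by the fraction of their codons that encode
    the desired residues, returning at most max_results (fraction, codon) pairs.

    Assumption: only codons that encode every desired residue are kept.
    """
    desired = set(desired.upper())
    hits = []
    for xyz in map("".join, product(IUPAC, repeat=3)):
        counts = codon_calculator(xyz)
        if not desired <= set(counts):
            continue  # at least one desired residue is not encoded
        on_target = sum(n for aa, n in counts.items() if aa in desired)
        hits.append((on_target / sum(counts.values()), xyz))
    # Best fraction first; ties broken alphabetically by codon.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:max_results]
```

Calling codon_calculator("NNK") reports 32 codons in total, of which one is a stop codon, and aa_calculator("DE") lists the degenerate codons that encode only aspartate and glutamate ahead of less specific alternatives.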
On entering the randomization scheme and library size,
GLUE-IT will output a summar (...truncated)