JenPep: a database of quantitative functional peptide data for immunology
Martin J. Blythe, Irini A. Doytchinova and Darren R. Flower
Edward Jenner Institute for Vaccine Research, Compton, Berkshire RG0 7NN, UK
Motivation: The compilation of quantitative binding data underlies attempts to derive tools for the accurate prediction of epitopes in cellular immunology and is part of our concerted goal to develop practical computational vaccinology.
Results: JenPep is a family of relational databases supporting the growing community of immunoinformaticians. It contains quantitative data on peptide binding to Major Histocompatibility Complexes (MHCs) and to the Transmembrane Peptide Transporter (TAP), as well as an annotated list of T-cell epitopes.
Availability: The database is available via the Internet. An HTML interface for searching the database can be found at http://www.jenner.ac.uk/JenPep.
Contact:
INTRODUCTION
As the field of Bioinformatics has grown and matured
into a new branch of science, new sub-disciplines have
emerged within it. Immunoinformatics, the application of
informatics and modelling techniques to molecules of the
immune system, is one of the most exciting of these newly
emergent sub-disciplines. One of the principal goals of
immunoinformatics is to develop computer-aided vaccine
design, or computational vaccinology, and apply it to the
quest for new vaccines. At the heart of computational
vaccinology is the problem of epitope prediction. The
focus of our present work is the development of a new
database system in cellular, or T-cell, immunology.
Cellular immunity is mediated by a specialized type of immune
cell: the T-cell. These cells constantly patrol the body
hunting for foreign proteins originating from pathogenic
organisms such as viruses or bacteria. T-cells express a
particular kind of receptor: the T-Cell Receptor (TCR),
which exhibits a wide range of selectivities and affinities.
TCRs bind to Major Histocompatibility Complex (MHC)
proteins presented on the surfaces of other cells. These
proteins bind small peptide fragments, or epitopes, derived
from both host and pathogen proteins. It is recognition of
such complexes that lies at the heart of both the adaptive
and the memory cellular immune responses.
The overall process leading to the cell-surface
presentation of epitopes, derived from antigenic protein, is
complex and not yet fully understood. There are two main
antigen presentation pathways: classes I and II. Class I MHCs
are expressed by most nucleated cells, albeit with some
exceptions, and are recognized by T-cells whose surfaces are
rich in the CD8 co-receptor protein. Class II MHCs are
expressed only on so-called professional antigen-presenting
cells and are recognized by T-cells whose surfaces are rich
in CD4 co-receptors. Class I peptides are typically, but not
exclusively, derived from intracellular proteins, such as
viral proteins. These proteins are targeted to the
proteasome, which cleaves them into short peptides of 8–11
amino acids in length. These peptides are bound by the
Transmembrane Peptide Transporter (TAP), which translocates
them from the cytoplasm into the Endoplasmic Reticulum (ER),
where they are in turn bound by class I MHC molecules. For
class II, extracellular protein derived from a pathogen is
taken up by receptor-mediated ingestion and targeted to an
endosomal compartment, where it is cleaved by cathepsins to
produce peptides of 15–20 amino acids. Class II MHCs then
bind these peptides. Peptide-bound MHCs are presented on the
cell surface, where they are recognized, as T-cell epitopes,
by T-cells. MHC proteins are polymorphic, each allele
exhibiting slightly different peptide selectivities. The
combination of MHC and TCR selectivities determines the
power and scope of peptide recognition in the immune system,
and thus the recognition of foreign and self-antigenic
peptides.
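A common first step when scanning a protein for potential epitopes is to enumerate every contiguous peptide of the lengths produced by processing. The Python sketch below does this for the 8–11-residue (class I) and 15–20-residue (class II) ranges given above. It is deliberately naive: the function name `candidate_peptides` and the example sequence are hypothetical, and it ignores proteasomal and cathepsin cleavage specificity as well as TAP selectivity, treating every window as a candidate.

```python
def candidate_peptides(protein, min_len, max_len):
    """Return (1-based start, peptide) for every contiguous window
    whose length lies between min_len and max_len inclusive."""
    protein = protein.upper()
    windows = []
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            windows.append((start + 1, protein[start:start + length]))
    return windows


# Hypothetical 22-residue sequence, used only for illustration.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
class_i = candidate_peptides(seq, 8, 11)    # class I length range
class_ii = candidate_peptides(seq, 15, 20)  # class II length range
print(len(class_i), len(class_ii))          # prints "54 33"
```

A protein of length n yields n - L + 1 windows of each length L, so the candidate set grows linearly with protein length; predicting which windows actually bind is what narrows it down.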
Experimental work has established that only peptides
that bind with high affinity to MHC molecules are
recognized as T-cell epitopes by TCRs (Sette et al.,
1994a,b). Weaker or non-binding peptides are simply
not recognized. Expressed in terms of a competition
assay, the IC50 must be less than 500 nM. IC50 values
are binding affinities measured against a radioisotope-labeled
reference peptide: the IC50 is the concentration of test
peptide that inhibits binding of the reference peptide by 50%.
Prediction of MHC binding is thus a prerequisite to the
prediction of T-cell epitopes.
Most attempts to predict binding peptides have simplified
the task by using a classification scheme, dividing peptides
into non-binders, low-affinity binders, medium-affinity
binders, or high-affinity binders. Again, in terms of IC50
values: non-binders show no affinity, low-affinity binders
have IC50 > 500 nM, medium-affinity binders have
50 nM < IC50 < 500 nM, and high-affinity binders have
IC50 < 50 nM. However, more recent work has turned to the
development of fully quantitative models (Rognan et al.,
1999; Doytchinova and Flower, 2001, 2002). Such models
require access to a database of allele-specific
quantitative binding data: only from data of this type can
we build statistically accurate models for the prediction
of binding. To model the process accurately, we need to
focus on well-characterized data for the binding of
peptides to TAP and to MHCs, and for their subsequent
functioning as T-cell epitopes. Certain groups have access
to some of these data, but there is currently no publicly
available database or compilation. As part of our attempts
to develop computational vaccinology, we have set about
constructing such a database, which we have called JenPep.
This paper describes version 1.0 of this database.
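The IC50-based classification scheme and the 500 nM recognition threshold described in the Introduction can be summarized as a small function. This is a minimal sketch under stated assumptions: the names `binding_class` and `passes_epitope_threshold` are hypothetical, `None` is assumed to represent "no measurable affinity", and because the scheme leaves its boundaries open, an IC50 of exactly 50 nM is treated here as medium and exactly 500 nM as low. The values in the loop are arbitrary illustrations, not measurements.

```python
from typing import Optional

# Cut-offs (nM) from the classification scheme described in the text.
HIGH_AFFINITY_NM = 50.0
RECOGNITION_NM = 500.0


def binding_class(ic50_nm: Optional[float]) -> str:
    """Map a competition-assay IC50 (nM) to an affinity class.

    Assumptions: None means 'no measurable affinity'; exactly 50 nM
    counts as medium and exactly 500 nM as low.
    """
    if ic50_nm is None:
        return "non-binder"
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    if ic50_nm < HIGH_AFFINITY_NM:
        return "high"
    if ic50_nm < RECOGNITION_NM:
        return "medium"
    return "low"


def passes_epitope_threshold(ic50_nm: Optional[float]) -> bool:
    """True if IC50 < 500 nM, the affinity required for T-cell recognition."""
    return binding_class(ic50_nm) in ("high", "medium")


for value in (12.0, 50.0, 230.0, 500.0, 4200.0, None):
    print(value, binding_class(value), passes_epitope_threshold(value))
```

Collapsing IC50 values into four bins discards information; a quantitative model would instead be fitted to the IC50 values themselves, which is why allele-specific quantitative data are needed.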
SYSTEMS AND METHODS
Database size and structure
Version 1.0 of JenPep is composed of three component
sub-databases: a compilation of quantitative measures of
binding for peptides to class I and class II MHCs; a
compendium of dominant and subdominant T-cell epitopes;
and a similar set of quantitative data for peptide binding
to the TAP peptide transporter. This compilation was derived
through exhaustive, semi-manual searching of the primary
literature. We searched the available literature databases
extensively, using keyword and author searches,
retrospective searching and citation matching of key authors
(particularly those describing the development of an assay
system), to identify papers reporting experimentally
measured quantitative values.
The database is organized on the basis of peptides, which
are defined by their sequence and length. A schematic of
the database structure is included in Figure 1.
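The peptide-centred organization can be illustrated with a small relational schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the actual JenPep schema is the one shown in Figure 1, and every table and column name here is hypothetical. A central `peptide` table, keyed on sequence with its length stored alongside, is referenced by one table for each of the three sub-databases (MHC binding, TAP binding and T-cell epitopes). The inserted sequence, allele name and IC50 value are placeholders, not JenPep data.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not JenPep's own.
SCHEMA = """
CREATE TABLE peptide (
    peptide_id       INTEGER PRIMARY KEY,
    sequence         TEXT NOT NULL UNIQUE,
    length           INTEGER NOT NULL,
    source_accession TEXT  -- e.g. a SWISS-PROT accession
);
CREATE TABLE mhc_binding (
    peptide_id INTEGER NOT NULL REFERENCES peptide(peptide_id),
    allele     TEXT NOT NULL,
    mhc_class  TEXT NOT NULL CHECK (mhc_class IN ('I', 'II')),
    ic50_nm    REAL,
    reference  TEXT
);
CREATE TABLE tap_binding (
    peptide_id INTEGER NOT NULL REFERENCES peptide(peptide_id),
    measure    REAL,  -- quantitative TAP binding value as reported
    reference  TEXT
);
CREATE TABLE t_cell_epitope (
    peptide_id INTEGER NOT NULL REFERENCES peptide(peptide_id),
    allele     TEXT,
    dominance  TEXT CHECK (dominance IN ('dominant', 'subdominant')),
    reference  TEXT
);
"""


def add_peptide(conn, sequence, accession=None):
    """Insert a peptide, deriving its length from the sequence; return its id."""
    cur = conn.execute(
        "INSERT INTO peptide (sequence, length, source_accession) VALUES (?, ?, ?)",
        (sequence, len(sequence), accession),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder values only, not measurements.
pid = add_peptide(conn, "ACDEFGHIK")
conn.execute(
    "INSERT INTO mhc_binding (peptide_id, allele, mhc_class, ic50_nm) VALUES (?, ?, ?, ?)",
    (pid, "HYPOTHETICAL-ALLELE", "I", 100.0),
)

rows = conn.execute(
    """SELECT p.sequence, p.length, b.allele, b.ic50_nm
       FROM peptide AS p JOIN mhc_binding AS b ON b.peptide_id = p.peptide_id"""
).fetchall()
print(rows)
```

Keying every measurement on the peptide row means that a peptide with binding data for several alleles, a TAP measurement and epitope annotations is stored once and joined to each record as needed.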
Peptide origin. Information on the origin of the peptide
is taken from the reference paper and, failing that, from
results obtained using BLAST (Altschul et al., 1997). A
hypertext link is made to the corresponding SWISS-PROT
entry. The reference sequence is taken from the entry that most
closely matches the peptide as published.
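The choice of reference sequence can be illustrated with a simple scoring rule. Given candidate database entries, for example those returned by a BLAST search, the sketch below scans each entry for the ungapped window with the fewest mismatches to the published peptide and keeps the best-scoring entry. This is an assumed simplification, not the procedure used to build JenPep and not a substitute for BLAST's own scoring; the function names, accessions and sequences are hypothetical, and ties are resolved in favor of the first entry and position encountered.

```python
def best_ungapped_match(peptide, protein):
    """Return (mismatches, 1-based start) for the closest ungapped window
    of `protein`, or None if the protein is shorter than the peptide."""
    k = len(peptide)
    best = None
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        mismatches = sum(1 for a, b in zip(peptide, window) if a != b)
        if best is None or mismatches < best[0]:
            best = (mismatches, start + 1)
    return best


def pick_reference(peptide, candidates):
    """Return (accession, mismatches, start) for the candidate entry
    that most closely matches the peptide, or None if none fits."""
    best = None
    for accession, sequence in candidates.items():
        hit = best_ungapped_match(peptide, sequence)
        if hit is not None and (best is None or hit[0] < best[1]):
            best = (accession, hit[0], hit[1])
    return best


# Hypothetical entries differing at a single residue.
candidates = {
    "HYPO_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "HYPO_B": "MKTAYIAKQRAISFVKSHFSRQ",
}
print(pick_reference("KQRQISFV", candidates))  # ('HYPO_A', 0, 8)
```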
Restriction allele. Information on the MHC restriction
allele is given for all entries except those in the TAP
database. MHC nomenclature has been standardized to
the best of our a (...truncated)