TMPDB: a database of experimentally-characterized transmembrane topologies
406–409
Nucleic Acids Research, 2003, Vol. 31, No. 1
DOI: 10.1093/nar/gkg018
© 2003 Oxford University Press
Masami Ikeda1,2, Masafumi Arai1, Toshikatsu Okuno2 and Toshio Shimizu1,*
1 Department of Electronic Information System Engineering, Faculty of Science and Technology,
Hirosaki University, Hirosaki 036-8561, Japan
2 Science of Bioresources Program, The United Graduate School of Agricultural Sciences,
Iwate University, Morioka 020-8550, Japan
Received August 1, 2002; Revised September 9, 2002; Accepted September 20, 2002
*To whom correspondence should be addressed. Tel: +81 172393638; Fax: +81 172393638; Email:
ABSTRACT
TMPDB is a database of experimentally-characterized
transmembrane (TM) topologies. TMPDB release 6.2
contains a total of 302 TM protein sequences, of
which 276 are α-helical sequences, 17 are β-stranded
sequences, and 9 are α-helical sequences with short
pore-forming helices buried in the membrane. The TM
topologies in TMPDB were determined experimentally by
means of X-ray crystallography, NMR, gene fusion
techniques, the substituted cysteine accessibility method,
N-linked glycosylation experiments and other biochemical
methods. TMPDB would be useful as a test and/or
training dataset for improving the proposed TM
topology prediction methods or developing novel
methods with higher performance, and as a guide for
both bioinformaticians and biologists to better
understand TM proteins. TMPDB and its subsets are
freely available at the following web site:
http://bioinfo.si.hirosaki-u.ac.jp/TMPDB/.
INTRODUCTION
Transmembrane (TM) proteins serve extremely important
functions in life as pumps, channels, receptors, energy transducers,
etc., and have recently been reported to account for 20–30% of
the genes in a whole genome (1–4). Nevertheless, the number
of high-resolution three-dimensional (3D) structures is at present
far below one hundred, in contrast to the more than 18 000
3D structures of soluble proteins registered in PDB (5).
This is because TM protein molecules are difficult to crystallize
owing to their amphiphilic character: hydrophobic TM
segments (TMSs) and hydrophilic loops. The functions of
TM proteins, however, can be inferred rather easily from their
TM topology (i.e., the number of TMSs, the TMS positions and
the orientation of the TMSs relative to the membrane lipid bilayer)
without knowing their 3D structures, because of their rather simple
structural characteristics (6).
In this context, a number of TM topology prediction methods
have been developed to determine the structure and function of
TM proteins from their amino acid sequences (2,7–22).
However, the proposed prediction methods have not attained
the accuracy desired for this purpose. Recent reports
evaluating prediction performance on experimentally-characterized
TM topology datasets have revealed that even the
best methods predict the TM topology with accuracies of only
around 60% (23–25). This could be attributed mainly to the
lack of well-characterized topology data for training
or tuning TM topology prediction methods. Thus, more
high-quality TM topology data are required to evaluate the existing
prediction methods more precisely.
For this reason, we have constructed a transmembrane
protein database, TMPDB (19,24,26), which is a collection of
TM proteins with topologies based on definite experimental
evidence such as X-ray crystallography, NMR, gene fusion
techniques, the substituted cysteine accessibility method,
Asn (N)-linked glycosylation experiments and other biochemical
methods. TMPDB would serve the requirements of both
bioinformaticians and biologists as a test and/or training
dataset for improving the existing TM topology prediction
methods and developing novel prediction methods with higher
performance, as well as for gaining a better understanding of TM
proteins.
CONSTRUCTION OF TMPDB
We have collected 1074 articles reporting TM topology: by
searching MEDLINE (27) with the keywords ‘transmembrane’ and
‘topology’ (895 articles), by searching directly
without using MEDLINE (46 articles), and by referring to the
reference position (RP) line of the entries with the following
annotations in SWISS-PROT and TrEMBL (28): ‘X-RAY
CRYSTALLOGRAPHY’, ‘STRUCTURE BY NEUTRON DIFFRACTION’,
‘STRUCTURE BY ELECTRON CRYO-MICROSCOPY’, ‘STRUCTURE BY
NMR’ or ‘TOPOLOGY’ (133 articles). By checking the content of
each collected article, we extracted 302 experimentally-characterized
TM topology models. To obtain the complete sequence annotation
that the articles often lack, we cross-checked the sequences in
question against public databases such as DDBJ (29), SWISS-PROT,
PIR (30) and PDB (5), using the protein name or the
partial sequence as a clue. By combining the information
Table 1. Distributions of the number of transmembrane segments in the TMPDB_alpha (276 sequences comprising 165 prokaryotic and 111 eukaryotic),
TMPDB_alpha_non-redundant (231 sequences comprising 138 prokaryotic and 93 eukaryotic), TMPDB_alpha-buried (9 sequences comprising 6 prokaryotic
and 3 eukaryotic), TMPDB_alpha-buried_non-redundant (7 sequences comprising 4 prokaryotic and 3 eukaryotic), TMPDB_beta (17 prokaryotic sequences)
and TMPDB_beta_non-redundant (15 prokaryotic sequences) datasets
Dataset       Number of transmembrane segments
                 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  22  24
TMPDB_alpha
  Prokaryotic   41  18   8  12   8  15   7  11   4   7   5  24   2   2   1   0   0   0   0   0
  Eukaryotic    45  11   2  10   2   7   8   7   1   8   0   6   0   1   0   0   2   0   0   1
  Total         86  29  10  22  10  22  15  18   5  15   5  30   2   3   1   0   2   0   0   1
TMPDB_alpha_non-redundant
  Prokaryotic   33  17   8  12   6  14   6   9   4   6   2  17   2   2   0   0   0   0   0   0
  Eukaryotic    39  10   2   6   2   7   6   7   1   4   0   6   0   0   0   0   2   0   0   1
  Total         72  27  10  18   8  21  12  16   5  10   2  23   2   2   0   0   2   0   0   1
TMPDB_alpha-buried
  Prokaryotic    1   1   0   0   0   2   0   0   0   0   0   0   0   0   2   0   0   0   0   0
  Eukaryotic     0   1   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  Total          1   2   0   0   0   3   1   0   0   0   0   0   0   0   2   0   0   0   0   0
TMPDB_alpha-buried_non-redundant
  Prokaryotic    1   1   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0
  Eukaryotic     0   1   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  Total          1   2   0   0   0   2   1   0   0   0   0   0   0   0   1   0   0   0   0   0
TMPDB_beta
  Prokaryotic    0   1   0   1   0   0   0   2   0   2   0   1   0   0   0   5   0   3   2   0
TMPDB_beta_non-redundant
  Prokaryotic    0   1   0   1   0   0   0   2   0   1   0   1   0   0   0   4   0   3   2   0
contained in the articles and other information from the cross-referenced public databases, we constructed TMPDB in the
SWISS-PROT format.
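Because the entries follow the SWISS-PROT flat-file layout, the TM segments of an entry can in principle be read from its feature-table (FT) lines. The following is a minimal Python sketch under stated assumptions: it supposes that each segment is recorded as a `TRANSMEM` feature with integer start and end positions, as in SWISS-PROT entries of that period, and the sample entry (its ID and positions) is invented for illustration rather than taken from TMPDB.

```python
import re

# Hypothetical entry fragment in SWISS-PROT style; the ID and positions are
# invented for illustration and are not taken from TMPDB.
SAMPLE_ENTRY = """\
ID   EXAMPLE_PROT   STANDARD;      PRT;   120 AA.
FT   TRANSMEM     12     32
FT   TRANSMEM     50     70
FT   DOMAIN       71    120       CYTOPLASMIC.
//
"""

# Assumed layout: "FT", a feature key, then integer start and end positions.
FT_LINE = re.compile(r"^FT\s+(\S+)\s+(\d+)\s+(\d+)")


def transmembrane_segments(entry_text):
    """Return (start, end) pairs for every TRANSMEM feature line."""
    segments = []
    for line in entry_text.splitlines():
        match = FT_LINE.match(line)
        if match and match.group(1) == "TRANSMEM":
            segments.append((int(match.group(2)), int(match.group(3))))
    return segments


print(transmembrane_segments(SAMPLE_ENTRY))  # [(12, 32), (50, 70)]
```

The number of returned pairs is the TMS count of the entry. The actual TMPDB files may carry further annotation, such as loop orientation, that this sketch ignores.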
In 21 cases in TMPDB, two or more articles report topology models for a single sequence.
These models are almost identical, differing only slightly in TMS position
(by at most 5 amino acids). For these cases, we
selected the topology model based on the highest-quality
experiment among those reported.
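The agreement criterion described above can be illustrated with a short Python sketch. It assumes that a topology model is represented as a list of (start, end) residue positions, one pair per TMS, and that the "TMS-position difference" is measured at each segment boundary; both are assumptions made for illustration, and the example models are invented. The ranking of experimental methods by quality is not specified here, so the sketch covers only the comparison step.

```python
MAX_SHIFT = 5  # largest TMS-position difference mentioned in the text


def models_agree(model_a, model_b, max_shift=MAX_SHIFT):
    """Return True if both models have the same number of TMSs and each pair
    of corresponding segment boundaries differs by at most max_shift residues.

    Each model is a list of (start, end) tuples (an assumed representation).
    """
    if len(model_a) != len(model_b):
        return False
    return all(
        abs(start_a - start_b) <= max_shift and abs(end_a - end_b) <= max_shift
        for (start_a, end_a), (start_b, end_b) in zip(model_a, model_b)
    )


# Invented example models for a single hypothetical sequence.
report_1 = [(12, 32), (50, 70)]
report_2 = [(14, 30), (47, 72)]   # boundaries shifted by 2 to 3 residues
report_3 = [(20, 40), (50, 70)]   # first segment shifted by 8 residues

print(models_agree(report_1, report_2))  # True
print(models_agree(report_1, report_3))  # False
```

Models that pass such a check would be treated as reports of the same topology, leaving the choice among them to the quality of the underlying experiment.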
TMPDB CURRENT HOLDINGS
The latest release of TMPDB contains 302 TM protein
sequences: 276 α-helical sequences (TMPDB_alpha dataset),
17 β-stranded sequences (TMPDB_beta dataset) and 9
α-helical sequences with short pore-forming α-helices buried
in the membrane (e.g., aquaporin 1) (TMPDB_alpha-buried
dataset). The dataset of TMPDB_alpha comprises 165
prokaryotic and 111 eukaryoti (...truncated)