TMPDB: a database of experimentally-characterized transmembrane topologies

Nucleic Acids Research, Jan 2003

TMPDB is a database of experimentally-characterized transmembrane (TM) topologies. TMPDB release 6.2 contains a total of 302 TM protein sequences, of which 276 are α-helical, 17 are β-stranded, and 9 are α-helical with short pore-forming helices buried in the membrane. The TM topologies in TMPDB were determined experimentally by X-ray crystallography, NMR, gene fusion techniques, the substituted cysteine accessibility method, N-linked glycosylation experiments and other biochemical methods. TMPDB would be useful as a test and/or training dataset for improving existing TM topology prediction methods or developing novel methods with higher performance, and as a guide to help both bioinformaticians and biologists better understand TM proteins. TMPDB and its subsets are freely available at the following web site: http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/.

Nucleic Acids Research, 2003, Vol. 31, No. 1, 406–409
DOI: 10.1093/nar/gkg018
© 2003 Oxford University Press

Masami Ikeda1,2, Masafumi Arai1, Toshikatsu Okuno2 and Toshio Shimizu1,*

1 Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan
2 Science of Bioresources Program, The United Graduate School of Agricultural Sciences, Iwate University, Morioka 020-8550, Japan

Received August 1, 2002; Revised September 9, 2002; Accepted September 20, 2002

*To whom correspondence should be addressed. Tel: +81 172393638; Fax: +81 172393638

INTRODUCTION

Transmembrane (TM) proteins serve extremely important functions in life as pumps, channels, receptors, energy transducers and so on, and have recently been reported to account for 20–30% of the genes in a whole genome (1–4). Nevertheless, the number of high-resolution three-dimensional (3D) structures is far below one hundred at present, in contrast to the more than 18 000 3D structures of soluble proteins registered in PDB (5). This is because TM protein molecules are difficult to crystallize owing to their amphiphilic character, with hydrophobic TM segments (TMSs) and hydrophilic loops. The functions of TM proteins, however, can be inferred rather easily from their TM topology (i.e. the number of TMSs, the TMS positions and the orientation of the TMSs relative to the membrane lipid bilayer) without knowing their 3D structures, because of their rather simple structural characteristics (6).

In this context, a number of TM topology prediction methods have been developed to determine the structure and function of TM proteins from their amino acid sequences (2,7–22). However, the proposed prediction methods have not attained the accuracies desired for this purpose. Recent evaluations of prediction performance using experimentally-characterized TM topology datasets have revealed that even the best methods predict the TM topology with accuracies of only around 60% (23–25). This could be attributed mainly to the lack of well-characterized topology data for training or tuning TM topology prediction methods. Thus, more high-quality TM topology data are required to evaluate the existing prediction methods more precisely.

For this reason, we have constructed a transmembrane protein database, TMPDB (19,24,26), which is a collection of TM proteins with topologies based on definite experimental evidence such as X-ray crystallography, NMR, gene fusion techniques, the substituted cysteine accessibility method, Asn (N)-linked glycosylation experiments and other biochemical methods. TMPDB would serve the requirements of both bioinformaticians and biologists, as a test and/or training dataset for improving the existing TM topology prediction methods and developing novel prediction methods with higher performance, as well as for gaining a better understanding of TM proteins.

CONSTRUCTION OF TMPDB

We collected 1074 articles reporting TM topology: 895 by a MEDLINE (27) search with the keywords ‘transmembrane’ and ‘topology’, 46 by searching directly without using MEDLINE, and 133 by referring to the reference position (RP) line of entries in SWISS-PROT and TrEMBL (28) carrying the annotations ‘X-RAY CRYSTALLOGRAPHY’, ‘STRUCTURE BY NEUTRON DIFFRACTION’, ‘STRUCTURE BY ELECTRON CRYO-MICROSCOPY’, ‘STRUCTURE BY NMR’ or ‘TOPOLOGY’. By checking the content of each collected article, we extracted 302 experimentally-characterized TM topology models. To obtain the complete sequence annotation that the articles often lack, we cross-checked the sequences in question against public databases such as DDBJ (29), SWISS-PROT, PIR (30) and PDB (5), using the protein name or a partial sequence as a clue. By combining the information contained in the articles with that from the cross-referenced public databases, we constructed TMPDB in the SWISS-PROT format.

There are 21 cases in total in TMPDB in which two or more articles report topology models for a single sequence; these models are almost identical, differing only slightly in TMS position (by at most 5 amino acids). For these cases, we selected the topology model based on the highest-quality experiment among those reported.

Table 1. Distributions of the number of transmembrane segments in the TMPDB_alpha (276 sequences: 165 prokaryotic and 111 eukaryotic), TMPDB_alpha_non-redundant (231 sequences: 138 prokaryotic and 93 eukaryotic), TMPDB_alpha-buried (9 sequences: 6 prokaryotic and 3 eukaryotic), TMPDB_alpha-buried_non-redundant (7 sequences: 4 prokaryotic and 3 eukaryotic), TMPDB_beta (17 prokaryotic sequences) and TMPDB_beta_non-redundant (15 prokaryotic sequences) datasets

Number of TMSs   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  22  24

TMPDB_alpha
  Prokaryotic   41  18   8  12   8  15   7  11   4   7   5  24   2   2   1   0   0   0   0   0
  Eukaryotic    45  11   2  10   2   7   8   7   1   8   0   6   0   1   0   0   2   0   0   1
  Total         86  29  10  22  10  22  15  18   5  15   5  30   2   3   1   0   2   0   0   1

TMPDB_alpha_non-redundant
  Prokaryotic   33  17   8  12   6  14   6   9   4   6   2  17   2   2   0   0   0   0   0   0
  Eukaryotic    39  10   2   6   2   7   6   7   1   4   0   6   0   0   0   0   2   0   0   1
  Total         72  27  10  18   8  21  12  16   5  10   2  23   2   2   0   0   2   0   0   1

TMPDB_alpha-buried
  Prokaryotic    1   1   0   0   0   2   0   0   0   0   0   0   0   0   2   0   0   0   0   0
  Eukaryotic     0   1   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  Total          1   2   0   0   0   3   1   0   0   0   0   0   0   0   2   0   0   0   0   0

TMPDB_alpha-buried_non-redundant
  Prokaryotic    1   1   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0
  Eukaryotic     0   1   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  Total          1   2   0   0   0   2   1   0   0   0   0   0   0   0   1   0   0   0   0   0

TMPDB_beta
  Prokaryotic    0   1   0   1   0   0   0   2   0   2   0   1   0   0   0   5   0   3   2   0

TMPDB_beta_non-redundant
  Prokaryotic    0   1   0   1   0   0   0   2   0   1   0   1   0   0   0   4   0   3   2   0

TMPDB CURRENT HOLDINGS

The latest release of TMPDB contains 302 TM protein sequences: 276 α-helical sequences (TMPDB_alpha dataset), 17 β-stranded sequences (TMPDB_beta dataset) and 9 α-helical sequences with short pore-forming α-helices buried in the membrane (e.g. aquaporin 1) (TMPDB_alpha-buried dataset).
The dataset of TMPDB_alpha comprises 165 prokaryotic and 111 eukaryoti (...truncated)
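
The text describes a TM topology as the number of TMSs, their positions and their orientation, and notes that alternative topology models reported for the same sequence differ in TMS position by at most 5 amino acids. The Python sketch below illustrates one way such a comparison could be expressed. The `Topology` class, the function names and the example coordinates are hypothetical and are not taken from TMPDB. Reading "at most 5 amino acids" as the largest shift of any segment boundary is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Topology:
    """Hypothetical container for one topology model (not TMPDB's own schema)."""
    segments: Tuple[Tuple[int, int], ...]  # (start, end) of each TMS, 1-based, inclusive
    n_term_inside: bool                    # True if the N-terminus is on the cytoplasmic side


def max_boundary_shift(a: Topology, b: Topology) -> int:
    """Largest difference between corresponding TMS boundaries of two models."""
    if len(a.segments) != len(b.segments):
        raise ValueError("models have different numbers of TMSs")
    return max(
        (max(abs(s1 - s2), abs(e1 - e2))
         for (s1, e1), (s2, e2) in zip(a.segments, b.segments)),
        default=0,
    )


def nearly_identical(a: Topology, b: Topology, tolerance: int = 5) -> bool:
    """True if the models agree in TMS count and orientation, and every
    TMS boundary differs by no more than `tolerance` residues."""
    if len(a.segments) != len(b.segments) or a.n_term_inside != b.n_term_inside:
        return False
    return max_boundary_shift(a, b) <= tolerance


# Made-up coordinates for two reports on the same two-TMS protein.
model_a = Topology(segments=((12, 32), (45, 65)), n_term_inside=True)
model_b = Topology(segments=((14, 34), (43, 66)), n_term_inside=True)
print(nearly_identical(model_a, model_b))
```

A check of this kind only flags models that are close enough to be treated as the same topology. Choosing which model to keep, as the authors did on the basis of experimental quality, would still need a separate ranking of the experimental methods.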


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/31/1/406.full.pdf
Article home page: http://nar.oxfordjournals.org/content/31/1/406.abstract

Masami Ikeda, Masafumi Arai, Toshikatsu Okuno, Toshio Shimizu. TMPDB: a database of experimentally-characterized transmembrane topologies, Nucleic Acids Research, 2003, pp. 406-409, 31/1, DOI: 10.1093/nar/gkg020