GLIDA: GPCR—ligand database for chemical genomics drug discovery—database and tools update
Yasushi Okuno (1), Akiko Tamon (0), Hiroaki Yabuuchi (1), Satoshi Niijima (1), Yohsuke Minowa (1), Koichiro Tonomura (1), Ryo Kunimoto (1), Chunlai Feng (1)
(0) Bio Science Group, IT Solution Div.1, Industry Solution Business Unit, Mitsui Knowledge Industry, Osaka city, Japan
(1) Department of PharmacoInformatics, Center for Integrative Education of Pharmacy Frontier, Graduate School of Pharmaceutical Sciences, Kyoto University
G-protein coupled receptors (GPCRs) represent one of the most important families of drug targets in pharmaceutical development. GLIDA is a public GPCR-related Chemical Genomics database that is primarily focused on the integration of information between GPCRs and their ligands. It provides interaction data between GPCRs and their ligands, along with chemical information on the ligands, as well as biological information regarding GPCRs. These data are connected with each other in a relational database, allowing users in the field of Chemical Genomics research to easily retrieve such information from either biological or chemical starting points. GLIDA includes a variety of similarity search functions for the GPCRs and for their ligands. Thus, GLIDA can provide correlation maps linking the searched homologous GPCRs (or ligands) with their ligands (or GPCRs). By analyzing the correlation patterns between GPCRs and ligands, we can gain more detailed knowledge about their conserved molecular recognition patterns and improve drug design efforts by focusing on inferred candidates for GPCR-specific drugs. This article provides a summary of the GLIDA database and user facilities, and describes recent improvements to database design, data contents, ligand classification programs, similarity search options and graphical interfaces. GLIDA is publicly available at http://pharminfo.pharm.kyoto-u.ac.jp/services/glida/. We hope that it will prove very useful for Chemical Genomics research and GPCR-related drug discovery.
The family of G-protein coupled receptors (GPCRs)
represents one of the most important classes of
pharmaceutical targets (1). Among the more than 1000 GPCRs
encoded in the human genome, more than 400 are of
potential therapeutic interest (2). Currently the drugs
available on the market address only 30 GPCRs, which
represent a small fraction of the GPCR target family.
A large majority of human-derived GPCRs still remain
promising drug targets, and thus a key goal of GPCR
research related to drug design is to identify new ligands
for such target GPCRs.
With the unprecedented accumulation of genomic
information, databases and bioinformatics have become
essential tools to guide GPCR research (3). The GPCRDB
(2) and the IUPHAR receptor database (IUPHAR-RD)
(4) are representative examples of widely used public databases
covering GPCRs. These databases, which provide
substantial data on GPCR proteins and pharmacological
information on receptor proteins including GPCRs, are
mainly focused on biological aspects of the GPCR gene
products or proteins. In spite of the significance of ligand
compounds as drug leads, the relationships between
GPCRs and their ligands and/or chemical information
on the ligands themselves are not yet fully covered.
On the other hand, there is increasing interest in
publicly collecting and applying chemical as well as
biological information in the post-genome era (5-8).
This new trend is called Chemical Genomics, and it
aims to identify all possible chemical ligands and drugs for
all target families (9,10). There is a vast amount of
information on the interactions between small molecules
and proteins/genes. However, compound-protein
interactions have not yet been analyzed on a large scale, and
there are no effective methods to extract meaningful
information from the data in a comprehensive manner.
Therefore, we need to integrate chemoinformatics and
bioinformatics into a common computational platform
for mining of Chemical Genomics data (11).
GLIDA (GPCR-Ligand DAtabase) is a public
GPCR-related Chemical Genomics database designed to
simultaneously mine biological information on GPCRs and
chemical information on their ligands. It provides various
analytical data regarding GPCR-ligand correlations by
incorporating bioinformatics and chemoinformatics
techniques, and thus it should prove very useful for
GPCR-related drug discovery from the viewpoint of Chemical
Genomics research. There have been several major
improvements to GLIDA since it was last described in
Ref. (12): (i) the number of ligand entries and
corresponding ligand-GPCR pairs has increased substantially; (ii) the
ligands are classified using a new, original strategy;
(iii) additional options are available within the similarity
search program for the GPCRs and ligands; and (iv) the
graphical interface to display the correlation maps
between GPCRs and ligands has been enhanced.
GLIDA contains three types of primary data: biological
information on GPCRs, chemical information on their
ligands and information on binding of the GPCR-ligand
pairs. The GPCR entries were acquired from human,
mouse and rat entries deposited in the GPCRDB because
sufficient information regarding ligands is available for
these three species, and rats and mice are representative model
animals used in drug discovery research. The
ligand-binding information was manually collected and curated
using various public web sites and commercial databases
such as the IUPHAR-RD, PubMed (5), PubChem (5),
DrugBank (13), Ki Database (14) and MDL ISIS/Base
2.5. Table 1 indicates the size and scope of the GLIDA
database. In particular, we have dramatically expanded
the number of ligand entries and the corresponding
ligand-GPCR pairs. The latest GLIDA version includes
24 077 ligand entries and 39 140 GPCR-ligand pair
entries, representing nearly 35-fold and 20-fold increases,
respectively, since the last publication of GLIDA in 2006.
The total number of GPCR entries remains unchanged,
but entries with associated ligand information have
increased slightly, suggesting that it is difficult to
de-orphan the GPCRs whose ligands have not yet been
identified (15).
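The way the three types of primary data described above can be linked in a relational database, and queried from either a biological or a chemical starting point, may be easier to see in a small sketch. The example below is only an illustration under stated assumptions: the table names (`gpcr`, `ligand`, `binding`), the column names, the helper functions and the placeholder rows are all hypothetical and do not reproduce GLIDA's actual schema or contents.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not GLIDA's actual database design.
SCHEMA = """
CREATE TABLE gpcr (
    gpcr_id   TEXT PRIMARY KEY,
    gene_name TEXT,
    family    TEXT,
    species   TEXT
);
CREATE TABLE ligand (
    ligand_id TEXT PRIMARY KEY,
    name      TEXT
);
CREATE TABLE binding (
    gpcr_id   TEXT NOT NULL REFERENCES gpcr(gpcr_id),
    ligand_id TEXT NOT NULL REFERENCES ligand(ligand_id),
    PRIMARY KEY (gpcr_id, ligand_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder (non-real) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO gpcr VALUES (?, ?, ?, ?)",
        [("G1", "gene_a", "family_x", "human"),
         ("G2", "gene_b", "family_x", "mouse")],
    )
    conn.executemany(
        "INSERT INTO ligand VALUES (?, ?)",
        [("L1", "ligand_one"), ("L2", "ligand_two")],
    )
    conn.executemany(
        "INSERT INTO binding VALUES (?, ?)",
        [("G1", "L1"), ("G1", "L2"), ("G2", "L2")],
    )
    return conn


def ligands_for_gpcr(conn, gpcr_id):
    """Biological starting point: names of ligands paired with one GPCR."""
    rows = conn.execute(
        "SELECT l.name FROM ligand AS l "
        "JOIN binding AS b ON b.ligand_id = l.ligand_id "
        "WHERE b.gpcr_id = ? ORDER BY l.name",
        (gpcr_id,),
    )
    return [row[0] for row in rows]


def gpcrs_for_ligand(conn, ligand_id):
    """Chemical starting point: gene names of GPCRs paired with one ligand."""
    rows = conn.execute(
        "SELECT g.gene_name FROM gpcr AS g "
        "JOIN binding AS b ON b.gpcr_id = g.gpcr_id "
        "WHERE b.ligand_id = ? ORDER BY g.gene_name",
        (ligand_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(ligands_for_gpcr(conn, "G1"))
    print(gpcrs_for_ligand(conn, "L2"))
```

In this sketch the pair table is the only place where GPCRs and ligands meet, so the same join serves both directions of lookup. Whether GLIDA organizes its tables in this way is an assumption.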
GPCR and ligand data
The database provides general information on both GPCRs and
ligands. The general information table
listing GPCRs contains gene names, family names, protein
sequences (in FASTA format) and links to other biological
databases, such as GPCRDB, UniProt (16),
IUPHAR-RD, Entrez Gene (17) and KEGG (18). The ligand result
page provides a general information table containing
names, molecular structures, CAS registry numbers,
formulas, molecular weights, structure files and links to
(a) Molecular structures consist of MDL MOL files and original files
converted into KEGG atom types. The numbers of MDL MOL files
and KEGG-type files are 23 216 and 23 214, respectively. PCA
calculation was performed for 23 214 KEGG-type files.
(b) This cluster number (300) is different from the number of the
selected principal components (314). No compounds were assigned to
14 principal components.
Pub (...truncated)