SuperTarget and Matador: resources for exploring drug-target relationships
0
Institute for Laboratory Medicine
, Windscheidstr. 18,
10627 Berlin, Germany
1
EMBL Biocomputing, Meyerhofstraße 1,
69117 Heidelberg
2
Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité University Medicine Berlin
, Arnimallee 22,
14195 Berlin
3
Max-Delbrück-Center for Molecular Medicine (MDC)
, 13092 Berlin-Buch,
Germany
4
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
, 9500 Gilman Drive, La Jolla CA 92093,
USA
The molecular basis of drug action is often not well understood. This is partly because the abundant and diverse information on drugs generated in the past decades is hidden in millions of medical articles and textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolism, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de.
INTRODUCTION
Within the past two decades our knowledge about
drugs, their mechanisms of action and target proteins
has increased rapidly. Nevertheless, knowledge of their
molecular effects is far from complete. For some drugs
even the primary targets are still unknown; for example,
diloxanide, niclosamide and ambroxol are administered
successfully although their effect on human metabolism has
not yet been clarified at the molecular level (1). Even if the
medical effect has been explained by a certain molecular
interaction, most drugs interact with several additional
targets, which may either strengthen the therapeutic
effect or cause unwanted adverse drug effects (2).
Moreover, our knowledge of drugs and their targets is
highly fragmented: most of it resides in millions of
medical articles and textbooks, which precludes systematic
studies.
Several databases exist that collect binding data
on small molecules, in particular drugs, and proteins.
The largest such resource is DrugBank (3), which contains
2600 drug-target relations for 900 FDA-approved drugs
and additional annotations for 3200 experimental drugs.
Another notable database is the Therapeutic Target
Database (TTD) (4), which holds target information on
about 1000 small-molecule drugs. Unfortunately,
DrugBank provides references for the target but
generally not for the interactions, which makes
it difficult to obtain information on the experimental
context in which an interaction was observed.
Moreover, the drugs in the TTD are not cross-linked
with compound databases such as PubChem, ChemDB
or the commercial CAS Registry, and the targets are
not linked to protein databases such as UniProt or PDB.
This makes it difficult to retrieve information such as
the chemical structure of the drug, its physicochemical
properties, the sequence or 3D structure of its target or
the biological pathways that it affects.
In order to make it possible to derive further information
about drug-target relations, we have developed a
one-stop data warehouse, SuperTarget, which provides this
functionality and integrates drug-target relations from
different resources using heterogeneous retrieval methods.
We consider a drug-target relation to be a specific interaction
between a small chemical compound administered to treat or
diagnose a disease and a macromolecule, namely a protein,
DNA or RNA. The first release of SuperTarget contains
a core dataset of about 7300 drug-target relations, of which
4900 interactions have been subjected to a more extensive
manual annotation effort to incorporate additional
binding information as well as indirect interactions.
The resulting data on 775 drugs is provided separately
as Matador (Manually Annotated Targets And Drugs
Online Resource).
DRUG-TARGET RELATIONSHIPS
Drug-target relationships described in SuperTarget
were obtained in three different ways. Starting with 2400
drugs and their synonyms from the SuperDrug Database
(5), the text mining tool EBIMed (6) was used to extract
relevant text passages containing potential drug-target
relations from about 15 million public abstracts listed
in PubMed. Many thousands of false-positive or irrelevant
relations were eliminated by manual curation.
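The candidate-extraction step can be illustrated with a minimal sketch. It is not the EBIMed implementation; it assumes a simple sentence-level co-occurrence rule, and the function name `find_candidate_relations` and the dictionary layouts are hypothetical. Every hit is only a candidate that would still require manual curation.

```python
import re
from typing import Dict, List, Set, Tuple


def find_candidate_relations(
    abstracts: Dict[str, str],
    drug_synonyms: Dict[str, Set[str]],
    protein_synonyms: Dict[str, Set[str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (abstract_id, drug, protein, sentence) for every sentence
    that mentions a synonym of a drug and a synonym of a protein.

    Assumption: plain sentence-level co-occurrence stands in for the
    passage extraction performed by a real text mining tool.
    """

    def mentions(sentence: str, synonyms: Set[str]) -> bool:
        lowered = sentence.lower()
        # Whole-word, case-insensitive match of any synonym.
        return any(
            re.search(r"\b" + re.escape(s.lower()) + r"\b", lowered)
            for s in synonyms
        )

    candidates = []
    for abstract_id, text in abstracts.items():
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for drug, d_syn in drug_synonyms.items():
                if not mentions(sentence, d_syn):
                    continue
                for protein, p_syn in protein_synonyms.items():
                    if mentions(sentence, p_syn):
                        candidates.append((abstract_id, drug, protein, sentence))
    return candidates
```

A rule this permissive produces many false positives, which is consistent with the need for the manual curation described above.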
In parallel, potential drug-target relations were
automatically extracted from Medline by searching for
synonyms of drugs, proteins and Medical Subject Headings
(MeSH terms) describing groups of proteins (7). MeSH
terms were used to capture and down-weight interactions
that are not explicitly described in the abstracts, e.g. for
protein families or protein complexes. In the case of
families, the specific interacting family member might not
be known yet, whereas in the case of complexes, the drug
might interact with more than one subunit. Proteins
were associated with MeSH terms by a
semi-automated procedure relying on mappings provided by
MeSH and synonyms of proteins that are aggregated
in the STRING resource (8). Proteins that were often
mentioned in abstracts but could not be automatically
assigned to families were assigned manually. Depending
on the size and nature of the families, the confidence of
an interaction between drugs and individual proteins
was decreased; more heterogeneous families were assigned
a lower confidence. The most probable candidates
were identified using a benchmarking scheme (8) and
manually curated.
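The down-weighting of family-level interactions can be sketched as follows. The text does not state the scoring formula, so the one used here (dividing by family size and scaling by a heterogeneity term) is purely an assumption for illustration, as are the function name `family_member_confidence` and its parameters.

```python
def family_member_confidence(
    base_confidence: float,
    family_size: int,
    heterogeneity: float = 0.0,
) -> float:
    """Confidence that a drug interacts with ONE member of a protein family
    when the abstract only mentions the family as a whole.

    Hypothetical formula (not taken from the paper): the family-level
    confidence is spread over the members and reduced further for
    heterogeneous families, where heterogeneity lies in [0, 1].
    """
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    if not 0.0 <= heterogeneity <= 1.0:
        raise ValueError("heterogeneity must lie in [0, 1]")
    return base_confidence * (1.0 - heterogeneity) / family_size
```

Any scheme with the same qualitative behaviour would serve equally well as an illustration: larger and more heterogeneous families yield lower per-protein confidence.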
In a last step, relations from other databases, namely
DrugBank (3), KEGG (9), PDB (10), SuperLigands
(11) and TTD (4), were checked for drug-target
interactions not identified in the preceding steps.
If those interactions could be confirmed by literature
listed in PubMed, the references were included in
SuperTarget; otherwise, the source database is
referenced.
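The reference fallback in this last step can be sketched as below. The `Relation` record, the function `add_external_relations` and the `PMID:`/`DB:` reference prefixes are hypothetical names chosen for the example and do not describe the actual SuperTarget schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Relation:
    drug: str
    target: str
    reference: str  # "PMID:<id>" if literature-confirmed, else "DB:<name>"


def add_external_relations(
    known: Set[Tuple[str, str]],
    external: Iterable[Tuple[str, str, str]],
    pubmed_support: Dict[Tuple[str, str], str],
) -> List[Relation]:
    """Add (drug, target, database) relations that are not yet known.

    A relation confirmed by literature cites the PubMed record;
    otherwise the database that describes it is cited instead.
    """
    seen = set(known)
    added = []
    for drug, target, database in external:
        pair = (drug, target)
        if pair in seen:
            continue  # already identified in an earlier step
        seen.add(pair)
        pmid = pubmed_support.get(pair)
        reference = f"PMID:{pmid}" if pmid else f"DB:{database}"
        added.append(Relation(drug, target, reference))
    return added
```

Keeping the source in a single `reference` field means every relation carries a pointer to either a literature record or the database that describes it.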
Given the large number of entries, we
cannot rule out that some of the data are erroneous,
have changed over time or are too unspecific. In case of doubt,
we refer to the cited source of the relation.
To be able to obtain more information on the
drug-target relations, SuperTarget provides links to
physicochemical properties and further structural information
of drugs. Proven or potential target proteins are
represented by sequences as stored in UniProt (12), by
functional ann (...truncated)