SuperTarget and Matador: resources for exploring drug-target relationships

https://nar.oxfordjournals.org/content/36/suppl_1/D919.full.pdf

0 Institute for Laboratory Medicine, Windscheidstr. 18, 10627 Berlin, Germany
1 EMBL Biocomputing, Meyerhofstraße 1, 69117 Heidelberg
2 Structural Bioinformatics Group, Institute of Molecular Biology and Bioinformatics, Charité University Medicine Berlin, Arnimallee 22, 14195 Berlin
3 Max-Delbrück-Center for Molecular Medicine (MDC), 13092 Berlin-Buch, Germany
4 Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla CA 92093, USA

The molecular basis of drug action is often not well understood. This is partly because the abundant and diverse information on drugs generated in the past decades is hidden in millions of medical articles and textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget, that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de

Within the past two decades, our knowledge about drugs, their mechanisms of action and their target proteins has increased rapidly. Nevertheless, knowledge of their molecular effects is far from complete.
For some drugs even the primary targets are still unknown; for example, Diloxanide, Niclosamide and Ambroxol are administered successfully although their effect on human metabolism has not yet been clarified at the molecular level (1). Even if the medical effect has been explained by a certain molecular interaction, most drugs interact with several additional targets, which may either strengthen the therapeutic effect or cause unwanted adverse drug effects (2). Moreover, our knowledge of drugs and their targets is highly fragmented, most of it residing in millions of medical articles and textbooks, which precludes systematic studies.

Several databases exist that collect binding data on small molecules, in particular drugs, and proteins. The largest such resource is DrugBank (3), which contains 2600 drug-target relations for 900 FDA-approved drugs and additional annotations for 3200 experimental drugs. Another notable database is the Therapeutic Target Database (TTD) (4), which holds target information on about 1000 small molecule drugs. Unfortunately, DrugBank provides references for the targets but generally not for the interactions, which makes it difficult to obtain information on the experimental context in which an interaction was observed. Moreover, the drugs in the TTD are not cross-linked with compound databases such as PubChem, ChemDB or the commercial CAS Registry, and the targets are not linked to protein databases such as UniProt or PDB. This makes it difficult to retrieve information such as the chemical structure of the drug, its physicochemical properties, the sequence or 3D structure of its target or the biological pathways that it affects.

To make such further information about drug-target relations accessible, we have developed a one-stop data warehouse, SuperTarget, that provides this functionality and integrates drug-target relations from different resources using heterogeneous retrieval methods.
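As an illustration of the kind of cross-linked query that such a warehouse is meant to support (here, drugs that target the same protein but are metabolized by different enzymes), the following is a minimal sketch in Python on a toy in-memory dataset. The record layout, the helper `same_target_different_enzyme` and all identifiers (`drug_A`, `TARGET_1`, `CYP_X`, ...) are hypothetical placeholders; they do not reflect the actual SuperTarget schema, interface or content.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; identifiers are placeholders, not SuperTarget data.
DRUG_TARGETS = [
    ("drug_A", "TARGET_1"),
    ("drug_B", "TARGET_1"),
    ("drug_C", "TARGET_1"),
    ("drug_C", "TARGET_2"),
]
METABOLIZED_BY = {
    "drug_A": {"CYP_X"},
    "drug_B": {"CYP_Y"},
    "drug_C": {"CYP_X"},
}


def same_target_different_enzyme(drug_targets, metabolized_by):
    """Return (target, drug1, drug2) triples where both drugs bind the same
    target but share no metabolizing enzyme."""
    # Group drugs by the target they are annotated with.
    drugs_by_target = defaultdict(set)
    for drug, target in drug_targets:
        drugs_by_target[target].add(drug)

    hits = []
    for target, drugs in sorted(drugs_by_target.items()):
        for d1, d2 in combinations(sorted(drugs), 2):
            e1 = metabolized_by.get(d1, set())
            e2 = metabolized_by.get(d2, set())
            # Require known enzymes for both drugs and no overlap between them.
            if e1 and e2 and e1.isdisjoint(e2):
                hits.append((target, d1, d2))
    return hits


RESULT = same_target_different_enzyme(DRUG_TARGETS, METABOLIZED_BY)
```

In this toy example, `RESULT` pairs `drug_A` with `drug_B` and `drug_B` with `drug_C` on `TARGET_1`, whereas `drug_A` and `drug_C` are excluded because they share an enzyme. A query of this form only becomes possible once target annotations and metabolism annotations are linked through a common drug identifier.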
We consider a drug-target relation to be a specific interaction between a small chemical compound administered to treat or diagnose a disease and a macromolecule, namely protein, DNA or RNA. The first release of SuperTarget contains a core dataset of about 7300 drug-target relations, of which 4900 interactions have been subjected to a more extensive manual annotation effort to incorporate additional binding information as well as indirect interactions. The resulting data on 775 drugs are provided separately as Matador (Manually Annotated Targets And Drugs Online Resource).

DRUG-TARGET RELATIONSHIPS

Drug-target relationships described in SuperTarget were obtained in three different ways. Starting with 2400 drugs and their synonyms from the SuperDrug Database (5), the text mining tool EbiMed (6) was used to extract relevant text passages containing potential drug-target relations from about 15 million public abstracts listed in PubMed. Many thousands of false positive or irrelevant relations were eliminated by manual curation.

In parallel, potential drug-target relations were automatically extracted from Medline by searching for synonyms of drugs, proteins and Medical Subject Headings (MeSH terms) describing groups of proteins (7). MeSH terms were used to capture and down-weight interactions that are not explicitly described in the abstracts, e.g. for protein families or protein complexes. In the case of families, the specific interacting family member might not be known yet, whereas in the case of complexes, the drug might interact with more than one subunit. Proteins associated with MeSH terms were assigned by a semiautomated procedure relying on mappings provided by MeSH and synonyms of proteins that are aggregated in the STRING resource (8). Proteins that were often mentioned in abstracts, but could not be automatically assigned to families, were manually assigned.
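The synonym-based extraction and the down-weighting of family-level mentions described above can be sketched roughly as follows. This is a simplified illustration and not the authors' pipeline: the synonym tables, the family term, the function names and, in particular, the rule of dividing the evidence by the family size are assumptions made for this example; the actual confidence assignment relies on a benchmarking scheme (8) that is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical synonym tables; all names are placeholders.
DRUG_SYNONYMS = {"drug_A": ["drugalin", "compound a"]}
PROTEIN_SYNONYMS = {"PROT_1": ["protein one", "p1 kinase"]}
# A family term (standing in for a MeSH heading) mapped to its members.
FAMILY_MEMBERS = {"example kinases": ["PROT_1", "PROT_2", "PROT_3", "PROT_4"]}


def mentions(text, synonyms):
    """True if any synonym occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(s.lower()) + r"\b", lowered) for s in synonyms
    )


def score_cooccurrences(abstracts):
    """Sum per-abstract evidence for each (drug, protein) pair.

    A direct protein mention counts 1.0; a family mention counts
    1 / family size per member (an assumed down-weighting rule).
    """
    scores = defaultdict(float)
    for text in abstracts:
        for drug, drug_syns in DRUG_SYNONYMS.items():
            if not mentions(text, drug_syns):
                continue
            for protein, prot_syns in PROTEIN_SYNONYMS.items():
                if mentions(text, prot_syns):
                    scores[(drug, protein)] += 1.0
            for family, members in FAMILY_MEMBERS.items():
                if mentions(text, [family]):
                    for member in members:
                        scores[(drug, member)] += 1.0 / len(members)
    return dict(scores)


ABSTRACTS = [
    "Drugalin inhibits P1 kinase in vitro.",
    "Compound A binds example kinases.",
    "Unrelated abstract without any drug.",
]
SCORES = score_cooccurrences(ABSTRACTS)
```

Here the directly named protein accumulates more evidence than its family members, which are only implied by the family term. Candidates ranked in this way would still require the manual curation step described in the text.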
Depending on the size and nature of the families, the confidence of an interaction between drugs and individual proteins was decreased: more heterogeneous families were assigned a lower confidence. The most probable candidates were identified using a benchmarking scheme (8) and manually curated.

In a last step, relations from other databases, namely DrugBank (3), KEGG (9), PDB (10), SuperLigands (11) and TTD (4), were checked for drug-target interactions not identified in the preceding steps. If those interactions could be confirmed by literature listed in PubMed, the references were included in SuperTarget; otherwise, the database describing the interaction is referenced. Given the large number of entries, we cannot rule out that some of the data are erroneous, change over time or are too unspecific. In case of doubt, we refer to the referenced relation source.

To provide more information on the drug-target relations, SuperTarget provides links to physicochemical properties and further structural information of drugs. Proven or potential target proteins are represented by sequences as stored in UniProt (12), by functional ann (...truncated)