Journal of Cheminformatics

http://link.springer.com/journal/13321

List of Papers (Total 893)

Dimorphite-DL: an open-source program for enumerating the ionization states of drug-like small molecules

Small-molecule protonation can promote or discourage protein binding by altering hydrogen-bond, electrostatic, and van-der-Waals interactions. To improve virtual-screen pose and affinity predictions, researchers must account for all major small-molecule ionization states. But existing programs for calculating these states have notable limitations such as high cost, restrictive...

rBAN: retro-biosynthetic analysis of nonribosomal peptides

Proteinogenic and non-proteinogenic amino acids, fatty acids or glycans are some of the main building blocks of nonribsosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of their constitutive peptides. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary...

Programming languages in chemistry: a review of HTML5/JavaScript

This is one part of a series of reviews concerning the application of programming languages in chemistry, edited by Dr. Rajarshi Guha. This article reviews the JavaScript technology as it applies to the chemistry discipline. A discussion of the history, scope and technical details of the programming language is presented.

Avoiding hERG-liability in drug design via synergetic combinations of different (Q)SAR methodologies and data sources: a case study in an industrial setting

In this paper, we explore the impact of combining different in silico prediction approaches and data sources on the predictive performance of the resulting system. We use inhibition of the hERG ion channel target as the endpoint for this study as it constitutes a key safety concern in drug development and a potential cause of attrition. We will show that combining data sources...

OGER++: hybrid multi-type entity recognition

BackgroundWe present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching...

QBMG: quasi-biogenic molecule generator with deep recurrent neural network

Biogenic compounds are important materials for drug discovery and chemical biology. In this work, we report a quasi-biogenic molecule generator (QBMG) to compose virtual quasi-biogenic compound libraries by means of gated recurrent unit recurrent neural networks. The library includes stereo-chemical properties, which are crucial features of natural products. QMBG can reproduce...

Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

Structure–activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively...

LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools

BackgroundChemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining...

A retrosynthetic analysis algorithm implementation

The need for synthetic route design arises frequently in discovery-oriented chemistry organizations. While traditionally finding solutions to this problem has been the domain of human experts, several computational approaches, aided by the algorithmic advances and the availability of large reaction collections, have recently been reported. Herein we present our own implementation...

Configurable web-services for biomedical document annotation

The need to efficiently find and extract information from the continuously growing biomedical literature has led to the development of various annotation tools aimed at identifying mentions of entities and relations. Many of these tools have been integrated in user-friendly applications facilitating their use by non-expert text miners and database curators. In this paper we...

A probabilistic molecular fingerprint for big data settings

BackgroundAmong the various molecular fingerprints available to describe small organic molecules, extended connectivity fingerprint, up to four bonds (ECFP4) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, resulting in ECFP4...

JPlogP: an improved logP predictor trained using predicted data

The partition coefficient between octanol and water (logP) has been an important descriptor in QSAR predictions for many years and therefore the prediction of logP has been examined countless times. One of the best performing models is to predict the logP using multiple methods and average the result. We have used those averaged predictions to develop a training-set which was...

A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications

The quality of data used for QSAR model derivation is extremely important as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in calculation of descriptors, hence leading to meaningless results. The increasing amounts of data, however, have often made it hard to...

Chemlistem: chemical named entity recognition using recurrent neural networks

Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches but given the success of the artificial neural network techniques known as “deep learning” we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional...

MER: a shell script and annotation server for minimal named entity recognition and linking

Named-entity recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires...

An automated framework for NMR chemical shift calculations of small organic molecules

When using nuclear magnetic resonance (NMR) to assist in chemical identification in complex samples, researchers commonly rely on databases for chemical shift spectra. However, authentic standards are typically depended upon to build libraries experimentally. Considering complex biological samples, such as blood and soil, the entirety of NMR spectra required for all possible...

A new chemoinformatics approach with improved strategies for effective predictions of potential drugs

BackgroundFast and accurate identification of potential drug candidates against therapeutic targets (i.e., drug–target interactions, DTIs) is a fundamental step in the early drug discovery process. However, experimental determination of DTIs is time-consuming and costly, especially for testing the associations between the entire chemical and genomic spaces. Therefore...

Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints

BackgroundInteraction fingerprints (IFP) have been repeatedly shown to be valuable tools in virtual screening to identify novel hit compounds that can subsequently be optimized to drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of...

Exploring non-linear distance metrics in the structure–activity space: QSAR models for human estrogen receptor

BackgroundQuantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: limited amount of available biological activity data and noise or uncertainty in the activity data themselves. To address these challenges...

“MS-Ready” structures for non-targeted high-resolution mass spectrometry screening studies

Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product...

The influence of solid state information and descriptor selection on statistical models of temperature dependent aqueous solubility

Predicting the equilibrium solubility of organic, crystalline materials at all relevant temperatures is crucial to the digital design of manufacturing unit operations in the chemical industries. The work reported in our current publication builds upon the limited number of recently published quantitative structure–property relationship studies which modelled the temperature...

Machine learning for the prediction of molecular dipole moments obtained by density functional theory

Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) by B3LYP/6-31G(d,p) on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database was used with 10,071 structures, new molecular descriptors were...

Probing the chemical–biological relationship space with the Drug Target Explorer

Modern phenotypic high-throughput screens (HTS) present several challenges including identifying the target(s) that mediate the effect seen in the screen, characterizing ‘hits’ with a polypharmacologic target profile, and contextualizing screen data within the large space of drugs and screening models. To address these challenges, we developed the Drug–Target Explorer. This tool...