Journal of Cheminformatics

http://link.springer.com/journal/13321

List of Papers (Total 818)

Efficiency of different measures for defining the applicability domain of classification models

The goal of defining an applicability domain for a predictive classification model is to identify the region in chemical space where the model’s predictions are reliable. The boundary of the applicability domain is defined with the help of a measure that shall reflect the reliability of an individual prediction. Here, the available measures are differentiated into those that flag ...

Data driven polypharmacological drug design for lung cancer: analyses for targeting ALK, MET, and EGFR

Drug design of protein kinase inhibitors is now greatly enabled by thousands of publicly available X-ray structures, extensive ligand binding data, and optimized scaffolds coming off patent. The extensive data begin to enable design against a spectrum of targets (polypharmacology); however, the data also reveal heterogeneities of structure, subtleties of chemical interactions, and ...

Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data

Background: In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was dual, first large ...

Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D

The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at ...

Efficient conformational sampling and weak scoring in docking programs? Strategy of the wisdom of crowds

Background: In drug design, an efficient structure-based optimization of a ligand needs the precise knowledge of the protein–ligand interactions. In the absence of experimental information, docking programs are necessary for ligand positioning, and the choice of a reliable program is essential for the success of such an optimization. The performances of four popular docking ...

chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery

Background: Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read and ...

QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations

Background: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, ...

An algorithm to identify functional groups in organic molecules

Background: The concept of functional groups forms a basis of organic chemistry, medicinal chemistry, toxicity assessment, spectroscopy and also chemical nomenclature. All current software systems to identify functional groups are based on a predefined list of substructures. We are not aware of any program that can identify all functional groups in a molecule automatically. The ...

The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor ...

RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a “one stop shop” algorithm ...

Electronic lab notebooks: can they replace paper?

Despite the increasingly digital nature of society there are some areas of research that remain firmly rooted in the past; in this case the laboratory notebook, the last remaining paper component of an experiment. Countless electronic laboratory notebooks (ELNs) have been created in an attempt to digitise record keeping processes in the lab, but none of them have become a ‘key ...

Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy

In mass spectrometry-based untargeted metabolomics, rarely more than 30% of the compounds are identified. Without the true identity of these molecules it is impossible to draw conclusions about the biological mechanisms, pathway relationships and provenance of compounds. The only way at present to address this discrepancy is to use in silico fragmentation software to identify ...

CPANNatNIC software for counter-propagation neural network to assist in read-across

Background: CPANNatNIC is software for development of counter-propagation artificial neural network models. Besides the interface for training of a new neural network it also provides an interface for visualisation of the results which was developed to aid in interpretation of the results and to use the program as a tool for read-across. Results: The work presents the details of the ...

Searching for bioactive conformations of drug-like ligands with current force fields: how good are we?

Drug-like ligands obtained from protein–ligand complexes deposited in the Protein Databank were subjected to conformational searching using various force fields and solvation settings. For each ligand, the resulting conformer pool was examined for the presence of the bioactive (crystal pose-like) conformation. Similarity of conformers toward the crystal-pose was quantified as the ...

Scaffold Hunter: a comprehensive visual analytics framework for drug discovery

The era of big data is influencing how rational drug discovery and the development of bioactive molecules are performed, and versatile tools are needed to assist in molecular design workflows. Scaffold Hunter is a flexible visual analytics framework for the analysis of chemical compound data and combines techniques from several fields such as data mining and information ...

ChemSAR: an online pipelining platform for molecular SAR modeling

Background: In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and undergo several different steps (e.g., RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, ...

Comparative analyses of structural features and scaffold diversity for purchasable compound libraries

Large purchasable screening libraries of small molecules afforded by commercial vendors are indispensable sources for virtual screening (VS). Selecting an optimal screening library for a specific VS campaign is quite important to improve the success rates and avoid wasting resources in later experimental phases. Analysis of the structural features and molecular diversity for ...

Assessment of the significance of patent-derived information for the early identification of compound–target interaction hypotheses

Background: Patents are an important source of information for effective decision making in drug discovery. Encouragingly, freely accessible patent-chemistry databases are now in the public domain. However, at present there is still a wide gap between relatively low coverage-high quality manually-curated data sources and high coverage data sources that use text mining and automated ...

SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines

Computational prediction of the interaction between drugs and targets is a standing challenge in the field of drug discovery. A number of rather accurate predictions were reported for various binary drug–target benchmark datasets. However, a notable drawback of a binary representation of interaction data is that missing endpoints for non-interacting drug–target pairs are not ...

A possible extension to the RInChI as a means of providing machine readable process data

The algorithmic, large-scale use and analysis of reaction databases such as Reaxys is currently hindered by the absence of widely adopted standards for publishing reaction data in machine readable formats. Crucial data such as yields of all products or stoichiometry are frequently not explicitly stated in the published papers and, hence, not reported in the database entry for those ...

The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix

Background: The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity based chemometric target fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers; however, model building methods rely on an explicit number of common conformers. Results: In ...

Critical Assessment of Small Molecule Identification 2016: automated methods

Background: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to ...

Nonpher: computational method for design of hard-to-synthesize structures

In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinctive classes such as active/nonactive or toxic/nontoxic. To train a classifier, a training data set must consist of examples from both positive and negative classes. While a biological activity or toxicity can be experimentally measured, another important molecular property, a ...