Journal of Cheminformatics

http://link.springer.com/journal/13321

List of Papers (Total 819)

A review of parameters and heuristics for guiding metabolic pathfinding

Recent developments in metabolic engineering have led to the successful biosynthesis of valuable products, such as the precursor of the antimalarial compound artemisinin and the opioid precursor thebaine. Synthesizing these traditionally plant-derived compounds in genetically modified yeast cells introduces the possibility of significantly reducing the total time and resources ...

G.A.M.E.: GPU-accelerated mixture elucidator

GPU acceleration is useful in solving complex chemical information problems. Identifying unknown structures from the mass spectra of natural product mixtures has been a desirable yet unresolved issue in metabolomics. However, this elucidation process has been hampered by complex experimental data and the inability of instruments to completely separate different compounds. ...

Beware of ligand efficiency (LE): understanding LE data in modeling structure-activity and structure-economy relationships

Background: On the one hand, ligand efficiency (LE) and the binding efficiency index (BEI), which are binding properties (B) averaged versus the heavy atom count (HAC: LE) or molecular weight (MW: BEI), have recently been declared a novel universal tool for drug design. On the other hand, questions have been raised about the mathematical validity of the LE approach. Results: In fact, ...

Molecular de-novo design through deep reinforcement learning

This work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active ...

Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm

Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these ...

Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information

Self-interacting proteins (SIPs) are important for biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in predicting SIPs is how to exploit computational approaches for SIP detection based on evolutionary information ...

Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set

The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship ...

Efficiency of different measures for defining the applicability domain of classification models

The goal of defining an applicability domain for a predictive classification model is to identify the region in chemical space where the model’s predictions are reliable. The boundary of the applicability domain is defined with the help of a measure that shall reflect the reliability of an individual prediction. Here, the available measures are differentiated into those that flag ...

Data driven polypharmacological drug design for lung cancer: analyses for targeting ALK, MET, and EGFR

Drug design of protein kinase inhibitors is now greatly enabled by thousands of publicly available X-ray structures, extensive ligand binding data, and optimized scaffolds coming off patent. The extensive data begin to enable design against a spectrum of targets (polypharmacology); however, the data also reveal heterogeneities of structure, subtleties of chemical interactions, and ...

Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data

Background: In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was dual, first large ...

Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D

The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at ...

Efficient conformational sampling and weak scoring in docking programs? Strategy of the wisdom of crowds

Background: In drug design, an efficient structure-based optimization of a ligand needs the precise knowledge of the protein–ligand interactions. In the absence of experimental information, docking programs are necessary for ligand positioning, and the choice of a reliable program is essential for the success of such an optimization. The performances of four popular docking ...

chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery

Background: Analyzing files containing chemical information is at the core of cheminformatics. Each analysis may require a unique workflow. This paper describes the chemalot and chemalot_knime open source packages. Chemalot is a set of command line programs with a wide range of functionalities for cheminformatics. The chemalot_knime package allows command line programs that read and ...

QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations

Background: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, ...

An algorithm to identify functional groups in organic molecules

Background: The concept of functional groups forms a basis of organic chemistry, medicinal chemistry, toxicity assessment, spectroscopy and also chemical nomenclature. All current software systems to identify functional groups are based on a predefined list of substructures. We are not aware of any program that can identify all functional groups in a molecule automatically. The ...

The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor ...

RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a “one stop shop” algorithm ...

Electronic lab notebooks: can they replace paper?

Despite the increasingly digital nature of society there are some areas of research that remain firmly rooted in the past; in this case the laboratory notebook, the last remaining paper component of an experiment. Countless electronic laboratory notebooks (ELNs) have been created in an attempt to digitise record keeping processes in the lab, but none of them have become a ‘key ...

Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy

In mass spectrometry-based untargeted metabolomics, rarely more than 30% of the compounds are identified. Without the true identity of these molecules it is impossible to draw conclusions about the biological mechanisms, pathway relationships and provenance of compounds. The only way at present to address this discrepancy is to use in silico fragmentation software to identify ...

CPANNatNIC software for counter-propagation neural network to assist in read-across

Background: CPANNatNIC is software for development of counter-propagation artificial neural network models. Besides the interface for training of a new neural network it also provides an interface for visualisation of the results which was developed to aid in interpretation of the results and to use the program as a tool for read-across. Results: The work presents the details of the ...

Searching for bioactive conformations of drug-like ligands with current force fields: how good are we?

Drug-like ligands obtained from protein–ligand complexes deposited in the Protein Databank were subjected to conformational searching using various force fields and solvation settings. For each ligand, the resulting conformer pool was examined for the presence of the bioactive (crystal pose-like) conformation. Similarity of conformers toward the crystal-pose was quantified as the ...

Scaffold Hunter: a comprehensive visual analytics framework for drug discovery

The era of big data is influencing how rational drug discovery and the development of bioactive molecules are performed, and versatile tools are needed to assist in molecular design workflows. Scaffold Hunter is a flexible visual analytics framework for the analysis of chemical compound data and combines techniques from several fields such as data mining and information ...

ChemSAR: an online pipelining platform for molecular SAR modeling

Background: In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and undergo several different steps (e.g., RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, ...