Journal of Cheminformatics

http://link.springer.com/journal/13321

List of Papers (Total 786)

Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data

Background A large proportion of polar anthropogenic compounds routinely released into the environment comprises homologue series, i.e., sets of chemicals differing in a repeating chemical unit. Using analytical techniques such as liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS), these compounds are readily measurable as signal sets with characteristic ...
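The idea of a homologue series as "chemicals differing in a repeating chemical unit" can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of the paper: it assumes the repeating unit is CH2 (monoisotopic mass 14.01565 Da), uses masses alone, and the function name `find_series`, the tolerance and the minimum series length are hypothetical choices.

```python
CH2_MASS = 14.01565  # monoisotopic mass of a CH2 unit in Da (assumed repeating unit)


def find_series(masses, unit=CH2_MASS, tol=0.002, min_length=3):
    """Greedily group masses into chains whose neighbours differ by `unit` (within `tol`)."""
    ordered = sorted(masses)
    used = set()
    series = []
    for i, start in enumerate(ordered):
        if i in used:
            continue
        chain = [i]
        current = start
        # Walk upwards in mass, extending the chain whenever the next unit step is found.
        for j in range(i + 1, len(ordered)):
            if j in used:
                continue
            if abs(ordered[j] - (current + unit)) <= tol:
                chain.append(j)
                current = ordered[j]
        # Keep only chains long enough to count as a series (hypothetical threshold).
        if len(chain) >= min_length:
            used.update(chain)
            series.append([ordered[k] for k in chain])
    return series


# Illustrative, made-up masses: four signals spaced by CH2 plus one unrelated signal.
example = [200.0000, 214.0157, 228.0313, 242.0470, 305.1234]
result = find_series(example)
```

A practical workflow would presumably use more than mass spacing to separate true series from chance matches; this sketch ignores any such additional criteria.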

The polypharmacology browser: a web-based multi-fingerprint target prediction tool using ChEMBL bioactivity data

Background Several web-based tools have been reported recently which predict the possible targets of a small molecule by similarity to compounds of known bioactivity using molecular fingerprints (fps). However, predictions in each case rely on similarities computed from only one or two fps. Considering that structural similarity and therefore the predicted targets strongly depend on ...

Technical implications of new IUPAC elements in cheminformatics

The symbols for the new IUPAC elements named in November 2016 can introduce subtle ambiguities within cheminformatics software. The ambiguities are described and demonstrated by highlighting inconsistencies between software when handling existing element symbols.

Database fingerprint (DFP): an approach to represent molecular databases

Background Molecular fingerprints are widely used in several areas of chemoinformatics including diversity analysis and similarity searching. The fingerprint-based analysis of chemical libraries, in particular of large collections, usually requires the molecular representation of each compound in the library, which may lead to issues of storage space and redundant calculations. In ...

The power metric: a new statistically robust enrichment-type metric for virtual screening applications with early recovery capability

A new metric for the evaluation of model performance in the field of virtual screening and quantitative structure–activity relationship applications is described. This metric has been termed the power metric and is defined as the true positive rate divided by the sum of the true positive and false positive rates, for a given cutoff threshold. The performance of this ...
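The definition above, true positive rate divided by the sum of the true positive and false positive rates at a given cutoff, can be written out as a short sketch. The function name `power_metric`, its confusion-matrix arguments, and the choice to return 0.0 when both rates are zero are assumptions made for illustration, not details taken from the paper.

```python
def power_metric(tp, fn, fp, tn):
    """Power metric at one cutoff: TPR / (TPR + FPR).

    tp, fn, fp, tn are confusion-matrix counts at the chosen cutoff threshold.
    """
    actives = tp + fn
    inactives = fp + tn
    if actives == 0 or inactives == 0:
        raise ValueError("need at least one active and one inactive compound")
    tpr = tp / actives      # true positive rate
    fpr = fp / inactives    # false positive rate
    if tpr + fpr == 0:
        # Nothing was selected at this cutoff; returning 0.0 is an assumed convention.
        return 0.0
    return tpr / (tpr + fpr)


# Made-up counts: 8 of 10 actives and 5 of 90 inactives fall above the cutoff.
pm = power_metric(tp=8, fn=2, fp=5, tn=85)
```

With these illustrative counts the true positive rate is 0.8 and the false positive rate is 5/90, so the metric is 0.8 / (0.8 + 5/90), roughly 0.935.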

Automatic procedure for generating symmetry adapted wavefunctions

Automatic detection of point groups as well as symmetrisation of molecular geometry and wavefunctions are useful tools in computational quantum chemistry. Algorithms for developing these tools as well as an implementation are presented. The symmetry detection algorithm is a clustering algorithm for symmetry invariant properties, combined with logical deduction of possible symmetry ...

Mapping and classifying molecules from a high-throughput structural database

High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to ...

SkinSensDB: a curated database for skin sensitization assays

Skin sensitization is an important toxicological endpoint for chemical hazard determination and safety assessment. Prediction of chemical skin sensitizers has traditionally relied on data from rodent models. The development of the adverse outcome pathway (AOP) and associated alternative in vitro assays has reshaped the assessment of skin sensitizers. The integration of multiple ...

NPCARE: database of natural products and fractional extracts for cancer regulation

Background Natural products have attracted increasing attention as a valuable resource for the development of anticancer medicines due to their structural novelty and good bioavailability. This necessitates a comprehensive database for the natural products and the fractional extracts whose anticancer activities have been verified. Description NPCARE ...

A novel descriptor based on atom-pair properties

Background Molecular descriptors have been widely used to predict biological activities and physicochemical properties or to analyze chemical libraries on the basis of similarity. Although fingerprints and properties are generally used as descriptors, neither is perfect for these purposes. A fingerprint can distinguish between molecules, whereas a property may not do the same in ...

A metadata-driven approach to data repository design

The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the ...

S2RSLDB: a comprehensive manually curated, internet-accessible database of the sigma-2 receptor selective ligands

Background Sigma (σ) receptors are accepted as a particular receptor class consisting of two subtypes: sigma-1 (σ1) and sigma-2 (σ2). The two receptor subtypes have specific drug actions, pharmacological profiles and molecular characteristics. The σ2 receptor is overexpressed in several tumor cell lines, and its ligands are currently under investigation for their role in tumor ...

osFP: a web server for predicting the oligomeric states of fluorescent proteins

Background Currently, monomeric fluorescent proteins (FP) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study represents the first ...

Scaffold analysis of PubChem database as background for hierarchical scaffold-based visualization

Background Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design. Especially in drug design, modern methods of high-throughput screening generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of ...

ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files

Digital access to chemical journals has resulted in a vast array of molecular information that is now available in supplementary material files in PDF format. However, extracting this molecular information from a PDF document is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles ...

Utilizing maximal frequent itemsets and social network analysis for HIV data analysis

Acquired immune deficiency syndrome is a deadly disease caused by the human immunodeficiency virus (HIV). This virus attacks the patient's immune system and affects its ability to fight against diseases. Developing effective medicine requires understanding the life cycle and replication ability of the virus. HIV-1 protease enzyme is used to cleave an octamer peptide into peptides ...

Mapping the 3D structures of small molecule binding sites

Background Analysis of the 3D structures of protein–ligand binding sites can provide valuable insights for drug discovery. Binding site comparison (BSC) studies can be employed to elucidate the function of orphan proteins or to predict the potential for polypharmacology. Many previous binding site analyses only consider binding sites surrounding an experimentally observed bound ...

Programmatic conversion of crystal structures into 3D printable files using Jmol

Background Three-dimensional (3D) printed crystal structures are useful for chemistry teaching and research. Current manual methods of converting crystal structures into 3D printable files are time-consuming and tedious. To overcome this limitation, we developed a programmatic method that allows for facile conversion of thousands of crystal structures directly into 3D printable ...

Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles

Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to ...

Prediction of reacting atoms for the major biotransformation reactions of organic xenobiotics

Background The knowledge of drug metabolite structures is essential at the early stage of drug discovery to understand the potential liabilities and risks connected with biotransformation. The determination of the site of a molecule at which a particular metabolic reaction occurs could be used as a starting point for metabolite identification. The prediction of the site of ...

A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood

The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition to assure the reliability of new predictions. This implies that reliability must be determined across chemical space in the attempt to localize “safe” and “unsafe” regions for prediction. As a result we devised an applicability domain technique that addresses the ...

LA-iMageS: a software for elemental distribution bioimaging using LA–ICP–MS data

The spatial distribution of chemical elements in different types of samples is an important topic in several research areas such as biology, paleontology and biomedicine. Elemental distribution imaging by laser ablation inductively coupled plasma mass spectrometry (LA–ICP–MS) is an effective technique for qualitative and quantitative imaging due to its high spatial ...

DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning

Background Mining high-throughput screening (HTS) assays is key for enhancing decisions in the area of drug repositioning and drug discovery. However, many challenges are encountered in the process of developing suitable and accurate methods for extracting useful information from these assays. Virtual screening and a wide variety of databases, methods and solutions proposed ...

Consensus Diversity Plots: a global diversity analysis of chemical libraries

Background Measuring the structural diversity of compound databases is relevant in drug discovery and many other areas of chemistry. Since molecular diversity depends on molecular representation, comprehensive chemoinformatic analysis of the diversity of libraries uses multiple criteria. For instance, the diversity of the molecular libraries is typically evaluated employing ...

Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets

Background PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute “neighbor” relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity ...