An ensemble model of QSAR tools for regulatory risk assessment

Journal of Cheminformatics, Sep 2016

Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.

A PDF file should load here. If you do not see its contents the file may be temporarily unavailable at the journal website or you do not have a PDF plug-in installed and enabled in your browser.

Alternatively, you can download the file locally and open with any standalone PDF reader:

https://link.springer.com/content/pdf/10.1186%2Fs13321-016-0164-0.pdf

An ensemble model of QSAR tools for regulatory risk assessment

Pradeep et al. J Cheminform An ensemble model of QSAR tools for regulatory risk assessment Prachi Pradeep 0 Richard J. Povinelli 2 Shannon White 3 Stephen J. Merrill 1 0 National Center for Computational Toxicology (ORISE Fellow) , US EPA, Research Triangle Park, NC , USA 1 Department of Mathematics , Statistics, and Computer Science , Marquette University , Milwaukee, WI , USA 2 Electrical and Computer Engineering Department, Marquette University , Milwaukee, WI , USA 3 Lombardi Comprehensive Cancer Center, Georgetown University Medical Center , Washington, DC , USA Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leaveone-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study. Computational toxicology; In silico QSAR tools; Hybrid QSAR models; Ensemble models; Risk assessment Background Chemical risk assessment associated with chemical exposure is necessary for the protection of human and environmental health. Toxicity or adverse effects are major reasons for failure of a potential pharmaceutical, an industrial chemical or a medical device [ 1–3 ]. Regulatory risk assessment is the process that ensures marketing of safe and effective drugs, medical devices and other consumer products. Regulatory decisions are primarily dependent on the short and long term toxic and clinical effects of chemicals. Conventional methods of risk assessment (in vivo experiments and clinical trials) are performed only after product development, and are expensive and time-consuming. Although in vivo experimental studies are the most accurate method for identifying the toxic effects induced by a xenobiotic, time and cost associated with them for new chemical regulation renders them ineffective for regulatory risk assessment. In silico approaches to predictive toxicology focus on building quantitative structure activity relationship (QSAR) models that can mimic the results of in  vivo studies. In silico methods are appealing because they provide a faster alternative to otherwise time-consuming laboratory and clinical testing methods [ 4, 5 ]. Currently, several commercial (free or proprietary) and open source in silico QSAR tools are available that can predict the toxic effects of a chemical based on its chemical structure [ 6, 7 ]. QSAR models are widely used for identification of chemicals that have a desired biological effect (e.g. drug leads) or for early prediction of potential toxic effects in the pharmaceutical industry. In contrast to industrial use, regulatory use of QSAR models is very different. In a regulatory application, QSAR models can be used to: (1) supplement experimental data, (2) support prioritization in the absence of experimental data, and (3) replace experimental animal testing methods [ 8, 9 ]. Several QSAR models have been used and validated by United Sates (US) regulato (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1186%2Fs13321-016-0164-0.pdf

Prachi Pradeep, Richard J. Povinelli, Shannon White, Stephen J. Merrill. An ensemble model of QSAR tools for regulatory risk assessment, Journal of Cheminformatics, 2016, pp. 48, Volume 8, Issue 1, DOI: 10.1186/s13321-016-0164-0