Benefits of statistical molecular design, covariance analysis, and reference models in QSAR: a case study on acetylcholinesterase

Journal of Computer-Aided Molecular Design, Oct 2014

Scientific disciplines such as medicinal and environmental chemistry, pharmacology, and toxicology deal with questions about the effects that small organic compounds exert on biological targets and about the physicochemical properties of the compounds responsible for these effects. A common strategy in this endeavor is to establish structure–activity relationships (SARs). The aim of this work was to illustrate the benefits of performing a statistical molecular design (SMD) and a proper statistical analysis of the molecules' properties before SAR and quantitative structure–activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and in predictions by QSAR models, and they leave a set of molecules less suitable for future decision-making. In our study, SAR and QSAR models showed which molecular substructures and physicochemical features were advantageous for AChE inhibition. Finally, the QSAR model was used to predict the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was assessed by statistical significance tests and by comparisons to simple but relevant reference models.

Article PDF: https://link.springer.com/content/pdf/10.1007%2Fs10822-014-9808-1.pdf


C. David Andersson, J. Mikael Hillgren, Cecilia Lindgren, Weixing Qian, Christine Akfur, Lotta Berg, Fredrik Ekström, Anna Linusson

C. D. Andersson, J. M. Hillgren, C. Lindgren, W. Qian, L. Berg, A. Linusson (corresponding author): Department of Chemistry, Umeå University, 90187 Umeå, Sweden
C. Akfur, F. Ekström: Swedish Defense Research Agency, CBRN Defense and Security, 90621 Umeå, Sweden
W. Qian: Laboratories for Chemical Biology Umeå, Umeå University, 90187 Umeå, Sweden
Present address, J. M. Hillgren: Department of Chemistry and Molecular Biology - Medicinal Chemistry, University of Gothenburg, 41296 Göteborg, Sweden

Many scientific disciplines, including medicinal and environmental chemistry, pharmacology, and toxicology, address questions related to the effects of small organic compounds on biological targets and to the relation between the molecules' physicochemical properties and the observed response. To investigate the structural reasons behind a specific effect, and to predict which chemical features an even more (or less) potent compound should have, it is crucial to define a structure-activity relationship (SAR). A SAR establishes a link between the molecular chemical features and a particular measured effect. In this paper, we focus on the importance of carefully considering the molecules that are used for SAR and quantitative structure-activity relationship (QSAR) studies. The molecules used to establish a QSAR dictate the quality and usefulness of the model, as it is the properties of the molecules that lead to the biological effect we want to model. A prerequisite for (Q)SAR modelling is that the set of included molecules shows substantial and statistically significant differences in the measured (biological) effect. The chance of differences in response likely increases if the molecules' structures are sufficiently diverse, although the statistical significance depends on the underlying SAR and on the experimental errors of the effect measurements. Furthermore, the chemical features of the investigated molecules need to be varied in such a way that their effects can be resolved in the subsequent SAR/QSAR studies. Therefore, we recommend careful selection and investigation of the sets of molecules used for SAR/QSAR in order to improve the usefulness of the generated models.
Here, we have designed and synthesized a set of inhibitors of the enzyme acetylcholinesterase (AChE) to illustrate the benefits of performing a statistical molecular design (SMD) [1] to create a solid molecular base for SAR and QSAR investigations. We also show the benefits of a careful analysis of the molecules' properties before modeling, and of assessing the resulting QSAR in relation to simpler models, here called reference models.
In medicinal chemistry projects, chemists commonly have to select compounds to synthesize, typically fewer than 100, from a substantially larger theoretical pool of potentially interesting molecules. These selected molecules may be designed and synthesized on a linear time scale (one by one) based on medicinal chemistry experience, which may lead to improved compounds in some cases, but this is not a suitable strategy if the objective is to construct a SAR/QSAR. In such cases, the preferred approach is to design and select sets of molecules that can later be used to investigate the biological effects. In SMD, subsets of molecules are designed based on the principles of design of experiments (DoE) [2], where chemical features hypothesized to be important for the biological effect are varied in a systematic way. SMD offers a way to select subsets of molecules that is sound from both a synthetic and a mathematical point of view, thus helping chemists make well-founded subset selections. Selecting compounds based on, for example, D-optimality [3] or factorial designs [1, 2] effectively reduces the physicochemical overlap between the molecules while keeping their number to a minimum. At the same time, the design ensures that the subset is representative of the full set of conceivable molecules, and that chemical features (synthons or building blocks) recur in several molecules to yield a basis for statistically supported conclusions regarding the biological effect.
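To make the idea of a D-optimal subset selection concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes two hypothetical building-block classes (`amines`, `acids`) with made-up, coded descriptor values, enumerates all combinations as the candidate pool, and picks a subset with a simple greedy search on log det(XᵀX). Dedicated design software typically uses exchange algorithms instead; the greedy search is swapped in here only to keep the example short.

```python
import itertools
import numpy as np

# Hypothetical building blocks, each with two made-up coded descriptors
# (for example a size-like and a lipophilicity-like value). Not data from the paper.
amines = {"A1": (-1.0, -1.0), "A2": (-1.0, 1.0), "A3": (1.0, -1.0), "A4": (1.0, 1.0)}
acids = {"B1": (-1.0, -1.0), "B2": (-1.0, 1.0), "B3": (1.0, -1.0), "B4": (1.0, 1.0)}


def candidate_pool():
    """Enumerate every amine x acid combination with its model-matrix row."""
    names, rows = [], []
    for (a, da), (b, db) in itertools.product(amines.items(), acids.items()):
        names.append(f"{a}-{b}")
        rows.append([1.0, *da, *db])  # intercept + four descriptors
    return names, np.array(rows)


def log_det_information(X, ridge=1e-6):
    """log det(X^T X + ridge * I); the ridge keeps rank-deficient subsets finite."""
    M = X.T @ X + ridge * np.eye(X.shape[1])
    _sign, logdet = np.linalg.slogdet(M)
    return logdet


def greedy_d_optimal(X, n_select):
    """Add one candidate at a time, always taking the one that raises log det most."""
    chosen = []
    for _ in range(n_select):
        remaining = [i for i in range(len(X)) if i not in chosen]
        best = max(remaining, key=lambda i: log_det_information(X[chosen + [i]]))
        chosen.append(best)
    return chosen


names, X = candidate_pool()
subset = greedy_d_optimal(X, n_select=8)
print([names[i] for i in subset])
```

Because every building block appears in several of the enumerated combinations, a subset chosen this way tends to keep each synthon represented more than once while spreading the selected molecules across the descriptor space.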
More specifically, SMD in SAR analysis makes it possible to investigate non-additive effects of structural or physicochemical features of the molecules. By designing the molecules through a systematic combination of synthons (building blocks), it can be ensured that structural fragments reappear several times in different combinations among the final molecules. This gives a more robust basis for identifying combination effects and constructing regression models (QSAR). It is achieved because SMD inherently reduces the covariation of the investigated chemical features, which increases the possibility of resolving the impact of each investigated property on the measured biological effect. If two or more chemical features covary, their effects will be confounded and it will be difficult to distinguish which feature is responsible for the effect. For example, if all flexible molecules are lipophilic, the effects of these two features will be confounded, and it will not be possible to resolve whether the biological effect depends mainly on flexibility, lipophilicity, or both. We recommend careful investigation of the correlation patterns in the descriptor matrix of a set of molecules (i.e., investigation of the covariance of the X-matrix) intended for SAR and QSAR studies. Unfortunately, this is r (...truncated)
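The recommended inspection of the descriptor correlation pattern can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis performed in the paper: the descriptor names, the two small data tables, the helper `confounded_pairs`, and the 0.8 threshold are all hypothetical. The first table mimics the flexible-and-lipophilic case described above; the second is a small factorial-style set in which the same descriptors vary independently.

```python
import numpy as np


def confounded_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for descriptor pairs with |Pearson r| >= threshold."""
    R = np.corrcoef(X, rowvar=False)  # columns are descriptors
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(R[i, j])))
    return pairs


# Made-up descriptor tables: rows are molecules, columns are descriptors.
names = ["rotatable_bonds", "logP", "h_bond_donors"]

# Every flexible molecule is also lipophilic, so the two features covary.
X_confounded = np.array([
    [2, 1.0, 1],
    [3, 1.6, 2],
    [5, 2.9, 1],
    [6, 3.4, 2],
    [8, 4.6, 1],
    [9, 5.1, 2],
], dtype=float)

# Factorial-style set: low/high levels are combined so the columns are uncorrelated.
X_designed = np.array([
    [2, 1.0, 1],
    [2, 5.0, 2],
    [9, 1.0, 2],
    [9, 5.0, 1],
], dtype=float)

print(confounded_pairs(X_confounded, names))
print(confounded_pairs(X_designed, names))
```

In the first table only the flexibility and lipophilicity columns are flagged, which is the situation in which a QSAR cannot attribute the effect to either feature alone; the designed table yields no flagged pairs.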


Article home page: https://link.springer.com/article/10.1007/s10822-014-9808-1

C. David Andersson, J. Mikael Hillgren, Cecilia Lindgren, Weixing Qian, Christine Akfur, Lotta Berg, Fredrik Ekström, Anna Linusson. Benefits of statistical molecular design, covariance analysis, and reference models in QSAR: a case study on acetylcholinesterase, Journal of Computer-Aided Molecular Design, 2015, pp. 199-215, Volume 29, Issue 3, DOI: 10.1007/s10822-014-9808-1