Benefits of statistical molecular design, covariance analysis, and reference models in QSAR: a case study on acetylcholinesterase
C. David Andersson
J. Mikael Hillgren
Cecilia Lindgren
Weixing Qian
Christine Akfur
Lotta Berg
Fredrik Ekström
Anna Linusson
W. Qian: Laboratories for Chemical Biology Umeå, Umeå University, 90187 Umeå, Sweden
J. M. Hillgren (present address): Department of Chemistry and Molecular Biology - Medicinal Chemistry, University of Gothenburg, 41296 Göteborg, Sweden
C. D. Andersson, J. M. Hillgren, C. Lindgren, W. Qian, L. Berg, A. Linusson (corresponding author): Department of Chemistry, Umeå University, 90187 Umeå, Sweden
C. Akfur, F. Ekström: Swedish Defense Research Agency, CBRN Defense and Security, 90621 Umeå, Sweden
Scientific disciplines such as medicinal and environmental chemistry, pharmacology, and toxicology deal with questions about the effects that small organic compounds exert on biological targets and about the physicochemical properties of the compounds that are responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate the benefits of performing a statistical molecular design (SMD) and a proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and in predictions by QSAR models, and leave a set of molecules that is less suitable for future decision-making. In our study, SAR and QSAR models could show which molecular substructures and physicochemical features were advantageous for AChE inhibition. Finally, the QSAR model was used to predict the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was assessed by statistical significance tests and by comparisons to simple but relevant reference models.
Many scientific disciplines, including medicinal and
environmental chemistry, pharmacology, and toxicology,
address questions related to the effects of small organic
compounds on biological targets, and the relation between
the molecules' physicochemical properties and the
observed response. To investigate the chemical structural
reasons behind a specific effect and to predict what
chemical features an even more (or less) potent compound
should have, it is crucial to define a structure-activity
relationship (SAR). A SAR establishes a link between the
molecular chemical features and a particular measured
effect. In this paper, we focus on the importance of careful
consideration of the molecules that are used for SAR and
quantitative structure-activity relationship (QSAR) studies.
The molecules used to establish a QSAR dictate the quality
and usefulness of the model, as it is the properties of the
molecules that lead to the biological effect we want to
model. A prerequisite for (Q)SAR modelling is that the set
of included molecules shows substantial and statistically
significant differences in the measured (biological) effect.
Differences in response are more likely if the
molecules' structures are sufficiently diverse, although the
statistical significance depends on the underlying SAR
and the experimental errors of the effect measurements.
Furthermore, the chemical features of investigated
molecules need to be varied in such a way that their effects can
be resolved in the subsequent SAR/QSAR studies.
Therefore, we recommend careful selection and investigation
of the sets of molecules used for SAR/QSAR in order to
improve the usefulness of generated models. Here, we have
designed and synthesized a set of inhibitors of the enzyme
acetylcholinesterase (AChE) to illustrate the benefits of
performing a statistical molecular design (SMD) [1] to
create a solid molecular base for SAR and QSAR
investigations. We also show the benefits of a careful analysis of
the molecules' properties before modeling, and of
assessing the resulting QSAR in relation to simpler
models, here called reference models.
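The point that chemical features must be varied so that their effects can be resolved can be illustrated with a small check of the descriptor matrix. The sketch below is not the procedure used in this study; the descriptor names, the -1/+1 coding, the two example sets, and the 0.8 threshold are all hypothetical choices made for illustration. It compares an ad hoc set, in which two features always change together, with a two-level full factorial design, and lists the descriptor pairs that are strongly correlated.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Assumes both sequences vary (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(rows, names, threshold=0.8):
    """Return descriptor pairs whose absolute correlation reaches the threshold."""
    columns = list(zip(*rows))  # one tuple of values per descriptor
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            flagged.append((names[i], names[j], round(r, 2)))
    return flagged


# Hypothetical descriptors coded -1 (low) / +1 (high); not data from the study.
names = ["flexibility", "lipophilicity", "size"]

# Ad hoc set: flexibility and lipophilicity always change together.
ad_hoc = [(-1, -1, -1), (-1, -1, 1), (1, 1, -1), (1, 1, 1)]

# Two-level full factorial design: every combination of levels occurs once.
factorial = list(product((-1, 1), repeat=len(names)))

print("ad hoc:   ", correlated_pairs(ad_hoc, names))
print("factorial:", correlated_pairs(factorial, names))
```

In the ad hoc set the two covarying descriptors are flagged, so their effects on a response could not be separated by any model. In the factorial set no pair is flagged, because each descriptor is varied independently of the others.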
In medicinal chemistry projects, chemists commonly
have to select compounds to synthesize, typically fewer than
100, from a substantially larger theoretical pool of
potentially interesting molecules. These selected molecules may
be designed and synthesized on a linear time scale (one by
one) based on medicinal chemistry experience, which may
lead to improved compounds in some cases, but this is not
a suitable strategy if the objective is to construct a SAR/
QSAR. In such cases, the preferred approach is to design
and select sets of molecules that later can be used to
investigate the biological effects. In SMD, subsets of
molecules are designed based on the principles of design of
experiments (DoE) [2], where chemical features
hypothesized to be important for the biological effect are varied in a
systematic way. SMD offers a way to select subsets of
molecules that is sound from both a synthetic and a
mathematical point of view, thus helping chemists to make
informed subset selections. Selecting compounds based on,
for example, D-optimality [3] or factorial designs [1, 2]
effectively reduces the physicochemical overlap between
the molecules while keeping their number to a minimum.
Simultaneously, the design ensures that the subset is
representative of the full set of conceivable molecules, and that
chemical features (synthons or building blocks) recur
in several molecules to yield a basis for statistically
supported conclusions regarding biological effect. More
specifically, SMD in SAR analysis makes it possible to
investigate non-additive effects of molecular structural or
physicochemical features. By designing the molecules
through simply combining synthons (building blocks) in a
clever way, it can be ensured that structural fragments
systematically reappear several times in different
combinations among the final molecules. This gives a more
robust basis for identifying combination effects and
constructing regression models (QSAR). This is achieved
because SMD inherently reduces the covariation of the
investigated chemical features, which makes it easier to
resolve the impact of each investigated property on the
measured biological effect. If two or more chemical
features covary, their effects will be confounded and it will be
difficult to distinguish which feature is responsible for
the effect. For example, if all flexible molecules are
lipophilic, the effect of these two features will be confounded,
and it will not be possible to resolve whether the biological
effect is dependent mainly on flexibility, lipophilicity, or
both. We recommend careful investigation of the
correlation patterns of the descriptor matrices of any set of
molecules intended for SAR and QSAR studies (i.e., investigation
of the covariance of the X-matrix). Unfortunately, this is
r (...truncated)