Spatial chemical distance based on atomic property fields
A. V. Grigoryan, I. Kufareva, M. Totrov, R. A. Abagyan
M. Totrov: Molsoft, LLC, 3366 N Torrey Pines Ct. Suite 300, La Jolla, CA 92037, USA
A. V. Grigoryan, I. Kufareva, R. A. Abagyan (corresponding author): Department of Molecular Biology, TPC28, The Scripps Research Institute, 10550 N Torrey Pines Rd., La Jolla, CA 92037, USA
Similarity of compound chemical structures often leads to close pharmacological profiles, including binding to the same protein targets. The opposite, however, is not always true, as distinct chemical scaffolds can also exhibit similar pharmacology. Therefore, relying on chemical similarity to known binders when searching for novel chemicals that target the same protein artificially narrows the results and makes lead hopping impossible. In this study we attempt to design a compound similarity/distance measure that better captures the structural aspects of compound pharmacology and molecular interactions. The measure is based on our recently published method for spatial alignment of compounds using atomic property fields as a generalized 3D pharmacophoric potential. Using Partial Least Squares regression, we optimized the contributions of different atomic properties to better discriminate compound pairs with the same pharmacology from pairs with different pharmacology. The proposed similarity measure was then tested for its ability to discriminate pharmacologically similar pairs from decoys on a large, diverse dataset of 115 protein-ligand complexes. Compared to the 2D Tanimoto and Shape Tanimoto approaches, the new approach improved the area under the receiver operating characteristic curve (AUC) in 66% and 58% of domains, respectively. The improvement was particularly high for previously problematic cases, in which the 2D Tanimoto and Shape Tanimoto measures performed weakly, with original AUC values below 0.8. For these cases we obtained improvement in 86% of domains compared to the 2D Tanimoto measure and in 85% compared to the Shape Tanimoto measure. The proposed spatial chemical distance measure can be used in virtual ligand screening.
Ligand-based approaches to protein family profiling have
been widely studied and used for in silico pharmacology
[1]. Similarity of compound chemical structures often leads
to close pharmacological profiles, including binding to the
same protein targets. For this reason, the chemical similarity
criterion is widely used to identify novel lead
molecules in the development of pharmaceuticals. A
variety of chemical similarity measures have been proposed.
However, in many cases compounds with similar
pharmacology escape recognition because they appear
dissimilar by any existing measure.
To navigate ligand space, one needs to
represent each compound by appropriate properties
(descriptors) and then use a master equation to measure the
distance between two compounds.
Descriptors are usually classified according to their
dimensionality, ranging from one-dimensional (1D) to
three-dimensional (3D) properties [2, 3, 10]. 1D descriptors,
which are easy and fast to compute, describe global properties
that can be derived from the chemical formula and are used to
classify compounds or ligands from various target families [3–5,
10]. For fast comparison, linear 1D representations
of compounds are often used. The most popular of these
simplified strings is the Simplified Molecular Input Line
Entry System, or SMILES [3, 6, 10].
To improve discrimination, 2D topological descriptors
are used. Graph-based methods, such as maximum
common subgraph (MCS) [3, 7, 10], and fingerprint-based
methods [3, 8, 10] are popular for clustering chemical
compounds into subfamilies by substructure. Subgraph
isomorphism searches are often time consuming when
performed on the large numbers of structures in molecular
databases. For this reason, substructure screening was
developed as a rapid method of filtering out those molecules that
definitely do not contain the substructure of interest [10,
46]. The similarity between two molecules represented by
2D binary fingerprints is most frequently quantified using
the Tanimoto coefficient, which measures the
number of fragments the two molecules have in common
relative to the number present in either [3, 9, 10].
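As a minimal illustration of the Tanimoto coefficient for binary fingerprints, the sketch below represents each fingerprint as the set of indices of its "on" bits and computes T = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. The function name `tanimoto`, the example bit indices, and the convention for two empty fingerprints are assumptions made for this example and do not come from the study.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints.

    Each fingerprint is a set holding the indices of its 'on' bits.
    """
    if not fp_a and not fp_b:
        # Convention assumed here: two empty fingerprints are identical.
        return 1.0
    common = len(fp_a & fp_b)  # c: bits set in both fingerprints
    return common / (len(fp_a) + len(fp_b) - common)  # c / (a + b - c)


# Hypothetical fingerprints: 2 shared bits out of 5 distinct bits overall.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
similarity = tanimoto(fp_a, fp_b)
```

The value ranges from 0 (no fragments in common) to 1 (identical fingerprints), and 1 - T is commonly used as the corresponding distance.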
It is well known that molecular recognition depends on
the 3D structure and properties of the molecule rather than on
the underlying substructure(s) [10]. 3D methods are
computationally more expensive than 2D descriptor-based methods
because they require consideration of the conformational space
of the molecule. These methods can be divided into
alignment-independent methods and methods that
require the molecules to be aligned in 3D space before
the similarity function is applied [10].
Some computationally expensive
alignment-independent methods encode 3D geometrical descriptors
in a binary fingerprint and then apply the Tanimoto
coefficient exactly as for 2D fingerprints [10, 11]. Other
methods are 3D equivalents of the MCS [10, 12, 13]. Many
3D approaches are based on distance matrices,
in which the value of each element (i, j) equals the
interatomic distance between atoms i and j [10, 14]. There
are also approaches in which pharmacophore points are used
for similarity comparisons [10, 15–17].
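The distance-matrix representation mentioned above can be sketched in a few lines. The function name `distance_matrix` and the three-atom coordinates are hypothetical and serve only as an illustration; this is not code from any of the cited methods.

```python
import math


def distance_matrix(coords):
    """Return the matrix whose element (i, j) is the Euclidean
    distance between atoms i and j, given a list of (x, y, z) tuples."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    n = len(coords)
    return [[dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical coordinates (in angstroms) for three atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
d = distance_matrix(coords)
```

Because interatomic distances do not change when the molecule is rotated or translated, a representation of this kind can be compared between molecules without first superimposing them, which is what makes it alignment-independent.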
Alignment-dependent methods must consider the
conformational flexibility of the molecules as well as their
relative orientation [10]. These methods are devised
to align the compared structures by maximizing the
similarity function that is used [10, 45]. Many different
ways have been developed to represent molecules and
calculate similarity based on molecular shape and/or field
[18–31, 45]. For reviews of molecular similarity methods,
see refs [2, 10, 32–36].
The aim of this study is to design a spatial distance
measure between two chemicals that optimizes recognition
of their pharmacological similarity by using their 3D
conformational ensembles and properties pertaining to
molecular interactions. We recently introduced a novel
spatial alignment method based on atomic property fields
(APF) as a generalized 3D pharmacophoric potential [37].
APF is the representation of the ligand by a
multi-component (vector) 3D potential, with the components
corresponding to various physico-chemical atomic properties. In
the present study, the APF alignment is used to measure
spatial chemical similarity/distance between ligands.
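To make the idea of a multi-component (vector) 3D potential concrete, the sketch below evaluates a property field at a point in space. It is only an illustration under stated assumptions: the text here does not give the functional form of the APF, so the Gaussian smoothing of atom contributions, the width `sigma`, the function name `field_at`, and the three property components in the example are all hypothetical choices, not the published parameterization.

```python
import math


def field_at(point, atoms, sigma=1.0):
    """Evaluate a multi-component property field at `point`.

    atoms: list of (position, properties) pairs, where position is an
    (x, y, z) tuple and properties is a tuple with one value per component.
    Assumption: each atom contributes a Gaussian of width `sigma`
    centered on the atom and scaled by its property values.
    """
    n_components = len(atoms[0][1])
    field = [0.0] * n_components
    for position, properties in atoms:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, position))
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
        for k, value in enumerate(properties):
            field[k] += weight * value
    return field


# Two hypothetical atoms, each with three hypothetical property values.
atoms = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.5)),
    ((1.5, 0.0, 0.0), (0.0, 1.0, 0.2)),
]
value = field_at((0.0, 0.0, 0.0), atoms)
```

Each component of the returned vector reflects how strongly one atomic property is represented near the chosen point, so two ligands placed in the same frame can be compared component by component.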
A diverse benchmark of 99 proteins (see Supplementary
Table 1 for details) and ligands co-crystallized with these
proteins (with 6 ligands per protein on average) was used to
train APF parameters for better discrimination of
pharmacologically similar pairs from dissimilar ones. We took all
possible pairs of ligands from the same receptor and, for each
ligand co-crystallized with a given protein, all possible pairs
with ligands co-crystallized with 20 other proteins chosen at
random from the benchmark; the APF representation of the larger
ligand was us (...truncated)