Computer-assisted methods for molecular structure elucidation: realizing a spectroscopist's dream
Journal of Cheminformatics
Mikhail Elyashberg 2
Kirill Blinov 2
Sergey Molodtsov 1
Yegor Smurnyy 2
Antony J Williams 0
Tatiana Churanova 2
0 ChemZoo Inc. , 904 Tamaras Circle, Wake Forest, NC, 27587 , USA
1 Novosibirsk Institute of Organic Chemistry, Siberian Division, Russian Academy of Sciences , 9 Akademik Lavrent'ev Av., Novosibirsk, 630090 Russian Federation
2 Advanced Chemistry Development, Moscow Department , 6 Akademik Bakulev Street, Moscow 117513, Russian Federation
Background: This article coincides with the 40-year anniversary of the first published works devoted to the creation of algorithms for computer-aided structure elucidation (CASE). The general principles on which CASE methods are based are reviewed, and the present state of the art in this field is described using the expert system Structure Elucidator (StrucEluc) as an example.
Results: The developers of CASE systems have had to overcome many obstacles hindering the development of a software application capable of drastically reducing the time and effort required to determine the structures of newly isolated organic compounds. Large complex molecules of up to 100 or more skeletal atoms with topological peculiarities can be quickly identified from spectral data using the expert system Structure Elucidator. Logical analysis of 2D NMR data frequently allows the detection of COSY and HMBC correlations of "nonstandard" length. Fuzzy structure generation makes it possible to obtain the correct solution even when an unknown number of nonstandard correlations of unknown length are present in the spectra. The relative stereochemistry of large rigid molecules containing many stereocenters can be determined using the StrucEluc system and NOESY/ROESY 2D NMR data.
Conclusion: The StrucEluc system continues to be developed in order to expand its general applicability, provide improved workflows and usability, and increase the reliability of the results. It is expected that expert systems similar to the one described in this paper will receive increasing acceptance in the next decade and will ultimately be integrated directly into analytical instruments for the purpose of organic analysis. Work in this direction is in progress. Although many difficulties have already been overcome to deliver on the spectroscopist's dream of "fully automated structure elucidation", there is still work to do. Nevertheless, as the efficiency of expert systems is enhanced, the solution of increasingly complex structural problems will become achievable.
Background
The possibility of creating computer-assisted methods for
the structure elucidation of new organic compounds was
first discussed in the second half of the twentieth century.
Structure elucidation commonly combines information
extracted from several forms of spectra. The molecular
formula of the substance is generally derived from a
mass spectrum, and structural hypotheses are deduced from
spectral data, which usually include NMR, IR and UV
spectra, among others. The distinctive feature of this approach is the
inference of the structure of an unknown compound that
is absent from spectral libraries, i.e. without employing
reference structures and their associated spectra. Later,
qualitative spectral analysis without reference data was
extended to quantitative spectral analysis in optical
spectroscopy [1]. The solution of such problems can be
facilitated by the retrieval of reference data in combination
with logical-combinatorial processing of the data. A new
area of investigation was developed that is now referred to
as Computer-Aided Structure Elucidation (CASE). CASE
was applied initially to "small molecules" as distinct from
biological macromolecules and biopolymers.
The first reports devoted to CASE systems were published
by four independent groups of researchers exactly forty
years ago [2-5]. Since the publication of these seminal
reports an extensive literature regarding computer
methods of structure elucidation has been produced. From the
inception of CASE methods, attention has been directed
to the creation of artificial intelligence or "expert" systems
(ES) based on the analysis of 1D 1H and 13C NMR data in
combination with MS and IR spectra. The first studies of
CASE development were described in a series of reviews
[6,7] and monographs [8-10]. In spite of the efforts of
many scientific groups no system capable of elucidating
large complex molecules was delivered during the first 20
years of intensive efforts. The primary reason for failure
was the lack of structural information that could be
retrieved from 1D NMR spectra to use as input to the
structure generator, the kernel of any expert system. The
first two decades of CASE development should,
nevertheless, be considered as very fruitful since a general strategy
was establis (...truncated)