An efficient computer-aided structural elucidation strategy for mixtures using an iterative dynamic programming algorithm

Journal of Cheminformatics, Nov 2017

The identification of chemical structures in natural product mixtures is an important task in drug discovery but remains challenging, because structural elucidation is time-consuming and is limited by the available mass spectra of known natural products. Computer-aided structure elucidation (CASE) strategies seek to automatically propose a list of possible chemical structures in mixtures by utilizing chromatographic and spectroscopic methods. However, current CASE tools still cannot automatically solve structures for experienced natural product chemists. Here, after analyzing a collection of 243,130 natural products, we formulated the structural elucidation of natural products in a mixture as a computational problem of extending a list of scaffolds using a weighted side chain list, and designed an efficient algorithm to precisely identify the chemical structures. This problem is NP-complete. A dynamic programming (DP) algorithm can solve it in pseudo-polynomial time after converting floating point molecular weights into integers. However, the running time of the DP algorithm grows exponentially as the precision of the mass spectrometry experiment increases. To solve the problem, ideally in polynomial time, we proposed a novel iterative DP algorithm that can quickly recognize the chemical structures of natural products. When applied to four natural products whose structures had been determined experimentally, the algorithm found the exact solutions, and its running time was shown to be polynomial in the average case. The proposed method improved the speed of the structural elucidation of natural products and helped broaden the spectrum of available compounds that could be applied as new drug candidates. A web service built for structural elucidation studies is freely accessible at http://csccp.cmdm.tw/.
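The abstract does not give the DP recurrence, so the following is only a minimal sketch of the general idea it describes: scale floating point masses to integers, fill a subset-sum style table, and read off side chain combinations that match a target mass. It is not the authors' iterative algorithm. All function names, the `max_sites` limit, the tolerance handling, and the three side chain mass increments are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def to_int(mass: float, decimals: int) -> int:
    """Scale a floating point mass to an integer at the chosen precision."""
    return round(mass * 10 ** decimals)


def min_chain_table(weights: List[int], max_mass: int) -> List[float]:
    """minc[m] = fewest side chains whose integer masses sum to exactly m.

    Classic pseudo-polynomial DP: the table has max_mass + 1 entries, so each
    extra decimal digit of precision makes it roughly ten times larger.
    """
    minc: List[float] = [math.inf] * (max_mass + 1)
    minc[0] = 0
    for m in range(1, max_mass + 1):
        for w in weights:
            if w <= m and minc[m - w] + 1 < minc[m]:
                minc[m] = minc[m - w] + 1
    return minc


def find_side_chains(
    side_chains: Dict[str, float],  # hypothetical name -> mass increment (Da)
    residual_mass: float,           # observed mass minus scaffold mass (Da)
    max_sites: int,                 # assumed number of attachment sites
    decimals: int = 2,
    tol: float = 0.005,
) -> List[Tuple[str, ...]]:
    """Return multisets of side chains whose total mass matches residual_mass."""
    names = sorted(side_chains)
    weights = [to_int(side_chains[n], decimals) for n in names]
    target = to_int(residual_mass, decimals)
    # Rounding can shift each integer mass by up to half a unit, so search a
    # small window around the target and confirm hits with the float masses.
    slack = math.ceil(tol * 10 ** decimals) + max_sites
    minc = min_chain_table(weights, target + slack)
    hits: List[Tuple[str, ...]] = []

    def walk(start: int, remaining: int, sites_left: int, chosen: List[str]) -> None:
        if remaining == 0:
            exact = sum(side_chains[n] for n in chosen)
            if abs(exact - residual_mass) <= tol:
                hits.append(tuple(chosen))
            return
        # Prune branches the DP table shows cannot be completed in time.
        if sites_left == 0 or minc[remaining] > sites_left:
            return
        for i in range(start, len(weights)):  # non-decreasing index: no duplicates
            if weights[i] <= remaining:
                chosen.append(names[i])
                walk(i, remaining - weights[i], sites_left - 1, chosen)
                chosen.pop()

    for t in range(max(0, target - slack), target + slack + 1):
        walk(0, t, max_sites, [])
    return hits


if __name__ == "__main__":
    # Illustrative monoisotopic mass increments for replacing one hydrogen.
    chains = {"OH": 15.994915, "CH3": 14.01565, "OCH3": 30.010565}
    for combo in find_side_chains(chains, residual_mass=62.000395, max_sites=4):
        print(combo)
```

With these illustrative values, one methoxy group has the same mass increment as one hydroxyl plus one methyl group, so allowing four sites yields more than one candidate assignment. Mass alone cannot separate such isobaric combinations, whatever the precision.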

https://link.springer.com/content/pdf/10.1186%2Fs13321-017-0244-9.pdf

Bo-Han Su (3), Meng-Yu Shen (3), Yeu-Chern Harn (2), San-Yuan Wang (3), Alioune Schurz (0, 1), Chieh Lin (3), Olivia A. Lin (0, 1), Yufeng J. Tseng (0, 1, 3)

0, 1: Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei 106, Taiwan
2: Graduate Institute of Networking and Multimedia, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei 106, Taiwan
3: Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei 106, Taiwan

Keywords: CASE; Natural products; Dynamic programming; Polynomial time

Background

Examining natural and therapeutic products is crucial for drug development because many chemically synthesized compounds have potentially serious toxicity and adverse effects, while less toxic compounds extracted from natural products could be developed into new drug candidates [1]. In addition, natural products often open new chemical spaces not explored by synthetic compounds produced by combinatorial chemistry, and extraction from different natural sources, such as the deep and cold seas, can further expand the diversity and novelty of molecules [2, 3]. A review by Newman and Cragg [2] indicated that 47% of new anti-cancer drugs from 1950 to 2006 were natural products or derived from natural products. Recently, Butler et al. [3] reviewed 100 natural products and natural products-derived compounds that were either being evaluated in clinical trials or in registration at the end of 2013.
They concluded that 50% of the compounds were natural products or semi-synthetic natural products, while the remaining compounds were classified as natural products-derived compounds. The exploration of new lead compounds from natural products and their successful development into clinical trials will continue to be a significant trend in drug discovery over the next few years.

However, natural products-based drug discovery faces many challenges [4], and the exploration of natural products for new drug development was actually disfavored by the pharmaceutical industry in the 2000s [5]. One of the major hurdles is the extremely time-consuming process of isolating bioactive compounds from natural products composed of complicated mixtures and elucidating their structures. Because the size of natural product databases is limited, high-throughput screening methods cannot be used to effectively identify potential natural product drugs.

Many advances in mass spectrometry (MS) and nuclear magnetic resonance (NMR) automation techniques over the last two decades have accelerated structural elucidation processes for complex natural product mixtures. MS is a common too (...truncated)


Bo-Han Su, Meng-Yu Shen, Yeu-Chern Harn, San-Yuan Wang, Alioune Schurz, Chieh Lin, Olivia A. Lin, Yufeng J. Tseng. An efficient computer-aided structural elucidation strategy for mixtures using an iterative dynamic programming algorithm. Journal of Cheminformatics, 2017, Volume 9, Issue 1, Article 57. DOI: 10.1186/s13321-017-0244-9