Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing

Nature Communications, Apr 2023

Assignment of resonances in nuclear magnetic resonance (NMR) spectra to specific atoms within a protein remains a labor-intensive and challenging task. Automation of the assignment process often remains a bottleneck in the exploitation of solution NMR spectroscopy for the study of protein structure-dynamics-function relationships. We present an approach to the assignment of backbone triple resonance spectra of proteins. A Bayesian statistical analysis of predicted and observed chemical shifts is used in conjunction with inter-spin connectivities provided by triple resonance spectroscopy to calculate a pseudo-energy potential that drives a simulated annealing search for the optimal set of resonance assignments. The approach, termed Bayesian Assisted Assignments by Simulated Annealing (BARASA), is implemented as a C++ program and tested against systems ranging in size to over 450 amino acids, including examples of intrinsically disordered proteins. BARASA is robust, accommodates incomplete and incorrect information, outperforms current algorithms, especially in cases of sparse data, and is sufficiently fast to allow for real-time evaluation during data acquisition.

Article
https://doi.org/10.1038/s41467-023-37219-z
Nature Communications (2023) 14:1556

Received: 13 April 2022
Accepted: 6 March 2023

Anthony C. Bishop 1, Glorisé Torres-Montalvo 1, Sravya Kotaru 2, Kyle Mimun 1 & A. Joshua Wand 1,2,3,4

1 Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA.
2 Graduate Group in Biochemistry & Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19014, USA.
3 Department of Chemistry, Texas A&M University, College Station, TX 77843, USA.
4 Department of Molecular & Cellular Medicine, Texas A&M University, College Station, TX 77843, USA.

Nuclear magnetic resonance (NMR) spectroscopy is unique in its ability to provide simultaneous and comprehensive structural and dynamical atomic-scale information about macromolecules such as proteins in solution [1–4]. Unfortunately, an observed resonance frequency in an NMR spectrum cannot yet be directly assigned to the individual atom(s) within the protein from which it arises without the time-intensive collection and analysis of additional spectra. Comprehensive mapping of the individual resonances comprising NMR spectra to specific atoms within a protein molecule is a general prerequisite for the successful analysis of the structure and dynamics of proteins by NMR spectroscopy.

Early applications of multi-dimensional homonuclear ¹H NMR data to the so-called resonance assignment problem relied heavily on human intervention. The first comprehensive approach was the sequential assignment method, which centered on identification of J-coupled spin systems [5] that are then assembled through connections provided by short distances revealed by nuclear Overhauser effect (NOE) interactions between sequential residues, using the identity of side chains to error-check against the primary structure [6,7]. The subsequent main chain directed (MCD) assignment strategy [8,9] formalized self-correcting cyclic patterns of backbone ¹H-¹H NOE interactions and provided a more robust algorithmic framework that somewhat relieved the complexity of identifying side chain resonances [10,11]. While the MCD approach did lead to the first fully automated assignment of ¹H resonances to backbone hydrogens [11], automation of ¹H-based resonance assignments was generally frustrated by the overwhelming spectral degeneracy of multidimensional ¹H spectra of proteins and the interference of technical attributes such as a prominent diagonal.

The introduction of heteronuclear triple resonance spectroscopy [12–17] completely changed the landscape of the resonance assignment task by providing much greater resolution, generally higher quality data, and, most importantly, definitive rules with very precise meanings for making connectivities (correlations) between backbone resonances. Triple resonance assignments of the protein backbone permit access, either directly or by tethering to side chain resonance assignments, to a wide range of dynamic phenomena [17,18] and structural information [19–21]. Automated triple resonance algorithms have led to effectively complete backbone resonance assignments of smaller proteins with little human intervention and greatly aided the assignment of larger systems [22–24]. Yet, even with the advent of transverse relaxation optimized spectroscopy (TROSY) [25], the comprehensive assignment of systems larger than 30 kDa remains remarkably rare. The limitations are quite analogous to those summarized for earlier assignment strategies based exclusively on ¹H-¹H scalar and NOE interactions: increasing ambiguity in connectivities due to degeneracy, loss of resonances due to relaxation or artifact, and other confounding spectral attributes are simply not sufficiently accommodated by current automated assignment strategies.

Here, we strive to overcome the issue of data sparseness and ambiguity by appealing to Bayesian statistics to utilize available information more effectively via the calculation of explicit probabilities. Importantly, this formalism also allows for a flexible and adaptable incorporation of chemical shift prediction and structural knowledge into the assignment process.
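To make the idea of "explicit probabilities" concrete, the following is a minimal Python sketch of Bayes' rule applied to a single observed chemical shift. It is an illustration only and is not taken from the BARASA C++ program: the Gaussian likelihood, the uniform prior, the residue labels, the shift values, and the function names `gaussian_likelihood` and `posterior_over_residues` are all assumptions chosen for the example.

```python
import math


def gaussian_likelihood(observed, predicted, sigma):
    """Density of an observed shift given a predicted shift (assumed Gaussian error)."""
    z = (observed - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_over_residues(observed, predictions, sigma, priors=None):
    """Bayes' rule: P(residue | shift) is proportional to P(shift | residue) * P(residue)."""
    if priors is None:
        # Assumed uniform prior over the candidate residues.
        priors = {name: 1.0 / len(predictions) for name in predictions}
    unnormalized = {
        name: gaussian_likelihood(observed, predicted, sigma) * priors[name]
        for name, predicted in predictions.items()
    }
    evidence = sum(unnormalized.values())  # normalizing constant
    return {name: value / evidence for name, value in unnormalized.items()}


# Hypothetical predicted 13C-alpha shifts (ppm) for three candidate residues.
predicted_ca = {"A12": 52.5, "G13": 45.1, "V14": 62.3}

# Hypothetical observed shift (ppm) and prediction uncertainty (ppm).
posterior = posterior_over_residues(observed=52.9, predictions=predicted_ca, sigma=1.0)
best = max(posterior, key=posterior.get)
```

In this toy case the observed shift lies close to only one prediction, so nearly all of the posterior probability falls on that residue. With degenerate predictions the probability would instead be shared among several candidates, which is the ambiguity that additional information, such as inter-spin connectivities, has to resolve.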
By implementing the Bayesian analysis within a simulated annealing engine, we develop a robust and efficient search for optimal solutions. Protein assignment algorithms utilizing simulated annealing have been developed in the past [26]. However, the stochastic algorithm described here takes advantage of readily available pre-existing structural models, both experimentally determined and predicted, and in doing so more effectively exploits the rich information contained within structure-based predicted chemical shifts. We demonstrate how these invaluable restraints greatly aid the resonance assignment process, especially in cases where data may be otherwise sparse or even incorrect. We also compare the overall performance of BARASA against three highly cited assignment algorithms on a variety of experimental datasets.

Results and discussion

Bayesian assisted resonance assignments by simulated annealing (BARASA)

We designed an algorithm, termed BARASA, which utilizes a simulated annealing approach [27] to efficiently search the immense solution space for the optimal set of resonance assignments starting with a set of raw crosspeaks derived from triple resonance type spectra. The objective is to find the correct mapping of individual resonances to specific atoms within the protein molecule. The algorithm first assembles an initial set of spin systems based on an analysis of crosspeak lists and the connectivity rules (...truncated)
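The general shape of a simulated annealing search over assignments can be sketched in a few lines of Python. This is a toy under stated assumptions, not the BARASA implementation (which the abstract describes as a C++ program): the pseudo-energy form (a Gaussian negative log-likelihood for shifts plus a fixed reward per matched sequential connectivity), the swap move, the geometric cooling schedule, and every numerical value and name below are hypothetical.

```python
import math
import random
from itertools import permutations

# Hypothetical predicted CA shifts (ppm) for five sequential residues.
PREDICTED_CA = [52.5, 45.1, 62.3, 56.0, 58.4]

# Hypothetical spin systems: (own CA shift, preceding-residue CA shift).
# None marks a missing crosspeak.
SPIN_SYSTEMS = [
    (62.1, 45.3),
    (52.7, None),
    (58.2, 56.1),
    (45.2, 52.6),
    (56.2, 62.0),
]

SIGMA = 1.0      # assumed uncertainty of the shift prediction, ppm
TOLERANCE = 0.3  # assumed matching tolerance for connectivities, ppm


def pseudo_energy(assignment):
    """Lower is better. assignment[i] is the spin system placed at residue i."""
    energy = 0.0
    for residue, system in enumerate(assignment):
        own_ca, prev_ca = SPIN_SYSTEMS[system]
        # Shift term: negative log of a Gaussian likelihood (constants dropped).
        energy += 0.5 * ((own_ca - PREDICTED_CA[residue]) / SIGMA) ** 2
        # Connectivity term: reward agreement with the spin system at residue - 1.
        if residue > 0 and prev_ca is not None:
            neighbor_ca = SPIN_SYSTEMS[assignment[residue - 1]][0]
            if abs(prev_ca - neighbor_ca) <= TOLERANCE:
                energy -= 1.0
    return energy


def anneal(n_steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis search with swap moves and geometric cooling."""
    rng = random.Random(seed)
    current = list(range(len(PREDICTED_CA)))
    rng.shuffle(current)
    current_e = pseudo_energy(current)
    initial_e = current_e
    best, best_e = current[:], current_e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    temperature = t_start
    for _ in range(n_steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        trial_e = pseudo_energy(current)
        delta = trial_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = trial_e  # accept the move
            if current_e < best_e:
                best, best_e = current[:], current_e
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
        temperature *= cooling
    return best, best_e, initial_e


def exhaustive_minimum():
    """Reference value by brute force; feasible only for a toy problem this small."""
    return min(pseudo_energy(list(p)) for p in permutations(range(len(PREDICTED_CA))))


best_assignment, best_energy, initial_energy = anneal()
```

Accepting some uphill moves at high temperature is what lets the search escape locally consistent but globally wrong assignments. The brute-force check is included only because five residues allow it; for a real protein the number of permutations is far too large to enumerate, which is the motivation for a stochastic search.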


This is a preview of a remote PDF: https://www.nature.com/articles/s41467-023-37219-z.pdf
Article home page: https://www.nature.com/articles/s41467-023-37219-z

Bishop, Anthony C., Torres-Montalvo, Glorisé, Kotaru, Sravya, Mimun, Kyle, Wand, A. Joshua. Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing, Nature Communications, DOI: 10.1038/s41467-023-37219-z