Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing
Article
https://doi.org/10.1038/s41467-023-37219-z
Received: 13 April 2022
Accepted: 6 March 2023
Anthony C. Bishop 1, Glorisé Torres-Montalvo 1, Sravya Kotaru 2, Kyle Mimun 1 & A. Joshua Wand 1,2,3,4
Assignment of resonances of nuclear magnetic resonance (NMR) spectra to
specific atoms within a protein remains a labor-intensive and challenging task.
Automation of the assignment process often remains a bottleneck in the
exploitation of solution NMR spectroscopy for the study of protein structure-dynamics-function relationships. We present an approach to the assignment of
backbone triple resonance spectra of proteins. A Bayesian statistical analysis of
predicted and observed chemical shifts is used in conjunction with inter-spin
connectivities provided by triple resonance spectroscopy to calculate a
pseudo-energy potential that drives a simulated annealing search for the
optimal set of resonance assignments. The method, termed Bayesian Assisted Assignments
by Simulated Annealing (BARASA), is implemented in C++ and tested
against systems ranging in size to over 450 amino acids, including examples of
intrinsically disordered proteins. BARASA is robust, accommodates
incomplete and incorrect information, outperforms current algorithms,
especially in cases of sparse data, and is sufficiently fast to allow for real-time
evaluation during data acquisition.
Nuclear magnetic resonance (NMR) spectroscopy is unique in its
ability to provide simultaneous and comprehensive structural and
dynamical atomic-scale information about macromolecules such as
proteins in solution1–4. Unfortunately, an observed resonance
frequency in an NMR spectrum cannot yet be directly assigned to the
individual atom(s) within the protein from which it arises without the
time-intensive collection and analysis of additional spectra. Comprehensive mapping of the individual resonances comprising NMR spectra to specific atoms within a protein
molecule is a general prerequisite for the successful analysis of the
structure and dynamics of proteins by NMR spectroscopy. Early
applications of multi-dimensional homonuclear 1H NMR data to the so-called resonance assignment problem relied heavily on human intervention. The first comprehensive approach was the sequential
assignment method, which centered on the identification of J-coupled spin
systems5 that are then assembled through connections provided by
short distances revealed by nuclear Overhauser effect (NOE)
interactions between sequential residues, using the identity of side
chains to error-check against the primary structure6,7. The subsequent
main chain directed (MCD) assignment strategy8,9 formalized self-correcting cyclic patterns of backbone 1H-1H NOE interactions and
provided a more robust algorithmic framework that somewhat relieved the complexity of identifying side chain resonances10,11. While the
MCD approach did lead to the first fully automated assignment of 1H
resonances to backbone hydrogens11, automation of 1H-based resonance assignments was generally frustrated by the overwhelming
spectral degeneracy of multidimensional 1H spectra of proteins and
the interference of technical attributes such as a prominent diagonal.
The introduction of heteronuclear triple resonance spectroscopy12–17
completely changed the landscape of the resonance assignment task
by providing much greater resolution, generally higher quality data,
and, most importantly, definitive rules with very precise meanings for
making connectivities (correlations) between backbone resonances.
Triple resonance assignments of the protein backbone permit access,
either directly or by tethering to side chain resonance assignments, to
a wide range of dynamic phenomena17,18 and structural information19–21.
1Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA. 2Graduate Group in Biochemistry & Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19014, USA. 3Department of Chemistry, Texas A&M University, College Station, TX 77843, USA. 4Department of Molecular & Cellular Medicine, Texas A&M University, College Station, TX 77843, USA.
Nature Communications | (2023)14:1556
Automated triple resonance algorithms have led to effectively
complete backbone resonance assignments of smaller proteins with
little human intervention and greatly aided the assignment of larger
systems22–24. Yet, even with the advent of transverse relaxation optimized spectroscopy (TROSY)25, the comprehensive assignment of
systems larger than 30 kDa remains remarkably rare. The limitations
are quite analogous to those summarized for earlier assignment strategies based exclusively on 1H-1H scalar and NOE interactions:
increasing ambiguity in connectivities due to degeneracy, loss of
resonances due to relaxation or artifact, and other confounding
spectral attributes are simply not sufficiently accommodated by current automated assignment strategies.
Here, we strive to overcome the issue of data sparseness and
ambiguity by appealing to Bayesian statistics to utilize available
information more effectively via the calculation of explicit probabilities. Importantly, this formalism also allows for a flexible and
adaptable incorporation of chemical shift prediction and structural
knowledge into the assignment process. By implementing the Bayesian
analysis within a simulated annealing engine, we develop a robust and
efficient search for optimal solutions. Protein assignment algorithms
utilizing simulated annealing have been developed in the past26.
However, the stochastic algorithm described here takes advantage of
readily available pre-existing structural models, both experimentally determined and predicted, and in doing so more effectively exploits
the rich information contained within structure-based predicted chemical shifts. We demonstrate how these invaluable restraints greatly
aid the resonance assignment process, especially in cases where data
may be otherwise sparse or even incorrect. We also compare the
overall performance of BARASA against three highly cited assignment
algorithms on a variety of experimental datasets.
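The combination outlined above, a probability-based score driving a simulated annealing search, can be illustrated with a deliberately small sketch. It is not the BARASA pseudo-energy or its C++ implementation. It assumes a toy score in which each spin system carries only a CA shift and the CA shift of the preceding residue, the agreement with predicted shifts is scored as the negative log of a Gaussian likelihood, and a broken sequential connectivity adds a fixed penalty. All names, shift values, and parameters (`sigma`, `tol`, `link_penalty`, the cooling schedule) are hypothetical choices made for illustration.

```python
import math
import random


def pseudo_energy(assign, spin_systems, predicted,
                  sigma=1.0, tol=0.2, link_penalty=5.0):
    """Toy pseudo-energy; assign[k] is the spin system placed at residue k."""
    energy = 0.0
    for k, s in enumerate(assign):
        ca, ca_prev = spin_systems[s]
        # Negative log of a Gaussian likelihood (constant term dropped).
        z = (ca - predicted[k]) / sigma
        energy += 0.5 * z * z
        # Connectivity: the CA(i-1) shift of this spin system should match
        # the CA shift of the spin system placed at the preceding residue.
        # A missing CA(i-1) shift (None) contributes no penalty.
        if k > 0 and ca_prev is not None:
            ca_of_prev = spin_systems[assign[k - 1]][0]
            if abs(ca_prev - ca_of_prev) > tol:
                energy += link_penalty
    return energy


def anneal(spin_systems, predicted, steps=20000,
           t_start=5.0, t_end=0.01, seed=0):
    """Metropolis search over permutations with geometric cooling."""
    rng = random.Random(seed)
    n = len(predicted)
    assign = list(range(n))
    rng.shuffle(assign)
    energy = pseudo_energy(assign, spin_systems, predicted)
    best, best_energy = assign[:], energy
    cooling = (t_end / t_start) ** (1.0 / steps)
    temp = t_start
    for _ in range(steps):
        # Trial move: swap the spin systems placed at two residues.
        i, j = rng.sample(range(n), 2)
        assign[i], assign[j] = assign[j], assign[i]
        trial = pseudo_energy(assign, spin_systems, predicted)
        delta = trial - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = trial
            if energy < best_energy:
                best, best_energy = assign[:], energy
        else:
            assign[i], assign[j] = assign[j], assign[i]  # reject: undo swap
        temp *= cooling
    return best, best_energy


# Hypothetical predicted CA shifts (ppm) for a six-residue stretch.
predicted = [58.2, 54.1, 62.7, 45.3, 56.0, 60.4]
# Hypothetical spin systems as (CA, CA of preceding residue), in arbitrary order.
spin_systems = [(62.4, 54.5), (58.0, None), (60.9, 55.8),
                (45.6, 62.4), (54.5, 58.0), (55.8, 45.6)]
# Ordering used to construct the toy data: residue k -> spin system truth[k].
truth = [1, 4, 0, 3, 5, 2]

best, best_energy = anneal(spin_systems, predicted)
print("best assignment:", best, "energy:", round(best_energy, 3))
print("constructed ordering energy:",
      round(pseudo_energy(truth, spin_systems, predicted), 3))
```

In this toy input the ordering used to construct the data is the global minimum of the toy energy. Whether a given run reaches it depends on the cooling schedule and the random seed, which is one reason a stochastic search of this kind is typically repeated or run with a slow schedule.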
Results and discussion
Bayesian assisted resonance assignments by simulated annealing (BARASA)
We designed an algorithm, termed BARASA, which utilizes a simulated
annealing approach27 to efficiently search the immense solution space
for the optimal set of resonance assignments starting with a set of raw
crosspeaks derived from triple resonance type spectra. The objective
is to find the correct mapping of individual resonances to specific
atoms within the protein molecule. The algorithm first assembles an
initial set of spin systems based on an analysis of crosspeak lists and
the connectivity rules (...truncated)