Using sequence data to predict the self-assembly of supramolecular collagen structures.

Biophysical Journal, Aug 2022

Collagen fibrils are the major constituents of the extracellular matrix, which provides structural support to vertebrate connective tissues. It is widely assumed that the superstructure of collagen fibrils is encoded in the primary sequences of the molecular ...

Anna M. Puszkarska,1 Daan Frenkel,1 Lucy J. Colwell,1,2 and Melinda J. Duer1,*
1 Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
2 Google Research, Mountain View, California

ABSTRACT Collagen fibrils are the major constituents of the extracellular matrix, which provides structural support to vertebrate connective tissues. It is widely assumed that the superstructure of collagen fibrils is encoded in the primary sequences of the molecular building blocks. However, the interplay between large-scale architecture and small-scale molecular interactions makes the ab initio prediction of collagen structure challenging. Here, we propose a model that allows us to predict the periodic structure of collagen fibers and the axial offset between the molecules, purely on the basis of simple predictive rules for the interaction between amino acid residues. With our model, we identify the sequence-dependent collagen fiber geometries with the lowest free energy and validate the predicted geometries against the available experimental data. We propose a procedure for searching for optimal staggering distances. Finally, we build a classification algorithm and use it to scan 11 data sets of vertebrate fibrillar collagens and to predict the periodicity of the resulting assemblies. We analyzed the experimentally observed variance of the optimal stagger distances across species and found that these distances, and the resulting fibrillar phenotypes, are evolutionarily well preserved. Moreover, we observed that the energy minimum at the optimal stagger distance is broad in all cases, suggesting a further evolutionary adaptation designed to improve the assembly kinetics. Our periodicity predictions are in good agreement with the experimental data on molecular staggering, not only for all collagen types analyzed but also for synthetic peptides.
We argue that, with our model, it becomes possible to design tailor-made, periodic collagen structures, thereby enabling the design of novel biomimetic materials based on collagen-mimetic trimers.

Submitted March 9, 2022, and accepted for publication July 12, 2022.
*Correspondence:

SIGNIFICANCE The pathway for protein self-assembly is determined by the free energy landscape coded in the noncovalent interactions between the building blocks. We use this basic principle to develop a model that describes the mechanisms involved in the staggering of collagen molecules in fibrillar assemblies. In this work we present a simple, parameter-free model for collagen fibril design that allows us to predict the structure of self-assembling collagen fibers on the basis of the amino acid sequence of the constituent α-chain subunits. We develop a classification algorithm and use it to scan through large data sets of collagen molecules to predict the periodicity of the resulting assemblies. We argue that the interaction model presented in this work provides a foundation for engineering of novel collagen molecules with specific material properties for targeted applications.

INTRODUCTION The material properties of connective tissues, such as tendon, skin, bone, and cartilage, are largely controlled by fibrillar assemblies of collagen proteins. Collagen molecules are long (≈ 300 nm), rope-like structures, formed from three monomeric α-chains twisted together into a triple helix (1). In vertebrates, there are at least 10 distinct collagen molecules, each comprising 3 monomers, drawn from 12 different α-chains, encoded by 11 genes. The primary structure of the individual α-chains determines the geometrical and biophysical parameters of the collagen helix, which in turn govern the organization of molecules within the fibril, thereby establishing interactions necessary for quaternary structures to form. Collagen fibrils are composed of hundreds of aligned helices.
The major collagens, types I, II, and III, form wide, long, unbranched fibrils, which are the dominant components of structural tissue, typically in conjunction with smaller quantities of the minor collagens, types V and XI, which are thought to act as fibril nucleators (1). TEM studies of these fibrils show periodic dark-light bands along their length with periodicity D ≈ 67 nm, attributed to the constituent molecules being longitudinally staggered relative to their neighbors by integer multiples of D (2–5). Such fibrils are found in tendons, cornea, skin, and cartilage (6–8).

Editor: Markus Buehler. https://doi.org/10.1016/j.bpj.2022.07.019 © 2022 Biophysical Society. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Biophysical Journal 121, 3023–3033, August 16, 2022.

However, not all collagen molecular species assemble into these classical periodic fibrils. Regulatory or developmental collagen proteins do not form wide, striated fibrils under physiological conditions. These polymers are incorporated into the structurally defined suprastructure as a result of heterotypic interactions (collagen type V and XI) (9). In addition, some collagens form thin, nonbanded assemblies (type XXIV and XXVII) (10–13).

To unravel the design principles of collagen assembly, we must find a mapping between the primary sequence of the collagen trimer and the phenotypic, structural features of the collagen fibril. Given the primary sequence of the α-chain subunits, is it possible to predict the value of the axial offset between assembled polymers? Previous work has provided evidence for a link between sequence and the supramolecular structure of collagen assemblies (14–17). In fact, interaction-based scoring systems for linear sequences have been proposed in (14,15,17).
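As a rough numerical illustration of the D-periodic stagger described above, the following Python sketch lists the axial offsets n·D at which two molecules still overlap, using the approximate values quoted in the text (molecule length ≈ 300 nm, D ≈ 67 nm). The function names and the treatment of a molecule as a rigid rod of fixed length are assumptions made for illustration only; they are not part of the authors' model.

```python
# Approximate values quoted in the text; both are rounded figures.
D_NM = 67.0          # axial periodicity D
MOLECULE_NM = 300.0  # length of one collagen molecule


def allowed_offsets(length_nm: float = MOLECULE_NM, d_nm: float = D_NM) -> list:
    """Return the offsets n*D (n >= 1) that are shorter than the molecule.

    Hypothetical helper: the molecule is idealized as a rigid rod, so two
    neighbors overlap only while the offset is below the rod length.
    """
    offsets = []
    n = 1
    while n * d_nm < length_nm:
        offsets.append(n * d_nm)
        n += 1
    return offsets


def overlap_nm(offset_nm: float, length_nm: float = MOLECULE_NM) -> float:
    """Length over which two equal rods overlap at a given axial offset."""
    return max(0.0, length_nm - offset_nm)


if __name__ == "__main__":
    for offset in allowed_offsets():
        print(f"offset {offset:6.1f} nm -> overlap {overlap_nm(offset):6.1f} nm")
```

With these rounded numbers a molecule spans roughly 4.5 D, so four staggered offsets (1D to 4D) keep a neighbor in contact with a reference molecule, and the last of them leaves only a short overlap.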
In what follows, we use a more physically detailed model to arrive at a simple theoretical tool to predict the observed molecular geometry. Given the size of each collagen monomer of around 3000 amino acid residues, and the lack of detailed structural data, a fully atomistic (free) energy optimization procedure to model collagenous assemblies would be prohibitively expensive. Consequently, we take a coarse-grained approach to estimate the free energy of assembly. We make use of well-established empirical estimates of the strength of residue-residue interactions, based on so-called statistical contact potentials (CPs). We integrate these CPs in a simplified representation of collagen molecular structure. The resulting model allows us to estimate the relative stability of various collagen arrangements. We analyzed the primary structures of collagen proteins that can be classified into various functional types, across several vertebrate organisms. We used primary sequence data for collagen types for which experimental data regarding the phenotype of higher-order structure are available (Table 1), to establish a procedure for periodicity p (...truncated)
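The idea of scoring arrangements with residue-residue contact energies and looking for the most stable axial offset can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions, not the authors' method: each molecule is reduced to a single linear sequence, residues interact only with the residue directly opposite them, and the `TOY_CP` table contains invented energies in arbitrary units rather than a published statistical contact potential. The sequence used in the usage example is likewise hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Invented toy energies (arbitrary units), NOT a published contact potential.
# Keys are unordered residue pairs; unlisted pairs contribute zero.
TOY_CP: Dict[FrozenSet[str], float] = {
    frozenset("KE"): -1.0,  # opposite charges attract
    frozenset("K"): 0.5,    # K-K: like charges repel
    frozenset("E"): 0.5,    # E-E: like charges repel
    frozenset("L"): -0.5,   # L-L: hydrophobic contact
}


def pair_energy(a: str, b: str) -> float:
    """Toy contact energy of residues a and b (symmetric in a and b)."""
    return TOY_CP.get(frozenset((a, b)), 0.0)


def stagger_energy(seq_a: str, seq_b: str, offset: int) -> float:
    """Sum pair energies when seq_b is shifted by `offset` residues along seq_a.

    Residue i of seq_b faces residue i + offset of seq_a; residues without a
    partner are ignored.
    """
    return sum(pair_energy(a, b) for a, b in zip(seq_a[offset:], seq_b))


def scan_staggers(seq_a: str, seq_b: str, min_overlap: int = 1) -> List[Tuple[int, float]]:
    """Energy for every offset that leaves at least `min_overlap` facing residues."""
    max_offset = len(seq_a) - min_overlap
    return [(off, stagger_energy(seq_a, seq_b, off)) for off in range(max_offset + 1)]


def best_stagger(seq_a: str, seq_b: str, min_overlap: int = 1) -> Tuple[int, float]:
    """Offset with the lowest total toy energy, with its energy."""
    return min(scan_staggers(seq_a, seq_b, min_overlap), key=lambda item: item[1])


if __name__ == "__main__":
    seq = "KGGEGGKGGEGG"  # hypothetical sequence with alternating charges
    for offset, energy in scan_staggers(seq, seq):
        print(offset, energy)
    print("best:", best_stagger(seq, seq))
```

In this toy case the unshifted arrangement is penalized because like charges face each other, while a shift of three residues brings every lysine opposite a glutamate. A realistic calculation would need the full triple-helical geometry and an actual contact potential, and would have to decide how to compare offsets with different overlap lengths, which this sketch does not attempt.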


This is a preview of a remote PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463645/pdf/
Article home page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463645

A. Puszkarska, D. Frenkel, L. Colwell, M. Duer. Using sequence data to predict the self-assembly of supramolecular collagen structures. Biophysical Journal, 2022, Volume 121, Issue 16, pp. 3023–3033, DOI: 10.1016/j.bpj.2022.07.019