“De-novo” amino acid sequence elucidation of protein G′e by combined “Top-Down” and “Bottom-Up” mass spectrometry



J. Am. Soc. Mass Spectrom.

Yelena Yefremova (1), Mahmoud Al-Majdoub (1), Kwabena F. M. Opuni (1), Cornelia Koy (1), Weidong Cui (0), Yuetian Yan (0), Michael L. Gross (0), Michael O. Glocker (1)

(0) Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA
(1) Proteome Center Rostock, University Rostock Medical Center, Rostock, Germany

Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G′ of great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found: 46 additional amino acids at the N-terminus, including a so-called “His-tag,” as well as partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G′ comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Because of the higher mass caused by its extended amino acid sequence compared with the original protein G′ (185 amino acids), we named this protein “protein G′e.” By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the partial N-terminal α-N-gluconoylations, was confirmed with 100% sequence coverage. Once the protein G′e sequence had been determined, we were able to identify the expression vector pET-28b from Novagen, with the Xho I restriction enzyme cleavage site, as the most likely vector used for cloning and expressing the recombinant protein G′e in E. coli. A dissociation constant (Kd) value of 9.4 nM for protein G′e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.
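The mass relationships in the abstract can be made concrete with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes that the reported 25,998.9 Da is an average mass, that α-N-gluconoylation adds a net C6H10O6, and that α-N-phosphogluconoylation adds a net C6H11O9P. The names `ATOMIC_WEIGHT`, `MODIFICATIONS`, `mass_shift`, and `expected_masses` are illustrative only.

```python
# Standard atomic weights (average masses in Da), rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

# Net elemental composition added by each N-terminal modification.
# ASSUMPTION: these compositions are not quoted from the article.
MODIFICATIONS = {
    "gluconoylation": {"C": 6, "H": 10, "O": 6},
    "phosphogluconoylation": {"C": 6, "H": 11, "O": 9, "P": 1},
}

# Mass of the unmodified protein G'e as stated in the abstract (Da).
# ASSUMPTION: treated here as an average mass.
UNMODIFIED_MASS = 25998.9


def mass_shift(composition):
    """Return the average mass (Da) of an elemental composition."""
    return sum(ATOMIC_WEIGHT[element] * count
               for element, count in composition.items())


def expected_masses(unmodified=UNMODIFIED_MASS):
    """Return the expected mass of each modified form of the protein."""
    return {name: unmodified + mass_shift(composition)
            for name, composition in MODIFICATIONS.items()}


if __name__ == "__main__":
    print(f"unmodified: {UNMODIFIED_MASS:.1f} Da")
    for name, mass in expected_masses().items():
        shift = mass - UNMODIFIED_MASS
        print(f"{name}: {mass:.1f} Da (shift +{shift:.1f} Da)")
```

Under these assumptions the two modified forms would appear roughly 178 Da and 258 Da above the unmodified protein, which is the kind of satellite signal an intact-mass measurement can reveal.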
Since DNA sequencing was introduced in the mid-seventies of the last century, it has gained great importance in suggesting amino acid sequences of proteins by simple translation of the gene sequence [1]. However, this DNA-based approach to protein sequence determination carries an inherent risk of amino acid sequence aberrations caused by mutations, by amino acid substitutions in (recombinant) proteins (e.g., by wobbling [2]), or by changes to the expression system [3, 4]. Moreover, unexpected post-translational modifications (PTMs) are not accessible by this approach. The continuously growing capabilities of mass spectrometry-based fragmentation techniques, such as collision-induced dissociation (CID) and electron capture dissociation (ECD), enormously facilitate direct sequence determination of even fairly large intact proteins by so-called “top-down” protein sequencing [5]. Consequently, this mass spectrometry-driven amino acid sequencing approach opens the opportunity to revise DNA-derived sequence information of many proteins [6, 7]. The importance of these MS-based sequencing avenues for scientific projects is emphasized by the fact that deviations from previously annotated amino acid sequences of several recombinant proteins have been reported [8–10]. Here we apply this mass spectrometry-driven amino acid sequencing approach to protein G′, a commercially available protein of great scientific and economic importance that is available from many companies around the world. Protein G was discovered as a cell-surface protein of different Streptococcus species in 1973 [11], and the first amino acid sequences were reported in the mid-eighties [12, 13]. Its astounding binding properties to mammalian immunoglobulin G (IgG) fostered extensive research on functional optimization up to the mid-nineties [14–19]. Depending on the streptococcal strain, protein G contains, in addition to three domains for IgG binding, two or three domains that bind to mammalian serum albumin [20, 21].
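The "simple translation of the gene sequence" mentioned above, and its sensitivity to single-base changes, can be sketched in a few lines. This is an illustration only: the DNA fragment is hypothetical (a short stretch encoding a hexahistidine run) and is not taken from the article or from any vector map, and the helper name `translate` is likewise an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # HYPOTHETICAL fragment, for illustration only.
    reference = "ATGGGCAGCAGCCATCATCATCATCATCAC"
    # Same fragment with one base changed in the fifth codon (CAT -> CGT).
    variant = "ATGGGCAGCAGCCGTCATCATCATCATCAC"
    print(translate(reference))
    print(translate(variant))
```

A single substituted base changes one residue of the predicted protein, yet translation alone says nothing about whether the expressed product actually carries that residue or any modification, which is the gap that direct mass spectrometric sequencing addresses.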
Initial difficulties in purifying protein G directly from the streptococcal cell wall were overcome after its encoding gene was successfully overexpressed in E. coli [12, 22]. Later, truncated genes (e.g., from the Streptococcus strain G 148) that encoded just the three IgG-binding domains were cloned and expressed in E. coli. The shorter protein was named protein G′ [23] to differentiate it from the full-length protein G. Owing to its extraordinarily high binding affinity to immunoglobulins, protein G′ is now widely used in many immunological and biotechnological techniques worldwide. Coupled to a chromatography resin, protein G′ has become an indispensable workhorse for affinity purification of antibodies and of Ig-tagged recombinant proteins [24]. Numerous versatile applications of protein G′ have been reported (reviewed in [25, 26]), of which only a few shall be mentioned: isolation of IgG fractions from patient samples; immuno-precipitation [27, 28]; depletion of IgG from biological samples [29, 30]; Western blot analysis [31]; affinity membrane chromatography [32]; peptide immunoaffinity enrichment using protein G′-coated magnetic beads [33]; development of protein G′-coupled receptors [34]; and generation of immunosensors [35]. For studying the principles of function and the dynamics of protein G′ binding to IgG, knowledge of its structure is a prerequisite. Hence, the first step when conducting a study on protein-protein interactions is to collect the amino acid sequences of both interaction partners. For protein G′ this requirement sounds trivial, as recombinant protein G′-containing products can be found in the catalogs of almost every supplier in the biotechnological field, including Sigma-Aldrich, Merck-Millipore, Thermo-Scientific, LifeTechnologies, and Biocat, to name just a few.
According to the product information provided by the suppliers, the commercial protein G′ carries three IgG-binding domains, for which a molecular mass of ca. 20 kDa is calculated. Yet, on sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), protein G′ shows an apparent molecular weight of ca. 35 kDa [12]. Strikingly, despite the huge sales market for protein G′, information about the amino acid sequence of the commercial products is scarce. Vendors of recombinant protein G′ are rarely able to provide the amino acid sequence of their product. Upon request, customers are referred to the literature from the 1980s and 1990s. Although the amino acid sequence given in the respective reports agrees with the molecular mass of 20 kDa for protein G′ [23], the mass of the commercial product does not. Applying mass spectrometric analysis to the product in our hands, we found a mass increase of 6 kDa for which no explanation could be found. Information about the existence of a His-tag and sometimes of biotinylation did not explain the mass difference. Unfortunately, the aberrant SDS-PAGE migration behavior of protein G′ prevents easy discovery of any (...truncated)