“De-novo” amino acid sequence elucidation of protein G′e by combined “Top-Down” and “Bottom-Up” mass spectrometry
J. Am. Soc. Mass Spectrom.
Yelena Yefremova 1
Mahmoud Al-Majdoub 1
Kwabena F. M. Opuni 1
Cornelia Koy 1
Weidong Cui 0
Yuetian Yan 0
Michael L. Gross 0
Michael O. Glocker 1
0 Department of Chemistry , Washington University in St. Louis, St. Louis, MO , USA
1 Proteome Center Rostock, University Rostock Medical Center , Rostock , Germany
Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G′ of great scientific and economic importance. Substantial deviations from the published amino acid sequence (UniProt Q54181) were found, namely the presence of 46 additional amino acids at the N-terminus, including a so-called “His-tag,” as well as a partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G′ comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Because of the higher mass caused by its extended amino acid sequence compared with the original protein G′ (185 amino acids), we named this protein “protein G′e.” By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the partial N-terminal α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G′e sequence was determined, we were able to identify the expression vector pET-28b from Novagen, with the Xho I restriction enzyme cleavage site, as the most plausible vector used for cloning and expressing the recombinant protein G′e in E. coli. A dissociation constant (Kd) of 9.4 nM for protein G′e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.
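The mass bookkeeping implied by the abstract can be illustrated with a minimal sketch. It assumes that the reported 25,998.9 Da is an average mass, that α-N-gluconoylation corresponds to a net addition of C6H10O6, and that α-N-phosphogluconoylation corresponds to a net addition of C6H11O9P. The function and variable names are hypothetical, and the atomic masses are rounded standard values, so the output is only an approximate expectation and not a result reported by the authors.

```python
# Rounded average atomic masses in Da (assumption: average, not monoisotopic).
AVG_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}

# Net elemental additions assumed for the two N-terminal modifications.
GLUCONOYL = {"C": 6, "H": 10, "O": 6}
PHOSPHOGLUCONOYL = {"C": 6, "H": 11, "O": 9, "P": 1}

# Mass of the unmodified protein as stated in the abstract (Da).
UNMODIFIED_MASS = 25998.9


def formula_mass(composition):
    """Sum the average atomic masses for an elemental composition."""
    return sum(AVG_ATOMIC_MASS[el] * n for el, n in composition.items())


def expected_masses(base=UNMODIFIED_MASS):
    """Return approximate expected masses of the three protein forms."""
    return {
        "unmodified": base,
        "gluconoylated": base + formula_mass(GLUCONOYL),
        "phosphogluconoylated": base + formula_mass(PHOSPHOGLUCONOYL),
    }


if __name__ == "__main__":
    for form, mass in expected_masses().items():
        print(f"{form:>22s}: {mass:10.1f} Da")
```

Under these assumptions, the two modified forms would be expected roughly 178 Da and 258 Da above the unmodified protein, which is the kind of satellite signal one would look for in the intact-mass spectrum.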
Since DNA sequencing was introduced in the mid-seventies
of the last century, it has gained great importance in suggesting
amino acid sequences of proteins by simple translation of the
gene sequence [1]. However, significant possibilities of amino
acid sequence aberrations due to mutations, amino acid
substitutions in (recombinant) proteins (e.g., by wobbling, [2]), or by
altering the expression system, are inherent to this DNA-based
protein sequence determination approach [3, 4]. Moreover, unexpected
post-translational modifications (PTMs) are not accessible from the gene sequence alone.
The continuously growing capabilities of mass
spectrometry-based fragmentation techniques, such as collision-induced
dissociation (CID) and electron capture dissociation (ECD),
enormously facilitate direct sequence determination of even fairly large
intact proteins by so-called “top-down” protein sequencing [5].
Consequently, this mass spectrometry-driven amino acid
sequencing approach opens the opportunity to revise
DNA-derived sequence information of many proteins [6, 7]. The
importance of these MS-based sequencing avenues for
scientific projects has been emphasized by the fact that deviations in
previously annotated amino acid sequences of several
recombinant proteins have been reported [8–10]. Here we apply this
mass spectrometry-driven amino acid sequencing approach to
protein G′, a commercially available protein with great
scientific and economic importance that is available from many
companies around the world.
Protein G was discovered as a cell-surface protein of
different Streptococcus species in 1973 [11], and first amino acid
sequences were reported in the mid-eighties [12, 13]. Its
astounding binding properties to mammalian immunoglobulin G
(IgG) fostered extensive research on functional optimization up
to the mid-nineties [14–19]. Depending on the streptococcal
strain, protein G contains, in addition to three domains for IgG
binding, two or three domains that bind to mammalian serum
albumin [20, 21]. Initial difficulties in purification of protein G
directly from the streptococcal cell wall were overcome after
the DNA sequence of its encoding gene was successfully
overexpressed in E. coli [12, 22]. Later, truncated genes (e.g.,
from the Streptococcus strain G 148) that encoded only the
three IgG binding domains were cloned and expressed in
E. coli. The shorter protein was named protein G′ [23] to
differentiate it from the full-length protein G. Owing to its
extraordinarily high binding affinity to immunoglobulins,
protein G′ is now widely used in many immunologically and
biotechnologically applied techniques world-wide. When
coupled to a chromatography resin, protein G′ has become an
indispensable workhorse for affinity purification of antibodies
and of Ig-tagged recombinant proteins [24]. Numerous versatile
applications of protein G′ have been reported (reviewed in
[25, 26]), of which only a few shall be mentioned: isolation
of IgG fractions from patient samples; immuno-precipitation
[27, 28]; depletion of IgG from biological samples [29, 30];
Western blot analysis [31]; affinity membrane chromatography
[32]; peptide immunoaffinity enrichment using protein-G′
coated magnetic beads [33]; development of protein
G′-coupled receptors [34]; and generation of immunosensors [35].
For studying the principles of function and the dynamics of
protein G′-binding to IgG, knowledge of its structure is a
prerequisite. Hence, the first step when
conducting a study on protein–protein interactions is to
collect the amino acid sequences of both interaction partners. For
protein G′ this requirement sounds trivial, as recombinant
protein G′-containing products can be found in catalogs of
almost every supplier in the biotechnological field, including
Sigma-Aldrich, Merck-Millipore, Thermo-Scientific,
Life Technologies, and Biocat, to name just a few. According to
the product information provided by the suppliers, the
commercial protein G′ carries three IgG binding domains, which
calculate to a molecular mass of ca. 20 kDa. Yet, on sodium
dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE), protein G′ shows an apparent molecular weight of
ca. 35 kDa [12]. Strikingly, despite the huge sales market for
protein G′, information about the amino acid sequence of the
commercial products is poor. Vendors of recombinant protein
G′ are rarely able to provide the amino acid sequence of their
product. Upon request, customers are referred to the literature
from the 1980s and 1990s. Although the amino-acid sequence
that is given in the respective reports is in agreement with
the molecular mass of 20 kDa for protein G′ [23], the mass of
the commercial product does not. Applying mass spectrometric
analysis to the product in our hands, we found a mass increase
of 6 kDa for which no explanation was retrievable. Information
about the existence of a His-tag and sometimes of biotinylation
did not explain the mass difference. Unfortunately, the aberrant
SDS-PAGE migration behavior of protein G′ prevents easy
discovery of any (...truncated)