Structural Disorder in Eukaryotes
Citation: Pancsa R, Tompa P (
Rita Pancsa
Peter Tompa
Editor: Laszlo Buday, Hungarian Academy of Sciences, Hungary
1 VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium; 2 Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
Based on early bioinformatic studies of a handful of species, the frequency of structural disorder in proteins is generally thought to be much higher in eukaryotes than in prokaryotes. To refine this view, we present here a comparative prediction study and analysis of structural disorder in 194 fully described eukaryotic proteomes and 87 reference prokaryotes. We found that structural disorder does distinguish eukaryotes from prokaryotes, but its frequency spans a very wide range in both superkingdoms, and the two ranges largely overlap. The number of disordered binding regions and the number of different Pfam domain types also help distinguish eukaryotes from prokaryotes. Unexpectedly, the highest levels (and the highest variability) of predicted disorder are found in protists, i.e. single-celled eukaryotes, which often surpass more complex eukaryotic organisms such as plants and animals. This trend contrasts with that of the number of domain types, which increases rather monotonically toward more complex organisms. The level of structural disorder appears to be strongly correlated with lifestyle: some obligate intracellular parasites and endosymbionts have the lowest levels, whereas host-changing parasites have the highest levels of predicted disorder. We conclude that protists have been the evolutionary hotbed of experimentation with structural disorder, in a period when structural disorder was actively invented and the major functional classes of disordered proteins were established.
Deciphering protein structures has been instrumental in
understanding the molecular principles of life. Yet, the most
exciting recent development in structural biology is the recognition
that many proteins (intrinsically disordered proteins, IDPs) or
regions of proteins (intrinsically disordered regions, IDRs) exist
and function without a well-defined structure [1,2,3]. The
existence and functioning of IDPs/IDRs demand a radical
extension of the structure-function paradigm to encompass their
non-conventional functional modes. The functional advantages of
structural disorder are manifested either directly, in functions
termed entropic chains, or in molecular recognition, in the form of
adaptable binding [4], uncoupling specificity from binding
strength [5] or increasing the speed of interactions [6,7], among
others. Due to these advantages, an elevated level of structural
disorder can be found in proteins involved in signaling and
regulation, and structural disorder is often associated with disease,
such as cancer and neurodegeneration [7].
The functional advantages and functional types of IDPs/IDRs
predispose them to roles in complex organisms, in broad
agreement with the observed phylogenetic distribution of
structural disorder [8,9,10]. Based on previous studies of the few
genomes available at the time (usually comparing predicted
disorder in 4-5 eukaryotes to bacteria and archaea), it has become
generally accepted that structural disorder is significantly higher in
eukaryotes than in prokaryotes, expressed by the notion that
structural disorder correlates with complexity. Besides these
comparative studies, the level of disorder was only addressed in
particular phylogenetic groups, such as bacteria [11,12], archaea
[13] or a few protists within eukaryotes [14,15]. More recent
studies presented large-scale analyses, without trying to derive
general conclusions [16]. The suggested correlation with
complexity was directly addressed for organisms with known complexity
measures (number of different cell types) [17]. It was found that
disorder has a tendency to increase in evolution, but its correlation
with complexity within eukaryotes is marginal.
Therefore, even these limited studies have raised certain caveats
to the above generalizations, and suggested exceptions to the
seemingly simple and general rule. For example, studies on the
distribution of predicted structural disorder in prokaryotes have
shown wide variations as a function of growth temperature, with
mesophiles and thermophiles covering a very broad range from
~1.5% to ~25%, whereas hyperthermophiles have much less
[11,12]. Archaea were also found to show a wide distribution of
disorder, with strong genomic variations depending on habitat
and lifestyle [13]. Turning to eukaryotes, Apicomplexan protists
(single-celled eukaryotes) have shown unexpectedly high levels of
predicted disorder, far exceeding that of apparently more
complex metazoan organisms [14]. Similar conclusions were
drawn in a study of a handful of early-branching protists [15],
which again showed a high level of predicted disorder surpassing
the average of eukaryotic proteins in SwissProt. It was suggested that
structural disorder may be associated with the parasitic lifestyle of
these organisms.
As is apparent from this short overview, structural disorder has
not been systematically and comparatively analyzed in eukaryotes.
One likely reason is the very rapid advance of sequencing
efforts, as a result of which about two-thirds of known eukaryotic
genomes became available only in the past five years or so.
Furthermore, the results of distinct studies are hard to compare,
because they rely on different disorder predictors usually based on
different principles and having significantly different rates of
confidence [18,19,20]. In addition, related but distinct
measures of structural disorder (frequency of disordered residues,
frequency of proteins with a long IDR, or frequency of mostly
disordered proteins) are often applied, which again impedes
comparisons and sound generalizations. Therefore, we decided to predict
and compare structural disorder in 194 available eukaryotic
proteomes (and 87 reference prokaryotes) with the IUPred algorithm
[21,22]. We extended and complemented these calculations with
predictions of the prevalence of Pfam domains and with comparisons of
disorder within and outside domains, because: i) disordered regions
often harbor binding motifs for domains [23], ii) disordered regions
often function by acting as linkers between flanking domains, and iii)
structural disorder may also be present in Pfam domains themselves
[24]. The novel data on the phylogenetic distribution of structural
disorder, Pfam domain types, and their varied correlation in
different types of species refine previous limited generalizations and
provide novel insight into the evolutionary and functional
implications of structural disorder.
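To make the three related disorder measures named above concrete, together with the comparison of disorder within and outside domains, the following is a minimal Python sketch. It assumes per-residue disorder scores between 0 and 1 of the kind IUPred produces, and it uses cut-offs chosen purely for illustration rather than taken from this study: a score above 0.5 marks a residue as disordered, a long IDR is a run of at least 30 consecutive disordered residues, and a mostly disordered protein has more than half of its residues disordered. The function names and input format (plain score lists, with domains given as (start, end) spans) are hypothetical. The sketch does not run IUPred or parse Pfam output.

```python
"""Sketch: proteome-level disorder measures from per-residue scores.

All cut-offs below are illustrative assumptions, not values from the text.
"""

# Assumed: a residue counts as disordered if its score exceeds 0.5.
DISORDER_CUTOFF = 0.5
# Assumed: a "long IDR" is at least 30 consecutive disordered residues.
LONG_IDR_MIN = 30
# Assumed: a protein is "mostly disordered" if over half its residues are.
MOSTLY_DISORDERED_FRACTION = 0.5


def disordered_mask(scores, cutoff=DISORDER_CUTOFF):
    """True for each residue whose predicted score exceeds the cutoff."""
    return [s > cutoff for s in scores]


def longest_run(mask):
    """Length of the longest stretch of consecutive disordered residues."""
    best = run = 0
    for is_disordered in mask:
        run = run + 1 if is_disordered else 0
        best = max(best, run)
    return best


def proteome_disorder_measures(proteome):
    """Return three related measures for a proteome.

    proteome: list of proteins, each a list of per-residue scores
    (hypothetical input format).
    Returns (fraction of disordered residues,
             fraction of proteins with a long IDR,
             fraction of mostly disordered proteins).
    """
    masks = [disordered_mask(scores) for scores in proteome]
    n_residues = sum(len(m) for m in masks)
    n_disordered = sum(sum(m) for m in masks)
    n_long_idr = sum(longest_run(m) >= LONG_IDR_MIN for m in masks)
    n_mostly = sum(sum(m) > MOSTLY_DISORDERED_FRACTION * len(m) for m in masks)
    n_proteins = len(masks)
    return (n_disordered / n_residues,
            n_long_idr / n_proteins,
            n_mostly / n_proteins)


def disorder_inside_outside(scores, domains):
    """Fraction of disordered residues inside vs. outside domain spans.

    domains: list of (start, end) spans, 0-based and end-exclusive,
    standing in for Pfam domain hits (hypothetical representation).
    """
    mask = disordered_mask(scores)
    in_domain = [False] * len(mask)
    for start, end in domains:
        for i in range(max(start, 0), min(end, len(mask))):
            in_domain[i] = True
    inside = [d for d, dom in zip(mask, in_domain) if dom]
    outside = [d for d, dom in zip(mask, in_domain) if not dom]

    def fraction(flags):
        # Empty region (e.g. no domains at all): report 0.0, avoid dividing by zero.
        return sum(flags) / len(flags) if flags else 0.0

    return fraction(inside), fraction(outside)


# Toy example with made-up scores (not real predictions):
proteome = [
    [0.9] * 40 + [0.1] * 60,   # 40% disordered, contains one long IDR
    [0.2] * 100,               # fully ordered
    [0.8] * 70 + [0.3] * 30,   # mostly disordered
]
print(proteome_disorder_measures(proteome))
print(disorder_inside_outside(proteome[0], [(40, 100)]))  # → (0.0, 1.0)
```

Even on this toy input the three measures differ noticeably (roughly 37% of residues, two of three proteins with a long IDR, one of three mostly disordered proteins). This illustrates why results reported with different measures are hard to compare directly.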
Eukaryotic, prokaryotic and archaeal proteomes
Most of the eukaryotic proteomes were downloaded from the
complete proteome set of the UniProt database [25], and some
additional ones from the RefSeq database [26]. To avoid
redundancy, we usually used only one proteome for specie (...truncated)