Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes

Jan 2003

Background: Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend, rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task.

Results: We addressed the problem of constructing parsimonious scenarios for individual sets of orthologous genes given a species tree. The orthologous sets were taken from the database of Clusters of Orthologous Groups of proteins (COGs). We show that the phyletic patterns (patterns of presence-absence in completely sequenced genomes) of almost 90% of the COGs are inconsistent with the hypothetical species tree. Algorithms were developed to reconcile the phyletic patterns with the species tree by postulating gene loss, COG emergence and HGT (the latter two classes of events were collectively treated as gene gains). We prove that each of these algorithms produces a parsimonious evolutionary scenario, which can be represented as mapping of loss and gain events on the species tree. The distribution of the evolutionary events among the tree nodes substantially depends on the underlying assumptions of the reconciliation algorithm, e.g. whether or not independent gene gains (gain after loss after gain) are permitted. Biological considerations suggest that, on average, gene loss might be a more likely event than gene gain. Therefore, different gain penalties were used and the resulting series of reconstructed gene sets for the last universal common ancestor (LUCA) of the extant life forms were analysed. The number of genes in the reconstructed LUCA gene sets grows as the gain penalty increases. However, qualitative examination of the LUCA versions reconstructed with different gain penalties indicates that, even with a gain penalty of 1 (equal weights assigned to a gain and a loss), the set of 572 genes assigned to LUCA might be nearly sufficient to sustain a functioning organism. Under this gain penalty value, the numbers of horizontal gene transfer and gene loss events are nearly identical. This result holds true for two alternative topologies of the species tree and even under random shuffling of the tree. Therefore, the results seem to be compatible with approximately equal likelihoods of HGT and gene loss in the evolution of prokaryotes.

Conclusions: The notion that gene loss and HGT are major aspects of prokaryotic evolution was supported by quantitative analysis of the mapping of the phyletic patterns of COGs onto a hypothetical species tree. Algorithms were developed for constructing parsimonious evolutionary scenarios, which include gene loss and gain events, for orthologous gene sets, given a species tree. This analysis shows, contrary to expectations, that the number of predicted HGT events that occurred during the evolution of prokaryotes might be approximately the same as the number of gene losses. The approach to the reconstruction of evolutionary scenarios employed here is conservative with regard to the detection of HGT because only patterns of gene presence-absence in sequenced genomes are taken into account. In reality, horizontal transfer might have contributed to the evolution of many other genes also, which makes it a dominant force in prokaryotic evolution.
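To make the role of the gain penalty concrete, the following is a minimal sketch of weighted parsimony for a single phyletic pattern on a fixed species tree. It is a generic Sankoff-style dynamic program, not necessarily the algorithms developed in the paper. The function name `parsimony_cost`, the nested-tuple tree encoding, the toy species names, and the choice to charge one gain for a gene present at the root (treating emergence as a gain) are all assumptions made for illustration.

```python
import math
from typing import Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def parsimony_cost(tree: Tree, present: Set[str],
                   gain_penalty: float = 1.0) -> Tuple[float, bool]:
    """Return (minimum cost, gene_assigned_to_root) for one phyletic pattern.

    Each loss costs 1 and each gain costs `gain_penalty`. A gene present at
    the root is charged one gain (its emergence); this is an assumption.
    """

    def solve(node: Tree) -> Tuple[float, float]:
        # Returns (cost if the gene is absent at node, cost if present).
        if isinstance(node, str):
            has_gene = node in present
            return (math.inf if has_gene else 0.0,
                    0.0 if has_gene else math.inf)
        cost_absent = cost_present = 0.0
        for child in node:
            c_absent, c_present = solve(child)
            # Parent lacks the gene: child either lacks it or gains it.
            cost_absent += min(c_absent, c_present + gain_penalty)
            # Parent has the gene: child either keeps it or loses it.
            cost_present += min(c_present, c_absent + 1.0)
        return cost_absent, cost_present

    root_absent, root_present = solve(tree)
    with_gene = root_present + gain_penalty  # emergence at the root
    # Ties are resolved in favour of absence at the root.
    if with_gene < root_absent:
        return with_gene, True
    return root_absent, False


# Toy four-species tree; the gene is found only in species A and C.
tree = (("A", "B"), ("C", "D"))
print(parsimony_cost(tree, {"A", "C"}, gain_penalty=1.0))  # (2.0, False)
print(parsimony_cost(tree, {"A", "C"}, gain_penalty=3.0))  # (5.0, True)
```

With equal weights the cheapest scenario in this toy case is two independent gains, so the gene is not assigned to the root. With a gain penalty of 3, one emergence at the root followed by two losses becomes cheaper, which mirrors the observation that the reconstructed ancestral gene set grows as the gain penalty increases.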

Boris G Mirkin (1), Trevor I Fenner (1), Michael Y Galperin (2), Eugene V Koonin (2)

(1) School of Information Systems and Computer Science, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
(2) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Background

As soon as genome sequencing allowed phylogenetic analysis of large protein families, it became clear that different sets of orthologs often produce different tree topologies. The incongruence between tree topologies affects even the most fundamental splits in the history of life, such as the three-domain classification of life forms into bacteria, archaea and eukaryotes [1-4]. In particular, archaeal genes systematically show different phylogenetic affinities, with the components of translation, transcription and replication systems typically affiliating with eukaryotes, and metabolic enzymes and structural proteins displaying bacterial provenance [5,6]. Initially, the discrepancies between different trees were attributed primarily to artifacts produced by tree-building methods. However, comparative genomics showed beyond reasonable doubt that lineage-specific gene loss and horizontal gene transfer (HGT) are major evolutionary phenomena, at least in the prokaryotic world [7-14].

The prominence of gene loss and HGT in the evolution of prokaryotes is apparent even without detailed phylogenetic tree analysis. Orthologous gene sets, such as those compiled in the database of Clusters of Orthologous Groups of proteins (COGs; http://www.ncbi.nlm.nih.gov/COG/), show a wide spread of phyletic patterns (i.e. patterns of presence-absence of genomes in COGs), with most COGs including only a few lineages, and many having an odd composition, e.g. two bacterial and one archaeal species [12,15,16]. The COG database has been manually curated, with a special emphasis on the correct representation of all analyzed genomes in each COG [15,16]. Therefore, it seems impossible to explain these patterns without invoking massive, lineage-specific gene loss and HGT, and recent quantitative analysis has suggested that these processes contributed to the evolution of a substantial majority of orthologous sets of prokaryotic proteins [17].

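The argument above can be illustrated with a small sketch that represents a phyletic pattern as a set of genomes and asks whether it coincides with a complete clade of a species tree, i.e. whether a single emergence with no loss or transfer would explain it. This is one simple notion of tree consistency assumed here for illustration; the criterion used in the paper may differ. The tree, the genome labels (`B1`-`B3` for bacteria, `A1`-`A2` for archaea) and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Collect the leaf set of every subtree, single leaves included."""
    found: List[FrozenSet[str]] = []

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaf_set = frozenset([node])
        else:
            leaf_set = frozenset().union(*(leaves(child) for child in node))
        found.append(leaf_set)
        return leaf_set

    leaves(tree)
    return found


def is_clade(tree: Tree, pattern: Iterable[str]) -> bool:
    """True if the pattern is exactly the leaf set of some subtree."""
    return frozenset(pattern) in clades(tree)


# Toy species tree: three bacteria and two archaea (made-up labels).
toy_tree = ((("B1", "B2"), "B3"), ("A1", "A2"))

print(is_clade(toy_tree, {"B1", "B2"}))        # a complete clade
print(is_clade(toy_tree, {"B1", "B2", "A1"}))  # two bacteria plus one archaeon
```

In this toy setting the second pattern, two bacterial genomes plus one archaeal genome, does not match any clade, so explaining it on this tree requires at least one loss or one additional gain.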
Thus, comparative genomics might potentially undermine the very idea of a universal species tree because, inasmuch as HGT is shown to make a substantial contribution to genome evolution, no tree can, in principle, fully reflect the course of evolution of species [7,9,11,18,19]. Attempts to salvage the concept of a species tree, at least in a "weak" form, have been undertaken using comparative analysis of large, in some cases, genome-wide, gene sets. The idea behind these "genome-tree" approaches is that, in spite of the wide spread of gene loss and HGT, genomes might carry a signal of vertical inheritance, and the strength of this phylogenetic signal is likely to be, roughly, inversely proportional to the evolutionary distance between species. The methods employed for genome-tree construction included comparison of gene content of orthologous sets, local gene order, and mean similarity between orthologs, as well as more traditional phylogenetic analysis of large gene sets thought to be minimally subject to gene loss and HGT, e.g., genes for ribosomal proteins and other components of the translation machinery [20-29]. Taken together, these analyses suggest that, extensive gen (...truncated)


This is a preview of a remote PDF: http://www.biomedcentral.com/content/pdf/1471-2148-3-2.pdf
Article home page: http://www.biomedcentral.com/1471-2148/3/2

Boris G Mirkin, Trevor I Fenner, Michael Y Galperin, Eugene V Koonin. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes, 2003, pp. 2, 3, DOI: 10.1186/1471-2148-3-2