A genomic scale map of genetic diversity in Trypanosoma cruzi

http://www.biomedcentral.com/content/pdf/1471-2164-13-736.pdf

Alejandro A Ackermann†, Leonardo G Panunzi†, Raul O Cosentino, Daniel O Sánchez, Fernán Agüero

†Equal contributors

Instituto de Investigaciones Biotecnologicas - Instituto Tecnologico de Chascomus (IIB-INTECH), Universidad Nacional de San Martin - Consejo de Investigaciones Cientificas y Tecnicas (UNSAM-CONICET), Sede San Martin, B1650HMP, San Martin, Buenos Aires, Argentina

Background: Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a few well-studied loci. The availability of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale.

Results: Using a bioinformatic strategy, we clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments, each including one or two alleles from the reference strain CL-Brener. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we identified 288,957 high-quality single nucleotide polymorphisms (SNPs) and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of the high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity, we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show statistically significantly lower nucleotide diversity values.

Conclusions: This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas Disease. These data are available through the TcSNP database (http://snps.tcruzi.org).

Background

Trypanosoma cruzi is a protozoan parasite of the order Kinetoplastida and the causative agent of Chagas Disease, one of the so-called neglected diseases that disproportionately affect the poor. The disease is endemic in most Latin American countries, affecting in excess of 8 million people [1]. Chagas disease has a variable clinical outcome. In its acute form it can lead to death (mostly in infants), while in its chronic form it is a debilitating disease producing different associated pathologies: mega-colon, mega-esophagus and cardiomyopathy, among others. These different clinical outcomes are the result of a complex interplay between environmental factors, the host genetic background and the genetic diversity present in the parasite population. In particular, the different clinical manifestations have been suggested to be, at least in part, due to the genetic diversity of T. cruzi [2-5].

T. cruzi has a structured population, with a predominantly clonal mode of reproduction [6] and considerable phenotypic diversity [7-10]. Through the use of a number of molecular markers, the population has been divided into several evolutionary lineages, also called discrete typing units (DTUs).
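The Results summary refers to SNPs classified by the number of alleles per site (bi-, tri- and tetra-allelic) and to an analysis of nucleotide diversity. As an illustration only, the sketch below computes both quantities on a toy alignment. It assumes equal-length, already aligned sequences and uses the common textbook definition of nucleotide diversity (π) as the average number of pairwise differences per site. It is not the authors' pipeline, and the function names `classify_site` and `nucleotide_diversity` are hypothetical.

```python
from itertools import combinations

BASES = set("ACGT")

ALLELE_CLASSES = {1: "monomorphic", 2: "bi-allelic", 3: "tri-allelic", 4: "tetra-allelic"}


def classify_site(column: str) -> str:
    """Classify one alignment column by how many distinct nucleotides it contains.

    Gaps ('-') and ambiguous bases (e.g. 'N') are ignored.
    """
    alleles = {base for base in column.upper() if base in BASES}
    return ALLELE_CLASSES.get(len(alleles), "no data")


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of pairwise differences per site (pi).

    Assumption: only columns where every sequence has an unambiguous
    A/C/G/T are used, so all pairs are compared over the same sites.
    """
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [s.upper() for s in seqs]
    # Keep only fully resolved columns.
    valid = [i for i in range(length) if all(s[i] in BASES for s in seqs)]
    if not valid:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Count mismatches over every pair of sequences and every valid column.
    diffs = sum(a[i] != b[i] for a, b in pairs for i in valid)
    return diffs / (len(pairs) * len(valid))


if __name__ == "__main__":
    # Toy alignment of three hypothetical alleles (not data from the study).
    alignment = ["ACGTA", "ACGTT", "ACCTA"]
    for pos, column in enumerate(zip(*alignment)):
        print(pos, classify_site("".join(column)))
    print(round(nucleotide_diversity(alignment), 4))  # 0.2667
```

With three sequences there are three pairs, differing at 1, 1 and 2 of the 5 sites, so π = 4/15 ≈ 0.267. Discarding columns with gaps or ambiguous bases is a simplifying choice; a real analysis may instead handle missing data pair by pair.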
Some markers allow the distinction of two or three major lineages [11-14], while other experimental strategies, such as random amplified polymorphic DNA (RAPD) and multilocus isoenzyme electrophoresis (MLEE), support the distinction of six subdivisions [15-17], originally designated as DTUs I, IIa, IIb, IIc, IId, and IIe [16]. Recently, this nomenclature was revised as follows: TcI (formerly I), TcII (IIb), TcIII (IIc), TcIV (IIa), TcV (IId) and TcVI (IIe) [18,19]. Lineages TcV and TcVI (the latter includes CL Brener, the strain used for the first genome sequence of T. cruzi) have a very high degree of heterozygosity but otherwise very homogeneous population structures with low intralineage diversity [20,21]. The currently favoured hypothesis suggests that these two lineages originated after either one or two independent hybridization events between strains of DTUs TcII and TcIII [21-23].

Knowledge of the genetic variation present in a genome (i.e., between the two alleles of a diploid individual) or in a species (i.e., in the population) is of central importance for a variety of reasons and applications: i) to understand the evolutionary forces underlying the biological and phenotypic properties observed in an individual; ii) to detect cases of apparent horizontal gene transfer; iii) to assess the potential for development of resistance when validating a target for drug development; iv) to prioritize targets for the development of diagnostics or vaccines; v) to design constructs for genetic knockout experiments, increasing the success rate when targeting specific alleles; and vi) to provide genetic markers for association studies or for further probing of the population structure.

The genome sequence of the CL-Brener clone of T. cruzi was published in 2005 [24], together with those of two other trypanosomatids of medical importance: Trypanosoma brucei (sleeping sickness, African trypanosomiasis) [25] and Leishmania major (leishmaniasis) [26]. However, the genome of T. cruzi was a particular case for two reasons: it was obtained from a hybrid TcVI strain composed of two divergent parental haplotypes, and it was sequenced using a whole genome shotgun strategy [24]. This choice of strain and sequencing strategy resulted in high sequence coverage from both parental haplotypes, which were derived from ancestral TcII and TcIII strains. Because of the high allelic variation within this diploid genome, a substantial number of contigs were present twice in the assembly [24]. These divergent haplotypes, which in many cases were assembled separately, were the basis of a recent re-assembly of the genome [27]. As a consequence, it is now possible to identify the genetic diversity present between the two haplotypes of this diploid genome. More recently, sequence data from other T. cruzi strains have become available: the draft genome sequence of the Sylvio X10 (TcI) strain [28], high-coverage transcriptomic data from another TcI strain (Westergaard G and Vazquez MP, manuscript in preparation), and 2.5X whole genome shotgun (WGS) data from the Esmeraldo cl3 (TcII) strain. To take advantage of the hybrid genome of the CL-Brener strain, and of other genome and transcriptome da (...truncated)