Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area

eLife, Mar 2015

To improve understanding of the factors influencing tuberculosis transmission and the role of pathogen variation, we sequenced all available specimens from patients diagnosed over 15 years in a whole district in Malawi. Mycobacterium tuberculosis lineages were assigned and transmission networks constructed, allowing ≤10 single nucleotide polymorphisms (SNPs) difference. We defined disease as due to recent infection if the network-determined source was within 5 years, and assessed transmissibility from forward transmissions resulting in disease. High-quality sequences were available for 1687 disease episodes (72% of all culture-positive episodes): 66% of patients linked to at least one other patient. The between-patient mutation rate was 0.26 SNPs/year (95% CI 0.21–0.31). We showed striking differences by lineage in the proportion of disease due to recent transmission and in transmissibility (highest for lineage-2 and lowest for lineage-1) that were not confounded by immigration, HIV status or drug resistance. Transmissions resulting in disease decreased markedly over time.
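
The linking rule in the abstract (at most 10 SNPs between two isolates, with disease counted as recent if the source was diagnosed within 5 years) can be illustrated with a minimal sketch. This is not the authors' pipeline. The `Episode` record, its field names, and the use of a set of variant positions as a stand-in for a genome are all hypothetical, and the sketch treats every earlier linked case as a possible source, whereas the study selected a network-determined source.

```python
from dataclasses import dataclass
from itertools import combinations

SNP_THRESHOLD = 10   # from the abstract: links allow <=10 SNPs difference
RECENT_YEARS = 5.0   # from the abstract: source within 5 years


@dataclass(frozen=True)
class Episode:
    """Hypothetical record for one disease episode (not the study's schema)."""
    case_id: str
    year: float        # diagnosis date in decimal years (assumed)
    snps: frozenset    # variant positions relative to a reference (assumed)


def snp_distance(a: Episode, b: Episode) -> int:
    # Simplification: count positions that are variant in exactly one isolate.
    return len(a.snps ^ b.snps)


def link_cases(episodes):
    """Return (earlier_id, later_id, distance, gap_years) for each linked pair."""
    links = []
    for a, b in combinations(episodes, 2):
        d = snp_distance(a, b)
        if d <= SNP_THRESHOLD:
            first, second = sorted((a, b), key=lambda e: e.year)
            links.append((first.case_id, second.case_id, d, second.year - first.year))
    return links


def recent_cases(links):
    """Later cases with at least one linked earlier case within RECENT_YEARS."""
    return {later for _, later, _, gap in links if gap <= RECENT_YEARS}


# Toy data, invented for illustration only.
demo = [
    Episode("P1", 1999.0, frozenset({10, 20})),
    Episode("P2", 2001.5, frozenset({10, 20, 30})),
]
demo_links = link_cases(demo)
print(demo_links)
print(recent_cases(demo_links))
```

A real analysis would compare called alleles rather than position sets and would need a rule for choosing among several candidate sources; the sketch only shows how the two thresholds interact.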

RESEARCH ARTICLE

JA Guerra-Assunção1, AC Crampin1,2, RMGJ Houben1, T Mzembe2, K Mallard3, F Coll3, P Khan1, L Banda2, A Chiwaya2, RPA Pereira3, R McNerney3, PEM Fine1, J Parkhill4, TG Clark3, JR Glynn1*

1 Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
2 Karonga Prevention Study, Malawi, Malawi
3 Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
4 Wellcome Trust Sanger Institute, Hinxton, United Kingdom

Abstract DOI: 10.7554/eLife.05166.001

*For correspondence: judith.
Competing interests: The authors declare that no competing interests exist.
Funding: See page 15
Received: 14 October 2014
Accepted: 22 January 2015
Published: 03 March 2015
Reviewing editor: Quarraisha Abdool Karim, University of KwaZulu Natal, South Africa
Copyright Guerra-Assunção et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Introduction

Despite the huge global burden of tuberculosis, the factors influencing transmission remain poorly understood. Compared to other bacteria, the genome of Mycobacterium tuberculosis is stable and genetic variation was thought to be limited, but with increased sequencing, greater diversity has been recognized (Homolka et al., 2010). Based on the genotype, M. tuberculosis has seven lineages: three ‘ancient’ (lineage-1 and two Mycobacterium africanum lineages), and three ‘modern’ (lineages-2, 3, 4) (Comas et al., 2009), and one intermediate (lineage-7), recently described in Ethiopia (Firdessa et al., 2013). The lineages may vary in propensity to transmit and cause disease (Thwaites et al., 2008; Homolka et al., 2010; Parwati et al., 2010; Gagneux, 2012), but results are inconsistent and there is considerable strain-to-strain variation within lineages (Portevin et al., 2011; Mathema et al., 2012). Lineage-2 (Beijing) strains are associated with increasing spread and drug resistance in some areas but not others (European Concerted Action on New Generation Genetic Markers, 2006), and with a lower (Click et al., 2012) or higher (Kong et al., 2007) proportion of extrapulmonary tuberculosis. M. africanum has been associated with lower virulence (de Jong et al., 2008), and lineage-1 with faster sputum smear conversion (Click et al., 2013).
In low incidence settings, lineage is often associated with immigrant sub-groups, and while host–pathogen co-evolution has been suggested, it is difficult to disentangle the effects of lineage and host susceptibility on pathogenesis (Reed et al., 2009; Gagneux, 2012; Pareek et al., 2013).

Guerra-Assunção et al. eLife 2015;4:e05166. DOI: 10.7554/eLife.05166. Research article, Epidemiology and global health.

eLife digest

Tuberculosis is an important public health threat around the globe and is particularly common in developing countries. It is difficult to control the spread of the disease because the bacteria that cause it can spread when an infected individual coughs or sneezes. It may take years for an infected individual to develop symptoms of tuberculosis so it can be hard to trace the source of an outbreak, and people infected with HIV are particularly susceptible to the disease.

The bacterium that causes the majority of cases of tuberculosis is called Mycobacterium tuberculosis. There are several different varieties or ‘lineages’ of M. tuberculosis, and it is thought that they may vary in their ability to spread and cause disease. However, the results of previous studies have been inconsistent and there also seems to be a lot of variation between strains within the same lineage.

In this study, Guerra-Assunção et al. used an approach called whole genome sequencing alongside more traditional methods to study the spread of tuberculosis in Malawi. They sequenced the genomes of every available sample of M. tuberculosis collected from patients in the Karonga district of Malawi over a 15-year period. This produced high-quality DNA sequence data about the bacteria responsible for almost 1700 cases of disease.

Using this massive amount of data, Guerra-Assunção et al. constructed networks that showed how the bacteria had spread in the community. This revealed that there were differences between the ability of the various M. tuberculosis lineages to cause disease and to spread in communities. For example, lineage 1 was less likely than the other lineages to cause disease soon after infecting an individual and was less able to spread. The data also show that the proportion of cases of disease due to recent infection declined substantially during the 15-year period. This indicates that the tuberculosis and HIV control programmes in the area have been successful.

Guerra-Assunção et al.’s findings show that it is possible to understand how tuberculosis is transmitted on a large scale. The next challenge is to understand why the lineages differ in their ability to cause disease and spread between individuals.

DOI: 10.7554/eLife.05166.002

Since the 1990s, methods such as RFLP based on the insertion element IS6110 (van Embden et al., 1993) have been used to distinguish clusters of patients with shared DNA-fingerprint patterns, suggesting recent transmission (Small et al., 1994), but within the clusters, these methods cannot distinguish who transmitted to whom. Whole genome sequencing provides far greater resolution, and if data are collected in a whole population over several years, single nucleotide polymorphisms (SNPs) can be used to construct transmission networks (Bryant et al., 2013; Walker et al., 2013, 2014). In low-incidence set (...truncated)


This is a preview of a remote PDF: http://elifesciences.org/content/elife/4/e05166.full.pdf
Article home page: http://elifesciences.org/content/4/e05166

JA Guerra-Assunção, AC Crampin, RMGJ Houben, T Mzembe, K Mallard, F Coll, P Khan, L Banda, A Chiwaya, RPA Pereira, R McNerney, PEM Fine, J Parkhill, TG Clark, JR Glynn. Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area, eLife, 2015, DOI: 10.7554/eLife.05166