Large-scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area
RESEARCH ARTICLE
elifesciences.org
JA Guerra-Assunção1, AC Crampin1,2, RMGJ Houben1, T Mzembe2, K Mallard3,
F Coll3, P Khan1, L Banda2, A Chiwaya2, RPA Pereira3, R McNerney3, PEM Fine1,
J Parkhill4, TG Clark3, JR Glynn1*
1Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical
Medicine, London, United Kingdom; 2Karonga Prevention Study, Malawi, Malawi; 3Faculty
of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine,
London, United Kingdom; 4Wellcome Trust Sanger Institute, Hinxton, United Kingdom
Abstract To improve understanding of the factors influencing tuberculosis transmission and the
role of pathogen variation, we sequenced all available specimens from patients diagnosed over 15
years in a whole district in Malawi. Mycobacterium tuberculosis lineages were assigned and
transmission networks constructed, allowing a difference of ≤10 single nucleotide polymorphisms (SNPs).
We defined disease as due to recent infection if the network-determined source was within 5 years,
and assessed transmissibility from forward transmissions resulting in disease. High-quality sequences
were available for 1687 disease episodes (72% of all culture-positive episodes); 66% of patients
were linked to at least one other patient. The between-patient mutation rate was 0.26 SNPs/year (95% CI
0.21–0.31). We showed striking differences by lineage in the proportion of disease due to recent
transmission and in transmissibility (highest for lineage-2 and lowest for lineage-1) that were not
confounded by immigration, HIV status or drug resistance. Transmissions resulting in disease
decreased markedly over time.
DOI: 10.7554/eLife.05166.001
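The linkage rule summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences, the choice of the closest earlier-diagnosed episode as the source, and the reading of "within 5 years" as a gap of at most 5 years between diagnoses are all assumptions made for illustration. Only the ≤10 SNP threshold and the 5-year window come from the text.

```python
SNP_THRESHOLD = 10   # maximum SNP difference for a link (from the abstract)
RECENT_YEARS = 5.0   # source within 5 years counts as recent infection (from the abstract)


def snp_distance(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def likely_sources(episodes):
    """Map each episode id to (source id, SNP distance, years since source) or None.

    `episodes` maps id -> (diagnosis year, aligned sequence). Assumption: the
    source is the earlier-diagnosed episode with the fewest SNP differences
    (at most SNP_THRESHOLD), with ties broken by the shorter time gap.
    """
    sources = {}
    for pid, (year, seq) in episodes.items():
        best = None
        for other, (other_year, other_seq) in episodes.items():
            if other == pid or other_year >= year:
                continue  # a source must have been diagnosed earlier
            d = snp_distance(seq, other_seq)
            if d > SNP_THRESHOLD:
                continue  # too distant to be linked
            key = (d, year - other_year)
            if best is None or key < best[0]:
                best = (key, other)
        sources[pid] = None if best is None else (best[1], best[0][0], best[0][1])
    return sources


def is_recent(link):
    """True if the episode has a linked source diagnosed within RECENT_YEARS."""
    return link is not None and link[2] <= RECENT_YEARS


# Hypothetical toy data: id -> (diagnosis year, aligned variable sites)
episodes = {
    "A": (1996.0, "ACGTACGTACGTACGT"),
    "B": (1998.0, "ACGTACGTACGTACGA"),
    "C": (2005.0, "ACGTACGTACGTACTA"),
    "D": (2000.0, "TGCATGCATGCATGCA"),
}
sources = likely_sources(episodes)
```

In this toy example, episode B is linked to A and counts as recent infection, C is linked to B but falls outside the 5-year window, and D is unlinked because it differs from every earlier episode by more than 10 SNPs.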
*For correspondence: judith.
Competing interests: The
authors declare that no
competing interests exist.
Funding: See page 15
Received: 14 October 2014
Accepted: 22 January 2015
Published: 03 March 2015
Reviewing editor: Quarraisha
Abdool Karim, University of
KwaZulu Natal, South Africa
Copyright Guerra-Assunção
et al. This article is distributed
under the terms of the Creative
Commons Attribution License,
which permits unrestricted use
and redistribution provided that
the original author and source are
credited.
Introduction
Despite the huge global burden of tuberculosis, the factors influencing transmission remain poorly
understood. Compared to other bacteria, the genome of Mycobacterium tuberculosis is stable and
genetic variation was thought to be limited, but with increased sequencing, greater diversity has been
recognized (Homolka et al., 2010). Based on genotype, M. tuberculosis has seven lineages: three
‘ancient’ (lineage-1 and two Mycobacterium africanum lineages), three ‘modern’ (lineages-2, -3 and -4)
(Comas et al., 2009), and one intermediate (lineage-7), recently described in Ethiopia (Firdessa et al.,
2013). The lineages may vary in propensity to transmit and cause disease (Thwaites et al., 2008;
Homolka et al., 2010; Parwati et al., 2010; Gagneux, 2012), but results are inconsistent and there is
considerable strain-to-strain variation within lineages (Portevin et al., 2011; Mathema et al., 2012).
Lineage-2 (Beijing) strains are associated with increasing spread and drug resistance in some areas
but not others (European Concerted Action on New Generation Genetic Markers, 2006), and with
a lower (Click et al., 2012) or higher (Kong et al., 2007) proportion of extrapulmonary tuberculosis.
M. africanum has been associated with lower virulence (de Jong et al., 2008), and lineage-1 with
faster sputum smear conversion (Click et al., 2013). In low incidence settings, lineage is often
associated with immigrant sub-groups, and while host–pathogen co-evolution has been suggested, it
is difficult to disentangle the effects of lineage and host susceptibility on pathogenesis (Reed et al.,
2009; Gagneux, 2012; Pareek et al., 2013).
Guerra-Assunção et al. eLife 2015;4:e05166. DOI: 10.7554/eLife.05166
Epidemiology and global health
eLife digest Tuberculosis is an important public health threat around the globe and is
particularly common in developing countries. It is difficult to control the spread of the disease
because the bacteria that cause it can spread when an infected individual coughs or sneezes. It may
take years for an infected individual to develop symptoms of tuberculosis so it can be hard to trace
the source of an outbreak, and people infected with HIV are particularly susceptible to the disease.
The bacterium that causes the majority of cases of tuberculosis is called Mycobacterium
tuberculosis. There are several different varieties or ‘lineages’ of M. tuberculosis, and it is thought
that they may vary in their ability to spread and cause disease. However, the results of previous
studies have been inconsistent and there also seems to be a lot of variation between strains within
the same lineage.
In this study, Guerra-Assunção et al. used an approach called whole genome sequencing
alongside more traditional methods to study the spread of tuberculosis in Malawi. They sequenced
the genomes of every available sample of M. tuberculosis collected from patients in the Karonga
district of Malawi over a 15-year period. This produced high-quality DNA sequence data about the
bacteria responsible for almost 1700 cases of disease.
Using this massive amount of data, Guerra-Assunção et al. constructed networks that showed
how the bacteria had spread in the community. This revealed that there were differences between
the ability of the various M. tuberculosis lineages to cause disease and to spread in communities. For
example, lineage 1 was less likely than the other lineages to cause disease soon after infecting an
individual and was less able to spread.
The data also show that the proportion of cases of disease due to recent infection declined
substantially during the 15-year period. This indicates that the tuberculosis and HIV control
programmes in the area have been successful.
Guerra-Assunção et al.’s findings show that it is possible to understand how tuberculosis is
transmitted on a large scale. The next challenge is to understand why the lineages differ in their
ability to cause disease and spread between individuals.
DOI: 10.7554/eLife.05166.002
Since the 1990s, methods such as restriction fragment length polymorphism (RFLP) typing based on the insertion element IS6110 (van Embden et al.,
1993) have been used to distinguish clusters of patients with shared DNA-fingerprint patterns,
suggesting recent transmission (Small et al., 1994), but within the clusters, these methods cannot
distinguish who transmitted to whom. Whole genome sequencing provides far greater resolution, and
if data are collected in a whole population over several years, single nucleotide polymorphisms (SNPs)
can be used to construct transmission networks (Bryant et al., 2013; Walker et al., 2013, 2014). In
low-incidence set (...truncated)