Unbiased Identification of Blood-based Biomarkers for Pulmonary Tuberculosis by Modeling and Mining Molecular Interaction Networks.
EBioMedicine 15 (2017) 112–126
Research Paper
Awanti Sambarey a, Abhinandan Devaprasad a,1, Abhilash Mohan a,1, Asma Ahmed b, Soumya Nayak b,
Soumya Swaminathan c, George D'Souza d, Anto Jesuraj d, Chirag Dhar d, Subash Babu e,
Annapurna Vyakarnam b,f, Nagasuma Chandra a,⁎
a Department of Biochemistry, IISc, Bangalore 560012, India
b Centre for Infectious Disease Research (CIDR), IISc, Bangalore 560012, India
c National Institute for Research in Tuberculosis, Mayor Sathiyamoorthy Road, Chetpet, Chennai 600031, India
d St John's Research Institute, St. John's National Academy of Health Sciences, 560034 Bangalore, India
e NIH-NIRT-ICER, Mayor Sathiyamoorthy Road, Chetpet, Chennai 600031, India
f Department of Infectious Diseases, King's College London School of Medicine, Guy's Hospital, Great Maze Pond, London, UK
Article info
Article history:
Received 23 September 2016
Received in revised form 16 December 2016
Accepted 16 December 2016
Available online 21 December 2016
Keywords:
Tuberculosis
Biomarkers
Network biology
Computational medicine
Diagnostics
Abstract
Efficient diagnosis of tuberculosis (TB) is met with multiple challenges, calling for a shift of focus from pathogen-centric diagnostics towards identification of host-based multi-marker signatures. Transcriptomics offers a list of
differentially expressed genes, but cannot by itself identify the most influential contributors to the disease phenotype. Here, we describe a computational pipeline that adopts an unbiased approach to identify a biomarker signature. RNA sequencing data from whole blood samples of TB patients were integrated with a curated
genome-wide molecular interaction network, from which we obtain a comprehensive perspective of variations
that occur in the host due to TB. We then implement a sensitive network mining method to shortlist gene candidates that are most central to the disease alterations. Finally, we apply a series of filters that include applicability to
multiple publicly available datasets as well as additional validation on independent patient samples, and identify
a signature comprising 10 genes (FCGR1A, HK3, RAB13, RBBP8, IFI44L, TIMM10, BCL6, SMARCD3, CYP4F3 and SLPI)
that can discriminate between TB and healthy controls as well as distinguish TB from latent tuberculosis and HIV
in most cases. The signature has the potential to serve as a diagnostic marker of TB.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
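The general idea summarized in the abstract, overlaying expression changes on an interaction network and ranking genes by how central they are to the altered interactions, can be illustrated with a minimal sketch. This is not the authors' network mining method: the edge-weighting rule (product of absolute log2 fold changes), the scoring rule (sum of incident edge weights), the function name `rank_genes`, and the placeholder genes G1 to G5 with their values are all assumptions made purely for illustration.

```python
from collections import defaultdict


def rank_genes(edges, log2fc):
    """Rank genes by the summed weight of their interactions.

    edges  : iterable of (gene_a, gene_b) undirected interactions
    log2fc : dict mapping gene -> log2 fold change (disease vs control)
    """
    score = defaultdict(float)
    for a, b in edges:
        # Assumed edge weight: product of the absolute fold changes of both
        # interacting genes; genes without expression data contribute 0.
        w = abs(log2fc.get(a, 0.0)) * abs(log2fc.get(b, 0.0))
        score[a] += w
        score[b] += w
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, made-up inputs (placeholder gene names, not real data).
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
toy_log2fc = {"G1": 2.0, "G2": -1.0, "G3": 3.0, "G4": 0.5, "G5": 0.0}

ranking = rank_genes(toy_edges, toy_log2fc)
top_genes = [gene for gene, _ in ranking[:2]]
print(ranking)
```

In this toy network the strongly perturbed and well-connected gene G3 ranks first, which is the kind of candidate a subsequent filtering step would carry forward. A real analysis would replace the weighting and scoring rules with the method described in the paper.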
1. Introduction
Tuberculosis (TB) now ranks along with HIV as the leading cause of
death due to an infectious agent worldwide, with approximately 10.4
million people estimated to have acquired TB in 2015, resulting in 1.4
million deaths (World Health Organization, 2016). These deaths are
largely preventable by early and efficient diagnosis of the disease. Unfortunately, diagnosis is often delayed due to insensitive and time-consuming methods. Present diagnostic measures rely largely on the
detection of Mycobacterium tuberculosis (Mtb) in patient samples together with radiological assessments, and they have several shortcomings. Sputum cultures are the
current standard for detecting Mtb, but while sensitive, they take 3–
6 weeks to provide conclusive results, thereby delaying the initiation
of treatment.
⁎ Corresponding author. E-mail address: (N. Chandra).
1 Joint second authors.
Host-based diagnostic methods provide an alternative for early detection of TB onset and enable the monitoring of symptomatic changes. IFN-γ release assays (IGRAs) such as the T-SPOT.TB
(Richeldi, 2006; Pai et al., 2014) or the QuantiFERON test (Sultan et al.,
2010) measure IFN-γ production in response to stimulation with
Mtb-specific antigens ESAT6 and CFP10 (Mazurek and Villarino, 2003;
Ravn et al., 2005). However, IGRAs cannot discriminate between active
and latent Mtb infection, and are thus inadequate for marking the disease status. In the clinic, IGRAs are used more often to detect latent tuberculosis than for diagnosis of active disease (Herrera et al., 2011).
Existing assays that rely on single-marker readouts, such as that of
serum deaminase levels (Gui and Xiao, 2014), also suffer from inadequate sensitivity and/or specificity, calling for more effective host-related multi-marker signatures that hold promise for applications in
prognostic research and vaccine trials as well as in monitoring treatment responses. There is thus a current need for a shift from investigations on single markers to high-coverage studies that will reveal
signatures consisting of multiple integrated markers (Maertzdorf et
al., 2014). Recent years have witnessed an increase in host omics data
http://dx.doi.org/10.1016/j.ebiom.2016.12.009
to identify specific gene variations upon infection with Mtb, including
genetic polymorphisms identified by GWAS and linkage and association
studies that ascribe host susceptibility to infection (Azad et al., 2012),
genome-wide expression variations in patient cohorts as compared to
healthy controls, as well as variations over the course of treatment in
the same patient.
Transcriptomics provides global coverage of host responses, and
is widely used in TB biomarker research (Maertzdorf et al., 2011a;
Joosten et al., 2013). One drawback of microarray technologies is the
lack of absolute and detailed evaluation of gene expression. Modern
deep sequencing technologies provide quantitative and qualitative information on gene expression and genomic composition down to the
single-nucleotide level (Normand and Yanai, 2013). RNA sequencing
(RNA-seq) is fast gaining a foothold, and provides more accurate measurements of transcript levels and their isoforms with greater sensitivity
than microarrays, as it overcomes probe-dependency (Wang et al.,
2009). RNA-seq has been applied to study host variations due to mycobacterial infections and has led to rich insights, an example being dual
RNA sequencing of host and pathogen in Mtb-infected THP-1 cells that
indicated a simultaneous induction of Mycobacterium bovis BCG cholesterol degradation genes and a compensatory upregulation in the host de
novo cholesterol biosynthesis genes (Rienksma et al., 2015). Recently, a
whole blood signature that could predict the risk of developing active
tuberculosis in patients with latent infection was identified from RNA-seq data (Zak et al., 2016).
Although the immunological response against Mtb will be primarily
focused in the lung, its pathologic status is reflected in the (...truncated)