QSAR based predictive modeling for anti-malarial molecules.
Open access
www.bioinformation.net
Hypothesis
Volume 13(5)
Deepak R. Bharti & Andrew M. Lynn*
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-67; Andrew M. Lynn; E-mail ; *Corresponding Author
Received March 17, 2017; Accepted April 21, 2017, Published May 31, 2017
Abstract:
Malaria is a predominant infectious disease with a global footprint, and is especially severe in developing countries of the African
continent. In recent years, drug-resistant malaria has become an alarming problem, and hence the need for new and improved
drugs is more urgent than ever before. One promising location for antimalarial drug targets is the apicoplast, as this organelle does
not occur in humans. The apicoplast is associated with many unique and essential pathways in many Apicomplexan pathogens, including
Plasmodium. Machine learning methods are now widely available through open source programs. In the present work, we
describe a standard protocol to develop molecular descriptor based predictive models (QSAR models), which can be further utilized for the
screening of large chemical libraries. This protocol is used to build models using training data sourced from apicoplast-specific bioassays.
Multiple model building methods are used, including Generalized Linear Models (GLM), Random Forest (RF), the C5.0 implementation of a
decision tree, Support Vector Machines (SVM), K-Nearest Neighbour and Naive Bayes. Methods to evaluate the accuracy of each model
building method are included in the protocol. For the given dataset, C5.0, SVM and RF perform better than the other methods, with
comparable accuracy over the test data.
Keywords: Malaria, apicoplast, predictive model building, R statistical package
Background:
Malaria is endemic in many tropical and subtropical regions,
causing high mortality and morbidity. In the last 10-15 years, owing to
the efforts of a global malaria eradication campaign, a significant fall
has been observed in malaria infection cases. However, in 2015,
212 million new cases of malaria and 429
thousand deaths were reported across the globe. The majority
of deaths were recorded in Africa (~92%) and the South-East Asia Region (~6%) [1].
Artemisinin derivatives have been regarded as the most effective drugs
against malaria since the mid-1990s. In 2005, the WHO
recommended artemisinin-combination therapies (ACTs) as the
first-line treatment for P. falciparum malaria worldwide [2]. The
artemisinin-derived molecules used in ACTs have a broad spectrum of
activity (more than 120 targets) against many biologically
important pathways of Plasmodium [3]. Despite their effectiveness,
ISSN 0973-2063 (online) 0973-8894 (print)
Bioinformation 13(5): 154-159 (2017)
drug-resistant malaria has emerged in many Asian and
African countries in recent years [4]–[7]. This scenario threatens the
worldwide efforts towards complete eradication of malaria, and hence it
is imperative to identify more drug targets as well as potent drugs
to control the disease before current therapeutic agents lose their
clinical relevance. Studies reveal that one of the most promising
targets is the apicoplast, owing to its involvement in many essential
biological pathways unique to Plasmodium [8].
The apicoplast is a non-photosynthetic vestigial plastid, bounded by
four membrane layers, which occurs in almost all apicomplexan
parasites. It has a 35 kb circular DNA, quite similar to a
cyanobacterial genome, which encodes approximately 55-60 genes
of unknown functionality. However, its presence is crucial for the
cell [9]. Various genetic and pharmacological studies
confirm its essential role in cell survival. Genome analysis of
the apicoplast indicates its role in the biosynthesis of many
important products including type II fatty acids, heme, iron-sulphur clusters and isoprenoid precursors [10]. The pathways
related to these products are essentially similar to those of bacteria,
owing to their endosymbiotic origin, and entirely different from the
pathways of the host organism. Many antimalarial
drugs have been proposed that target cellular machinery (proteins/DNA)
essential for cell survival, ranging from replication, transcription and
translation (parasite as well as apicoplast) to fatty acid,
heme, iron-sulphur cluster and isoprenoid biosynthesis (exclusive to the
apicoplast). Earlier, targeting apicoplast pathways such as the FASII
pathway gained popularity, but several genetic and
pharmacological studies show evidence for off-target activity of
the inhibitors [11]. Some successful attempts at targeting the
isoprenoid pathway [12] and heme biosynthesis [13], [14] have already
been reported. Besides these anabolic pathway-based drug targets, efforts
have been made to obstruct cellular processes of the apicoplast such
as replication [15], transcription [16] and translation [17], as these
processes are known to be quite similar to those of bacteria. Hence,
antibacterial drugs are also considered potential drugs against the
malaria parasite. Recent reviews have listed various targets and
related drugs [18]–[20]. Target proteins, pathways and drug
candidates are summarized in Table 1. In
the present study we focus on predictive model building using
bioassay data on compounds causing delayed death in malaria parasites.
Delayed death is an interesting phenomenon in which parasites
survive, infect and multiply, but their progeny are unable to infect the host.
With advances in high-throughput bioassay techniques and
computational resources, managing structural information along
with bioactivity readings has become a well-established practice.
This information can be utilized to screen large chemical libraries
virtually, which reduces the cost and time of identifying potential
drug-like molecules for further screening stages. One approach to
applying this information is predictive model building. In recent
years, numerous successful implementations of machine learning
(ML) techniques have been published for virtual screening of biologically
active compounds [21]–[24]. In the present study, we employed
various state-of-the-art machine learning techniques to build
classification models using publicly available antimalarial bioassay
data with known inhibitory effect against apicoplast formation.
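As a rough illustration of the idea in the preceding paragraph, the following Python sketch trains a k-nearest-neighbour classifier (one of the methods named in the abstract) on labelled descriptor vectors, checks its accuracy on held-out molecules, and then screens a small library for predicted actives. It is not the authors' protocol, which relies on the R statistical package. The function names, molecule names and descriptor values are all hypothetical, and the toy data are made up purely for demonstration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training molecules."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out molecules whose predicted label matches the assay label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical, already scaled descriptors (two made-up features per molecule).
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
train_y = ["inactive", "inactive", "inactive",
           "active", "active", "active"]

# Held-out molecules with known (made-up) assay outcomes.
test_X = [(0.12, 0.18), (0.88, 0.82)]
test_y = ["inactive", "active"]
test_accuracy = accuracy(train_X, train_y, test_X, test_y)

# Virtual screening: keep library molecules predicted to be active.
library = {"mol_A": (0.82, 0.88), "mol_B": (0.05, 0.30)}
predicted_actives = [name for name, descriptors in library.items()
                     if knn_predict(train_X, train_y, descriptors) == "active"]

print("test accuracy:", test_accuracy)
print("predicted actives:", predicted_actives)
```

In a real screen the descriptors would be computed from chemical structures and the labels taken from bioassay records; only the overall train, evaluate and screen pattern is meant to carry over.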
To build a robust predictive model, we define best practices for data
cleaning, preprocessing, feature selection and model building,
which are described in this manuscript. A schematic overview of
the model building workflow can be seen in Figure 1, and is
described in detail in the next section. The met (...truncated)