Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster
PLOS ONE
RESEARCH ARTICLE
Olufemi Tony Aromolaran1,2*, Itunu Isewon1,2, Eunice Adedeji2,3, Marcus Oswald4,5,
Ezekiel Adebiyi1,2, Rainer Koenig4,5, Jelili Oyelade1,2*
1 Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria, 2 Covenant
University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria, 3 Department of
Biochemistry, Covenant University, Ota, Ogun State, Nigeria, 4 Integrated Research and Treatment Center,
Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany, 5 Institute
of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
* (JO); (OTA)
OPEN ACCESS
Citation: Aromolaran OT, Isewon I, Adedeji E,
Oswald M, Adebiyi E, Koenig R, et al. (2023)
Heuristic-enabled active machine learning: A case
study of predicting essential developmental stage
and immune response genes in Drosophila
melanogaster. PLoS ONE 18(8): e0288023. https://doi.org/10.1371/journal.pone.0288023
Editor: Jian Xu, East China Normal University
School of Life Sciences, CHINA
Received: April 3, 2023
Accepted: June 18, 2023
Published: August 9, 2023
Peer Review History: PLOS recognizes the
benefits of transparency in the peer review
process; therefore, we enable the publication of
all of the content of peer review and author
responses alongside final, published articles. The
editorial history of this article is available here:
https://doi.org/10.1371/journal.pone.0288023
Copyright: © 2023 Aromolaran et al. This is an
open access article distributed under the terms of
the Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The data used for model evaluation are publicly available on the UCI Machine Learning Repository. The source code for the heal application is available at https://github.com/phemmy2k2/conditional-essentiality. The data can be accessed through the data repository at https://zenodo.org/record/8117236.
Funding: 1. Deutsche Forschungsgemeinschaft (https://www.dfg.de/) within the project KO 3678/5-1, and the German Federal Ministry of Education and Research (BMBF, Fkz 01EO1002, 01EO1502 and 13N15711). 2. World Bank, awarded to Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE) through the ACE Impact Project (2019–2024). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abstract
Computational prediction of absolutely essential genes using machine learning has gained wide attention in recent years. However, most essential genes are conditionally rather than absolutely essential. Experimental techniques provide a reliable approach to identifying conditionally essential genes, but they are laborious, time-consuming and resource-intensive; hence, computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required to train the model or to simulate the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function that replaces the human component in traditional active learning models. The heuristic function dynamically selects unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model outperformed traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, we identified four novel essential immune response genes and a list of 48 novel genes that are essential in the embryonic developmental condition. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions: immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively. Finally, we propose the predicted essential genes for future experimental studies and make the developed tool available at http://heal.covenantuniversity.edu.ng for conditional essentiality predictions.
PLOS ONE | https://doi.org/10.1371/journal.pone.0288023 August 9, 2023
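The abstract benchmarks the proposed heuristic against two traditional query strategies for pool-based active learning: random sampling and uncertainty sampling. Their standard forms can be sketched as follows, assuming a binary classifier (for example, a light gradient boosting model) that outputs a probability of the positive ("essential") class for each unlabelled sample. This is an illustrative sketch of the baselines only; the function names are hypothetical, and the paper's own heuristic function and sampling selection technique are not described in this section, so they are not reproduced here.

```python
import random
from typing import List, Sequence


def uncertainty_sampling(probs: Sequence[float], k: int) -> List[int]:
    """Least-confidence query for a binary classifier: return the indices of
    the k unlabelled samples whose predicted P(positive) is closest to 0.5,
    i.e. the samples the current model is least sure about."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def random_sampling(n_unlabelled: int, k: int, seed: int = 0) -> List[int]:
    """Return k distinct pool indices drawn uniformly at random
    (the model's predictions are ignored)."""
    rng = random.Random(seed)
    return rng.sample(range(n_unlabelled), k)


# Hypothetical predicted probabilities of 'essential' for five unlabelled genes.
pool_probs = [0.95, 0.52, 0.10, 0.47, 0.70]
print(uncertainty_sampling(pool_probs, 2))  # [1, 3]
print(random_sampling(len(pool_probs), 2))
```

In a traditional loop, the queried samples are passed to a human annotator for labelling, added to the training set, and the classifier is retrained before the next iteration; according to the abstract, the proposed heuristic function takes over this human component.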
Predicting essential developmental stage and immune response genes in Drosophila melanogaster
Introduction
A gene is defined as absolutely essential if its loss of function causes infertility or death in an
organism or cell. There are a few computational approaches for predicting gene essentiality,
including homology search and evolutionary analysis-based approaches [1], constraint-based
methods [2], and machine learning (ML) approaches [3, 4]. Conditionally essential genes are
genes that are essential in a particular condition but non-essential in another condition.
Conditional essentiality has predominantly been defined in terms of growth conditions [5,
6]. Recent systematic studies of gene essentiality revealed that two sets of essential genes exist:
core essential genes that are always required for viability, and conditionally essential genes whose
essentiality varies across different genetic and environmental contexts [7]. The variability in
essentiality depends on the phenotype being assessed (lethality, reproduction, growth and/or
development), the species in which the gene is encoded and environmental/growth conditions
[8, 9]. Costanzo and colleagues posited that environments often affect genes with a close functional relation to the pathways that are perturbed by a condition [10].
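The distinction between core and conditionally essential genes can be stated operationally. The following minimal sketch assumes that each gene comes with a binary essentiality call per assessed condition; the function name `essentiality_class` and the condition names are hypothetical. Because only a finite set of conditions can ever be tested, "core" here means essential in every condition assessed, not provably essential in all conditions.

```python
from typing import Dict


def essentiality_class(calls: Dict[str, bool]) -> str:
    """Classify one gene from per-condition essentiality calls.

    `calls` maps a condition name to True if loss of the gene causes death
    or infertility under that condition, and False otherwise.
    """
    if not calls:
        raise ValueError("at least one assessed condition is required")
    n_essential = sum(calls.values())  # True counts as 1, False as 0
    if n_essential == len(calls):
        return "core"           # essential in every assessed condition
    if n_essential > 0:
        return "conditional"    # essential in some conditions only
    return "non-essential"      # essential in none of the assessed conditions


# Hypothetical calls for one gene under two conditions.
print(essentiality_class({"embryonic development": True, "immune response": False}))
# prints: conditional
```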
Another source of variability in gene essentiality is the set of experimental conditions, such as temperature, pH, nutrient availability and, potentially, exposure to pathogens or microbes. Conditional essentiality has also been linked to genetic factors. Some studies that systematically
compared gene essentiality among closely related yeast isolates identified modifier loci that
alter gene essentiality [11, 12]. Genetic factors also give rise to a phenomenon known as synthetic lethality, in which the loss of one of two genes that perform similar functions could render
non-essen (...truncated)