Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster

PLOS ONE, Aug 2023

Computational prediction of absolute essential genes using machine learning has gained wide attention in recent years. However, essential genes are mostly conditional and not absolute. Experimental techniques provide a reliable approach to identifying conditionally essential genes, but they are laborious and consume considerable time and resources, so computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required for training the model or simulating the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function which replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model outperformed traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, we identified four novel essential immune response genes and 48 novel genes that are essential during embryonic development. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions. Immune response and embryonic development-related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively. Finally, we propose the predicted essential genes for future experimental studies and recommend the developed tool, accessible at http://heal.covenantuniversity.edu.ng, for conditional essentiality predictions.
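
The two baseline strategies named in the abstract, random sampling and uncertainty sampling, can be illustrated with a minimal pool-based active learning loop. The sketch below is not the authors' method: the heuristic function is not described in this excerpt, so a plain lookup of hidden labels (the hypothetical `oracle` callable) stands in for whatever supplies labels, and a toy nearest-centroid classifier replaces the light gradient boosting model to keep the example dependency-free. All function names and the synthetic two-cluster data are assumptions made for illustration.

```python
import math
import random


def make_two_clusters(n_per_class, rng):
    """Toy data: class 0 near (0, 0), class 1 near (4, 4). Purely illustrative."""
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


def fit(X, y):
    """Nearest-centroid 'model': one mean vector per class (stand-in classifier)."""
    def centroid(points):
        return [sum(col) / len(points) for col in zip(*points)]
    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    return c0, c1


def predict_proba(x, c0, c1):
    """Pseudo-probability of class 1 from the distances to both centroids."""
    d0, d1 = math.dist(x, c0), math.dist(x, c1)
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def accuracy(model, X, y):
    """Fraction of points whose thresholded pseudo-probability matches the label."""
    hits = sum(int(predict_proba(x, *model) > 0.5) == t for x, t in zip(X, y))
    return hits / len(y)


def active_learning(X, oracle, seed_idx, rounds, k, strategy, rng):
    """Grow the labelled set by k pool samples per round, then refit.

    strategy="uncertainty" picks the samples whose pseudo-probability is
    closest to 0.5; strategy="random" picks uniformly from the pool.
    The seed set must contain at least one example of each class.
    """
    labels = {i: oracle(i) for i in seed_idx}
    pool = [i for i in range(len(X)) if i not in labels]
    for _ in range(rounds):
        model = fit([X[i] for i in labels], list(labels.values()))
        n_pick = min(k, len(pool))
        if strategy == "uncertainty":
            ranked = sorted(pool, key=lambda i: abs(predict_proba(X[i], *model) - 0.5))
            picked = ranked[:n_pick]
        else:
            picked = rng.sample(pool, n_pick)
        for i in picked:
            labels[i] = oracle(i)  # the label source is queried only for picked samples
        chosen = set(picked)
        pool = [i for i in pool if i not in chosen]
    model = fit([X[i] for i in labels], list(labels.values()))
    return model, list(labels)


rng = random.Random(0)
X, y = make_two_clusters(100, rng)
seed = [0, 100]  # one labelled example per class
for strategy in ("random", "uncertainty"):
    model, labelled = active_learning(
        X, lambda i: y[i], seed, rounds=5, k=4, strategy=strategy, rng=rng
    )
    print(strategy, len(labelled), round(accuracy(model, X, y), 3))
```

In this framing, the step where `oracle(i)` is called is where a traditional active learner would consult a human annotator, and it is the component that the abstract says the heuristic function replaces.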

PLOS ONE RESEARCH ARTICLE

Olufemi Tony Aromolaran1,2*, Itunu Isewon1,2, Eunice Adedeji2,3, Marcus Oswald4,5, Ezekiel Adebiyi1,2, Rainer Koenig4,5, Jelili Oyelade1,2*

1 Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
2 Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
3 Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
4 Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
5 Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany

* (JO); (OTA)

OPEN ACCESS

Citation: Aromolaran OT, Isewon I, Adedeji E, Oswald M, Adebiyi E, Koenig R, et al. (2023) Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster. PLoS ONE 18(8): e0288023. https://doi.org/10.1371/journal.pone.0288023

Editor: Jian Xu, East China Normal University School of Life Sciences, CHINA

Received: April 3, 2023
Accepted: June 18, 2023
Published: August 9, 2023

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0288023

Copyright: © 2023 Aromolaran et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The data used for model evaluation is publicly available on UCI machine learning repository. The source code for the heal application is available at https://github.com/phemmy2k2/conditional-essentiality. The data can be accessed through the link to the data repository as shown below: https://zenodo.org/record/8117236.

Funding: 1. Deutsche Forschungsgemeinschaft (https://www.dfg.de/) within the project KO 3678/5-1, and the German Federal Ministry of Education and Research (BMBF, Fkz 01EO1002, 01EO1502 and 13N15711) 2. World Bank awarded to Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE) through the ACE Impact Project (2019 – 2024). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

A gene is defined as absolute essential if its loss of function causes infertility or death in an organism or cell. There are a few computational approaches for predicting gene essentiality including homology search and evolutionary analysis-based approach [1], constraint-based methods [2], and machine learning (ML) approaches [3, 4].

Conditionally essential genes are genes that are essential in a particular condition but non-essential in another condition. Conditional essentiality has predominantly been defined in terms of growth conditions [5, 6]. Recent systematic studies of gene essentiality revealed that two sets of essential genes exist; core essential genes that are always required for viability, and conditional essential genes that vary in essentiality in different genetic and environmental contexts [7]. The variability in essentiality depends on the phenotype being assessed (lethality, reproduction, growth and/or development), the species in which the gene is encoded and environmental/growth conditions [8, 9]. Costanzo and colleagues posited that environments often affect genes with a close functional relation to the pathways that are perturbed by a condition [10]. Another cause of variability in gene essentiality is experimental conditions such as temperature, pH, nutrient availability and/or, potentially, exposure to pathogens or microbes.

Conditional essentiality has been linked to genetic factors. Some studies that systematically compared gene essentiality among closely related yeast isolates identified modifier loci that alter gene essentiality [11, 12]. Genetic factors also give rise to a phenomenon known as synthetic lethality where the loss of one of two genes that perform similar functions could render non-essen (...truncated)


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0288023&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288023

Olufemi Tony Aromolaran, Itunu Isewon, Eunice Adedeji, Marcus Oswald, Ezekiel Adebiyi, Rainer Koenig, Jelili Oyelade. Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster, PLOS ONE, 2023, Volume 18, Issue 8, DOI: 10.1371/journal.pone.0288023