Identification of Biclusters in Huntington’s Disease Dataset Using a New Variant of Grey Wolf Optimizer

Journal of The Institution of Engineers (India): Series B, Nov 2022

Biclustering is a useful technique for identifying subgroups of genes that show the same type of expression characteristics under some conditions in microarray gene expression data. This is a complex problem for which meta-heuristic algorithms are well suited, since they can explore large datasets to find biclusters of optimal quality. This paper attempts, for the first time, to select biclusters with respect to shifting and scaling behaviors in a Huntington's disease dataset by applying Grey Wolf Optimizer (GWO) along with a proposed modified version, Enhanced Search Grey Wolf Optimizer (ES-GWO). ES-GWO incorporates strategies that make the search process more balanced with respect to exploration and exploitation than the state-of-the-art techniques (GWO, RM-GWO). The efficacy of ES-GWO is validated on several benchmark instances and compared with existing meta-heuristic techniques (PSO, HS, Firefly, ABC and DE) based on convergence quality. Finally, the top 5 of the 100 biclusters produced by ES-GWO were separated. Seven genes common to those 5 biclusters proved to be biologically significant.


J. Inst. Eng. India Ser. B
https://doi.org/10.1007/s40031-022-00815-6

ORIGINAL CONTRIBUTION

Joy Adhikary¹ · Sriyankar Acharyya¹

Received: 25 January 2022 / Accepted: 23 September 2022
© The Institution of Engineers (India) 2022

* Sriyankar Acharyya
¹ Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India

Keywords: Gene expression data · Meta-heuristics · Grey wolf optimization · Biclustering

Introduction

The microarray method is widely used to study the expression levels of many genes in organs or cells. It yields high-throughput expression matrices that can be used to compare the expression levels of genes under different clinical conditions [1, 2].
These high-dimensional matrices contain gene expression levels that can reflect gene activities related to different physiological states [3]. Microarray datasets corresponding to different diseases are analyzed to find special subsets of genes that behave similarly in their expression.

The dataset used here comes from Huntington's disease, a rare genetic disorder. To develop this disease, a person needs only one copy of the defective gene. Apart from the genes on the sex chromosomes, a person receives two copies of the entire gene set, one copy from each parent [4]. A parent with the genetic defect can transmit either the defective copy or the healthy copy. Therefore, each child in the family has a 50% chance of inheriting the genetic predisposition. The disease causes progressive deterioration of nerve cells and acts on different parts of the brain, consequently affecting movement, behavior, and perception. It becomes difficult to walk, to think, to swallow and to speak. Eventually, the person will need full-time care. Signs and symptoms appear in people in their 30s to 50s [4–6].

Biclustering [2] is a data mining approach that identifies subgroups of genes in high-throughput expression matrices. The genes in such a subgroup have identical expression patterns under some selected conditions. A subgroup of genes represents cellular processes that are active only under some subset of conditions [7]. The biclustering problem can be solved using Swarm Intelligence (SI), one of the most popular branches of meta-heuristic algorithms. The method considered here is Grey Wolf Optimizer (GWO), which belongs to the group of SI-based algorithms. GWO is based on the social hierarchy of wolves and their hunting strategies.

This research proposes a new variant of GWO, namely Enhanced Search Grey Wolf Optimizer (ES-GWO), which has been implemented to identify biclusters based on shifting and scaling characteristics. The proposed variant uses randomized movement, which provides good exploration in the search process. It also incorporates an inertia weight strategy called the weight cosine control factor strategy [8], which makes the search process more balanced. The efficacy of the proposed variant has been tested on several unimodal and multimodal benchmark functions. A statistical test has been performed to validate the results of ES-GWO [9]. The results yielded by the proposed variant have also been compared with those of existing methods: Particle Swarm Optimization (PSO) [10], Harmony Search (HS) [11], Firefly [12], Artificial Bee Colony (ABC) [13] and Differential Evolution (DE) [14].

This research attempts for the first time to identify biclusters based on shifting and scaling behavior in Huntington's disease datasets. The results of ES-GWO on the real-life datasets have been compared with those of the state-of-the-art GWO techniques (GWO, RM-GWO). The efficacy of ES-GWO has been observed on both datasets. In each dataset, ES-GWO successfully finds biologically relevant biclusters. Finally, the top 5 of the 100 optimal biclusters identified by ES-GWO were separated on the basis of the cost function. Seven genes common to those top 5 biclusters have been selected and validated (by p-value) as biologically significant.

Related Works

Cheng and Church (CC) [15] introduced the concept of a bicluster, which involves some selected genes and some selected conditions with a good similarity measure. The bicluster concept overcomes several problems associated with traditional clustering methods. They used Mean Squared Residue (MSR) [15] to measure the coherence of the genes and conditions belonging to a bicluster. MSR captures only constant and shifting biclusters [16]; it is not able to detect scaling biclusters.
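The limitation of MSR noted above can be illustrated with a small sketch. It is not the authors' implementation; it assumes the standard Cheng and Church residue, r_ij = a_ij - a_iJ - a_Ij + a_IJ, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the bicluster. The function name `mean_squared_residue` and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """MSR of the submatrix of `data` selected by `rows` and `cols`."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij
    overall = sub.mean()                         # a_IJ
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))

# Toy data (illustrative only): three "genes" over four "conditions".
base = np.array([1.0, 2.0, 4.0, 7.0])
shifting = np.vstack([base + s for s in (0.0, 3.0, 5.0)])  # additive pattern
scaling = np.vstack([base * k for k in (1.0, 2.0, 3.0)])   # multiplicative pattern

rows, cols = [0, 1, 2], [0, 1, 2, 3]
msr_shift = mean_squared_residue(shifting, rows, cols)
msr_scale = mean_squared_residue(scaling, rows, cols)
print(msr_shift, msr_scale)
```

On the additive (shifting) matrix the residue vanishes, so the MSR is zero up to floating-point error. On the multiplicative (scaling) matrix the MSR is clearly positive, even though the rows are perfectly coherent, so an MSR threshold would reject this pattern.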
In [16], researchers used a new cost function called Scaling Mean Squared Residue (SMSR) to detect scaling patterns effectively.

Huang et al. [7] introduced an algorithm named Condition-Based Evolutionary Biclustering (CBEB). It is based on Evolutionary Algorithms (EA) and detects biclusters using a parallelized search strategy. This work incorporates the MSR metric with a predefined threshold to obtain better results [7].

Thangavel et al. [17] proposed a hybrid algorithm called Hybrid PSO-SA-BIC that combines binary PSO and Simulated Annealing. PSO-SA-BIC identified highly correlated biclusters with larger volume [17].

In [2], researchers proposed an algorithm named Evolutionary Biclustering based on Expression Patterns (Evo-Bexpa). They used a cost function that measures the quality, volume, overlapping amount and gene variance of biclusters based on shifting and scaling patterns.

Adhikary et al. [18] reported Random Move Grey Wolf Optimizer (RM-GWO) to find biclusters in a Parkinson's disease dataset. This research has considered RM-GWO [18] as a state-of-the-art algorithm to va (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s40031-022-00815-6.pdf
Article home page: https://link.springer.com/article/10.1007/s40031-022-00815-6

Adhikary, Joy, Acharyya, Sriyankar. Identification of Biclusters in Huntington’s Disease Dataset Using a New Variant of Grey Wolf Optimizer, Journal of The Institution of Engineers (India): Series B, 2022, pp. 1-12, DOI: 10.1007/s40031-022-00815-6