Identification of Biclusters in Huntington’s Disease Dataset Using a New Variant of Grey Wolf Optimizer
J. Inst. Eng. India Ser. B
https://doi.org/10.1007/s40031-022-00815-6
ORIGINAL CONTRIBUTION
Joy Adhikary1 · Sriyankar Acharyya1
Received: 25 January 2022 / Accepted: 23 September 2022
© The Institution of Engineers (India) 2022
Abstract Biclustering is a useful technique for identifying subgroups of genes that share the same type of expression characteristics under a subset of conditions in microarray gene
expression data. It is a complex problem for which meta-heuristic algorithms are well suited, because they can explore large datasets to find biclusters of optimal quality. This paper
attempts, for the first time, to select biclusters with
respect to shifting and scaling behaviors in a Huntington’s
disease dataset by applying Grey Wolf Optimizer (GWO)
together with a proposed modified version, Enhanced
Search Grey Wolf Optimizer (ES-GWO). ES-GWO incorporates strategies that make the search process more balanced
with respect to exploration and exploitation than the
state-of-the-art techniques (GWO, RM-GWO). The efficacy
of ES-GWO is validated on several benchmark instances and
compared with existing meta-heuristic techniques (PSO,
HS, Firefly, ABC and DE) on the basis of convergence quality.
Finally, from the 100 biclusters produced by ES-GWO, the top 5
were selected. The 7 genes common to those 5 biclusters have
proved to be biologically significant.
Keywords Gene expression data · Meta-heuristics · Grey
wolf optimization · Biclustering
* Sriyankar Acharyya
1 Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
Introduction
The microarray method is widely used to study the expression levels of several genes in organs or cells. It gives
high-throughput expression matrices that can be used to
compare the expression levels of genes in different clinical
conditions [1, 2]. These high-dimensional matrices contain
gene expression levels that can reflect gene activities related
to different physiological states [3]. Microarray datasets corresponding to different diseases are analyzed
to find special subsets of genes with similar
expression behavior. The dataset used here is taken
from Huntington’s disease, which is a rare genetic disorder. To develop this disease, a person needs only one copy
of the defective gene. Apart from the genes on the sex chromosomes, a person receives two copies of the entire gene
set, one copy from each parent [4]. A parent with a genetic
defect can transmit either the defective copy or the healthy copy.
Therefore, each child in the family has a 50% chance of inheriting
the genetic predisposition. The disease causes progressive deterioration of
nerve cells and acts on different parts of the brain, consequently affecting movement, behavior, and perception. It
becomes difficult to walk, think, swallow and speak.
Eventually, the affected person needs full-time care. Signs and
symptoms appear when people are in their 30s to 50s [4–6].
Biclustering [2] is a data mining approach that
identifies subgroups of genes in high-throughput expression matrices. The genes in such a subgroup have the same
expression patterns under some selected conditions. A subgroup of genes represents cellular processes that are
active only under some subset of conditions [7]. The biclustering problem can be solved using Swarm Intelligence (SI),
one of the most popular branches of meta-heuristic
algorithms. The method considered here is Grey Wolf Optimizer (GWO), which belongs to the group of SI-based algorithms. GWO is based on the social hierarchy of
grey wolves and their hunting strategies.
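As an illustration of the mechanism described above, the following is a minimal Python sketch of the canonical GWO position update, in which the three fittest wolves (alpha, beta and delta) guide the rest of the pack. It is not the ES-GWO variant proposed in this paper; the function names `gwo` and `sphere`, the population size, the iteration count and the linearly decreasing control parameter `a` are assumptions chosen for the example.

```python
import random


def sphere(x):
    """Unimodal benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(cost, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimise `cost` with the canonical GWO update (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_wolves)]
    best = min(wolves, key=cost)[:]
    for t in range(n_iter):
        # Social hierarchy: the three fittest wolves lead the pack.
        leaders = [w[:] for w in sorted(wolves, key=cost)[:3]]
        if cost(leaders[0]) < cost(best):
            best = leaders[0][:]
        # Control parameter decreases linearly from 2 towards 0,
        # shifting the search from exploration to exploitation.
        a = 2.0 * (1.0 - t / n_iter)
        for w in wolves:
            for d in range(dim):
                pos = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    pos += leader[d] - A * D
                # New coordinate: average of the three leader-guided
                # moves, clipped to the search bounds.
                w[d] = min(hi, max(lo, pos / 3.0))
    final = min(wolves, key=cost)
    if cost(final) < cost(best):
        best = final[:]
    return best, cost(best)


best, value = gwo(sphere, dim=5, bounds=(-10.0, 10.0))
print("best cost found:", value)
```

For biclustering, the cost function would score a candidate set of genes and conditions instead of a point in a continuous benchmark space; that encoding is outside the scope of this sketch.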
This research proposes a new variant of GWO,
namely, Enhanced Search Grey Wolf Optimizer (ES-GWO),
which has been implemented to identify biclusters based on
shifting and scaling characteristics. The proposed variant
uses randomized movement, which provides good exploration in the search process. It also incorporates
an inertia weight strategy called the weight cosine control factor
strategy [8], which makes the search process more balanced. The
efficacy of the proposed variant has been tested on several unimodal
and multimodal benchmark functions, and a statistical test has been
performed to validate the results of ES-GWO [9]. The results
yielded by the proposed variant have also been compared
with those of existing methods such as Particle Swarm Optimization (PSO) [10], Harmony Search (HS) [11], Firefly [12],
Artificial Bee Colony (ABC) [13] and Differential Evolution (DE) [14]. This research attempts, for the first time, to
identify shifting and scaling behavior-based biclusters in
Huntington’s disease datasets. The results of ES-GWO on the real-life
datasets have been compared with those of the state-of-the-art GWO algorithms (GWO, RM-GWO). The efficacy of ES-GWO
has been observed on both datasets: in each dataset,
ES-GWO successfully finds biologically relevant biclusters.
Finally, from the 100 optimal biclusters identified by ES-GWO,
the top 5 were selected on the basis of the cost function. The 7 genes
common to those top 5 biclusters have been selected and
validated (by p-value) as biologically significant.
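To make the two pattern types concrete, the sketch below builds a small shifting bicluster and a small scaling bicluster and scores both with the mean squared residue (MSR) of Cheng and Church [15]. It assumes the commonly used formulation in which row i equals alpha_i × base + beta_i; the helper names and the numeric values are hypothetical and are not taken from the Huntington’s disease data.

```python
def mean_squared_residue(m):
    """Cheng and Church MSR of a bicluster given as a list of rows."""
    n_rows, n_cols = len(m), len(m[0])
    row_mean = [sum(r) / n_cols for r in m]
    col_mean = [sum(m[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    all_mean = sum(row_mean) / n_rows
    total = sum((m[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


def make_pattern(base, scales, shifts):
    """Row i is scales[i] * base + shifts[i] (assumed pattern model)."""
    return [[a * p + b for p in base] for a, b in zip(scales, shifts)]


# Hypothetical base expression profile over four conditions.
base = [1.0, 4.0, 2.0, 6.0]

# Shifting pattern: every gene is the base profile plus a constant.
shifting = make_pattern(base, [1.0, 1.0, 1.0], [0.0, 3.0, -2.0])

# Scaling pattern: every gene is the base profile times a constant.
scaling = make_pattern(base, [1.0, 2.0, 0.5], [0.0, 0.0, 0.0])

print("MSR of shifting bicluster:", mean_squared_residue(shifting))
print("MSR of scaling bicluster: ", mean_squared_residue(scaling))
```

Under these assumptions, the shifting pattern has an MSR of (numerically) zero while the scaling pattern does not, which is why a cost function based on MSR alone cannot recognize scaling biclusters.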
Related Works
Cheng and Church (CC) [15] introduced the concept of a
bicluster, which involves some selected genes and some selected
conditions having a good similarity measure. The bicluster concept overcomes several problems associated with
traditional clustering methods. Mean
Squared Residue (MSR) [15] was used to measure the coherence
among the genes and conditions belonging to a bicluster.
MSR captures only
constant and shifting biclusters [16]; it is not
able to detect scaling biclusters. In [16], the
researchers used a new cost function called Scaling Mean
Squared Residue (SMSR) to detect scaling patterns
effectively. Huang et al. [7] introduced an algorithm named
Condition-Based Evolutionary Biclustering (CBEB), which
is based on Evolutionary Algorithms (EA) and
detects biclusters using a parallelized search strategy. That
work incorporates the MSR metric with a predefined threshold
to obtain better results [7]. Thangavel et al. [17] proposed a hybrid algorithm called Hybrid PSO-SA-BIC that
combines binary PSO and Simulated Annealing.
PSO-SA-BIC identified highly correlated biclusters having
larger volume [17]. In [2], the researchers proposed an algorithm named Evolutionary Biclustering
based on Expression Patterns (Evo-Bexpa). They
used a cost function that measures the quality, volume, amount of overlap and gene variance of biclusters based on
shifting and scaling patterns. Adhikary et al. [18] reported
Random Move Grey Wolf Optimizer (RM-GWO) to find
biclusters in a Parkinson’s disease dataset. This research has
considered RM-GWO [18] as a state-of-the-art algorithm to
va (...truncated)