ElGamal Homomorphic Encryption-Based Privacy Preserving Association Rule Mining on Horizontally Partitioned Healthcare Data
J. Inst. Eng. India Ser. B
https://doi.org/10.1007/s40031-021-00696-1
ORIGINAL CONTRIBUTION
Nikunj Domadiya¹ · Udai Pratap Rao²
Received: 26 August 2020 / Accepted: 20 October 2021
© The Institution of Engineers (India) 2021
Abstract In today’s world, life-threatening diseases have
become a pre-eminent issue in healthcare because of their high
mortality rate. This mortality rate can be lowered by
using healthcare intelligence to detect diseases early.
Patients’ medical data is stored in the electronic healthcare
record (EHR) system, which is kept up to date by the healthcare
provider. Data mining techniques such as association rule mining
can detect a patient’s disease from their symptoms using the digital
healthcare data stored in the EHR system. The efficacy of association
rule mining can be improved by using global data
from various EHR systems, but this requires all EHR systems
to send their healthcare records to a central server. When
personal health information is made available on an
untrusted server, several privacy laws may be violated. As
a result, privacy-preserving distributed
healthcare data mining has become a well-known research
area in the healthcare industry. This research uses an
efficient ElGamal homomorphic encryption technique to
protect privacy in distributed association rule mining.
The proposed approach is used to discover the risk factors of
major life-threatening diseases such as breast cancer and heart
disease from their symptoms, and its scope for combating
COVID-19 is discussed. Theoretical analysis shows that the proposed
approach is efficient and maintains privacy in
an insecure communication environment. An experimental
study with a real dataset shows the benefit of the proposed
approach compared with the results of a single local EHR system.
✉ Nikunj Domadiya
¹ Computer Engineering Department, L. D. College of Engineering, Ahmedabad, India
² Computer Engineering Department, National Institute of Technology, Surat, India
Keywords Association Rule Mining · Breast Cancer Disease · Coronavirus (COVID-19) · Data Mining · Privacy · Distributed Healthcare Data Mining
Introduction
Life-threatening diseases are the primary focus of
medical research around the world [1]. Health
researchers have recently devoted a great deal of
attention to COVID-19, as well as to cancer and other
life-threatening diseases. According to the 2015 National Vital
Statistics Report (NVS) [2], cancer and heart disease are
the two most common causes of death. According to the
Department of Health and Human Services, cancer and heart
disease accounted for 45.3% of all U.S. deaths in 2010
(Fig. 1). As the deadliest disease
among women, breast cancer claims millions of lives each
year in the USA. Figure 1 displays the number of cancer
cases in the USA in 2018 for each of the major kinds of
cancer [3]. As of May 2020, there had been 4,527,815
cases of coronavirus disease (COVID-19), a novel disease
that emerged in 2019, and 303,438 of those patients
had died [4]. Given the high mortality rate of these
life-threatening diseases, early detection through an
examination of the patient’s symptoms is crucial to saving
more lives.
Fig. 1 Health Statistics report of USA [3]
Appropriate treatment of, and recovery from, these
life-threatening illnesses require early identification of the
disease. Diagnostic methods for cancer and heart disease
are expensive, error-prone, and time-consuming
[5–7]. Traditionally, disease prediction relied on physician
expertise rather than on symptom patterns hidden in
healthcare data [8–15]. This can produce an
inaccurate diagnosis and hence inappropriate medical
treatment, which raises healthcare costs and decreases
the quality of healthcare services provided to patients [16].
Electronic healthcare record (EHR) systems are used in
large hospitals to keep digital records, and they maintain a
massive amount of information on patients [17]. Data
acquired in hospitals can be mined for
healthcare research and to improve healthcare services.
Association rule mining is a well-known data mining
approach for determining relationships between diseases and
symptoms [18–24]. Its applications in the healthcare area
include predicting a disease from a patient’s symptoms,
determining an adequate treatment for a disease, detecting
medication response, and improving medical fraud detection
[19, 25–29].
Association rule mining generates IF-THEN rules that
medical professionals can readily understand.
As a result, the approach is well known among medical
researchers and doctors for identifying the state of a disease
or the appropriate treatment from the symptoms of
the patient. Consequently, the healthcare system becomes
much more efficient in terms of cost and treatment [30].
Fig. 2 Distributed Data Partition Model [49]
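To make the IF-THEN rule idea concrete, the following minimal Python sketch computes the two standard measures of an association rule, support and confidence, for a rule of the form IF symptom THEN diagnosis. The records and the item names (`lump`, `pain`, `malignant`, `benign`) are made-up placeholders chosen only for illustration; they are not taken from the datasets used in this paper.

```python
from typing import FrozenSet, Iterable, List


def count_containing(records: List[FrozenSet[str]], items: Iterable[str]) -> int:
    """Number of records that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for record in records if wanted <= record)


def support(records: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of all records that contain the whole itemset."""
    return count_containing(records, items) / len(records)


def confidence(records: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Of the records matching the IF part, the fraction also matching THEN."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    matches_if = count_containing(records, antecedent)
    if matches_if == 0:
        return 0.0  # the rule never fires on this data
    return count_containing(records, both) / matches_if


# Hypothetical patient records: each is a set of observed items.
records = [
    frozenset({"lump", "pain", "malignant"}),
    frozenset({"lump", "malignant"}),
    frozenset({"lump", "benign"}),
    frozenset({"pain", "benign"}),
]

# Rule: IF lump THEN malignant
rule_support = support(records, {"lump", "malignant"})
rule_confidence = confidence(records, {"lump"}, {"malignant"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On these four toy records the rule appears in two of four records and holds for two of the three records containing `lump`. A rule is normally kept only when both measures exceed user-chosen thresholds.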
Earlier, association rule mining on healthcare data could
only be done on the EHR system of a single hospital
[19, 31]. A single electronic medical record (EMR) system
can store only a limited number of patient records, so
association rules mined from the data of a single EHR system
are less accurate. Dangerous diseases (e.g. cancer and
heart disease) demand more precise association rules
[19, 25]. The accuracy/confidence of association rule mining
can be increased by combining the data of all EHR systems at a
central server. However, patients’ data must be kept private in the
local EHR system, since sharing it poses a threat to privacy in
healthcare [32, 33]. For accurate data mining, the various EHR
systems must therefore share their data while protecting privacy. As
a result, medical researchers have concentrated on
privacy-preserving association rule mining on distributed
healthcare data. As shown in Fig. 2, distributed
data is either vertically or horizontally partitioned. Most
large hospitals use the same EHR system schema because
they follow the same standards for storing patient
information. Our study therefore considers
data that is horizontally partitioned among
the collaborating EHR systems [34–37]. On this basis,
we aim to obtain global association rules while
safeguarding the privacy of the EHR systems involved.
The proposed approach is applied to UCI repository data on
breast cancer and heart disease to evaluate the symptoms
associated with these two life-threatening diseases
[38, 39].
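The following Python sketch illustrates, under stated assumptions, how a homomorphic scheme lets horizontally partitioned sites contribute local support counts without revealing them individually. It assumes the additively homomorphic "exponential" variant of ElGamal, in which a count m is encoded as g^m; whether the proposed approach uses this variant or another one is not stated in this part of the paper. The group parameters are deliberately tiny toy values, and the names `keygen`, `encrypt`, `add` and `decrypt` are hypothetical helpers, not the authors' protocol.

```python
import secrets

# Toy parameters (assumption: far too small for any real deployment).
P = 2039  # safe prime, P = 2 * Q + 1
Q = 1019  # prime order of the subgroup of squares modulo P
G = 4     # a square other than 1, hence a generator of that subgroup


def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # x in [1, Q - 1]
    return x, pow(G, x, P)


def encrypt(h, m):
    """Exponential ElGamal: Enc(m) = (G^r, G^m * h^r) with fresh random r."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P


def add(c, d):
    """Component-wise product of ciphertexts encrypts the sum of plaintexts."""
    return (c[0] * d[0]) % P, (c[1] * d[1]) % P


def decrypt(x, c):
    """Recover G^m, then find the small m by exhaustive search."""
    # c[0] has order Q, so raising it to Q - x equals raising it to -x.
    g_m = (c[1] * pow(c[0], Q - x, P)) % P
    acc = 1
    for m in range(Q):
        if acc == g_m:
            return m
        acc = (acc * G) % P
    raise ValueError("plaintext outside the searchable range")


# Hypothetical local support counts of one itemset at three EHR sites.
local_counts = [12, 7, 20]

x, h = keygen()
ciphertexts = [encrypt(h, count) for count in local_counts]

combined = ciphertexts[0]
for c in ciphertexts[1:]:
    combined = add(combined, c)

total = decrypt(x, combined)  # global support count across all sites
print(total)
```

Only the aggregated ciphertext is decrypted, so the key holder learns the global count but not each site's contribution. Who holds the private key and how ciphertexts travel between sites are design questions that this sketch leaves open.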
Background and Related Concepts
Distributed Healthcare Data
Healthcare data can be distributed among EHR systems in two
ways: horizontally partitioned and vertically partitioned.
In horizontally partitioned healthcare data, all EHR
systems have an equivalent schema, but store the records of
different patients. Figure 2 shows the ho (...truncated)