Data Analysis of COVID-19 Hospital Records Using Contextual Patient Classification System
Annals of Data Science
https://doi.org/10.1007/s40745-022-00378-9
Vrushabh Gada1 · Madhura Shegaonkar1 · Madhura Inamdar1 ·
Sharath Dinesh1 · Darshan Sapariya1 · Vedant Konde1 · Mahesh Warang1 ·
Ninad Mehendale1
Received: 14 April 2021 / Revised: 1 November 2021 / Accepted: 19 February 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
Abstract
Humanity today is suffering from one of the most dangerous pandemics in history, the
Coronavirus Disease of 2019 (COVID-19). Although today there is immense advancement in the medical field with the latest technology, the COVID-19 pandemic has
affected us severely. The virus is spreading rapidly, resulting in an escalation in the
number of patients admitted. We propose a contextual patient classification system for
better analysis of the data from the discharge summary available from the research hospital. The classification was done using the Knuth–Morris–Pratt algorithm. We have
also analyzed the data of COVID-19 and non-COVID-19 patients. The analysis covered
the medicines, medical services and tests, pulse count, body temperature,
and the overall effect of age and gender. The death versus survival ratio for
the COVID-19 positive patients has also been studied. The contextual patient
classification system achieved a classification accuracy of 97.4%. The combination of
data analysis and contextual patient classification will be helpful to all the sectors to
be better prepared for any future waves of the COVID-19 pandemic.
Keywords Data analysis · Patient classification system · Contextual search
1 Introduction
A catastrophic virus emerged in early December 2019 in Wuhan, in the Hubei province of
China. It later became a worldwide crisis, termed Coronavirus Disease of 2019
(COVID-19) by the World Health Organization (WHO), which is still affecting the
world [1]. COVID-19 remains a serious challenge for doctors and hospitals. Even though
hospitals are trying their best to overcome this difficult situation, this pandemic is
becoming more severe day by day as the number of variants is increasing.
Corresponding author: Ninad Mehendale
1 K. J. Somaiya College of Engineering, Mumbai, Maharashtra 400077, India
The COVID-19 pandemic has resulted in uncontrollable havoc in India. Since this
was an unexpected circumstance, many local hospitals were not prepared to handle
this crisis. The number of patients getting admitted because of COVID-19 is still
increasing rapidly, and this has strained hospital resources such as ventilators,
beds, ICU beds, medication (drugs), and oxygen supply [2]. This makes the situation even
more difficult for doctors and related staff such as nurses and ward boys. This chaotic
situation has significantly affected the patients as well. The proper allocation of resources
has become a tough challenge for hospitals. Because of this, there is a possibility
that many patients may not get proper treatment. If the trends in the current situation
of the COVID-19 pandemic in terms of patient condition and availability of hospital
resources are studied and analyzed correctly, it can help in the organized planning of
any future waves of the COVID-19 pandemic [3].
This will eventually help in quick decision-making and proper allocation of
hospital resources. Data science is one of the tools for extracting trends from a large dataset.
Data science uses scientific methods and algorithms on unstructured data to extract
useful insights, which help businesses, health care, and other organizations to
improve their goods and services [4].
In India, different hospitals have different ways and software for maintaining their
patient records [5]. A centralized system of maintaining the records is required. For
proper resource management, we need the history of a patient to be presented in a well-organized manner. A good deal of software is already available that can be used
for hospital resource management if organized data is present. The patient summary
written by doctors varies from doctor to doctor [6]. Hence, we need a context-based
patient classification system that can give segregated data which can be useful for
hospital resource allocation.
In our proposed method, data analysis is performed on anonymized data provided by
a local hospital. This data was in an unorganized form. The raw data received
from the hospital contained eight different databases as Excel sheets, of which
seven were used: patient list, registration list, ward list, medicine list, service list,
test list, and discharge summary list. We organized this data and passed it as input
to the contextual patient classification system, which classified the patients as
COVID-19 or non-COVID-19 using the Knuth–Morris–Pratt (KMP) algorithm [7].
This classification allowed
us to perform a comparative analysis of the COVID-19 and non-COVID-19
patient characteristics. We could compare the COVID-19 and non-COVID-19 patients
based on the effect of gender, age, and services provided to them by the hospitals in
terms of treatment. The death versus survival ratio of COVID-19 patients was obtained
based on differences in gender and differences in age. This classification and comparison will help in the early prediction for the resource allocation and treatment process
of COVID-19 patients using the data present in the discharge summary section of the
organized data.
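As an illustration of the classification step described above, the following is a minimal Python sketch of KMP-based keyword matching on discharge summary text. It is not the implementation used in this work: the function names, the keyword list `COVID_KEYWORDS`, and the sample summaries are hypothetical, since the patterns searched for are not listed here.

```python
def build_failure_table(pattern):
    """For each i, length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the KMP failure function)."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # reuse the partial match, never re-scan text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


# Hypothetical keyword list (an assumption made for this sketch).
COVID_KEYWORDS = ("covid", "sars-cov-2", "coronavirus")


def classify_summary(summary):
    """Label a discharge summary by searching for any keyword with KMP."""
    text = summary.lower()
    if any(kmp_find(text, keyword) != -1 for keyword in COVID_KEYWORDS):
        return "COVID-19"
    return "non-COVID-19"


if __name__ == "__main__":
    # Made-up summaries, for illustration only.
    print(classify_summary("Admitted with COVID-19 pneumonia, on oxygen support"))
    print(classify_summary("Closed fracture of left femur, fixation done"))
```

KMP scans each summary once, in time proportional to the length of the text plus the pattern, which suits long free-text fields. Plain keyword matching of this kind would also label a summary such as "COVID-19 negative" as COVID-19, so a contextual system would need additional handling around each match that this sketch does not attempt.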
Figure 1a shows the conceptual diagram of the process of data analysis and contextual patient classification system. The data is filtered and arranged in an organized
Fig. 1 a Concept diagram of the proposed method for data analysis. The unorganized data obtained from
the hospital was filtered using Python. The filtration resulted in an organized
representation of the raw data given by the hospital. In the organized data, the data for attributes related
to hospital services, medicines, and discharge details were mapped against the unique MR numbers of
each patient. The organized dataset was given to a contextual patient classification system. The discharge
summary list from the organized dataset was then used to classify COVID-19 and non-COVID-19 patients.
This classification further helped in analyzing the data and visualizing the differences in various aspects between
COVID-19 and non-COVID-19 patients. b Based on the discharge summary of the patients obtained from
the records provided by the hospital, the patients were segregated into non-COVID-19 and COVID-19. This
was done using a contextual patient classification system that used the KMP algorithm for pattern matching
manner using (...truncated)