Classification of individuals at risk of heart disease using machine learning
CMJ
Original Research
September 2020, Volume: 42, Number: 3
Cumhuriyet Tıp Dergisi (Cumhuriyet Medical Journal)
283-289
http://dx.doi.org/10.7197/cmj.vi.742161
Classification of individuals at risk of heart disease
using machine learning
Betül Akalın¹, Ülkü Veranyurt¹, Ozan Veranyurt²
¹ Sağlık Bilimleri Üniversitesi, İstanbul, Türkiye
² Bahçeşehir Üniversitesi, İstanbul, Türkiye
Corresponding author: Ülkü Veranyurt, PhD, Sağlık Bilimleri Üniversitesi, İstanbul, Türkiye
E-mail:
Received/Accepted: May 24, 2020 / August 17, 2020
Conflict of interest: The authors declare no conflict of interest.
SUMMARY
Objective: The aim of this study is to determine whether people have heart
disease by applying different machine learning algorithms to the data
provided by the University of Cleveland.
Method: Data from 303 patients provided by the University of Cleveland
were classified using the Gaussian Bayes, K-Nearest Neighbor and Random
Forest algorithms, with and without feature scaling. For each algorithm, the
data were randomly divided into training and test sets, and this process was
repeated 50 times. A t-test was applied to the test results to check whether
the differences between the algorithms were statistically significant.
Results: With data scaling, accuracies of 80.52% with the K-Nearest
Neighbor algorithm, 80.52% with Gaussian Bayes and 82.50% with Random
Forest were observed. The three algorithms produced similar results, and the
differences between them were not statistically significant (p > 0.05).
Without data scaling, accuracies of 65.28% with the K-Nearest Neighbor
algorithm, 80.52% with Gaussian Bayes and 82.19% with Random Forest
were observed, and the differences between the three algorithms were
statistically significant.
Conclusions: Although data from only 303 patients were available, a
prediction accuracy above 80% was obtained. Outliers that distort the
distribution of the data lead to differences between the methods used. The
study confirms that the algorithms produce much closer estimates on scaled
patient data. It is an example of the use of artificial intelligence in
detecting heart diseases, which pose a risk all over the world. With more
detailed patient data, much higher accuracy rates could be obtained, and
such models could in future be included in health management processes for
the pre-diagnosis of heart disease.
Keywords: Illness classification, machine learning in health, heart disease,
heart failure
ORCID IDs of the authors:
B.A. 0000-0003-0402-2461
Ü.V. 0000-0003-4838-3373
O.V. 0000-0003-3652-2356
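The evaluation procedure described in the Method paragraph of the Summary (repeated random training/test splits, with and without feature scaling, followed by a t-test) can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it uses synthetic two-feature records rather than the Cleveland data, a hand-written K-Nearest Neighbor classifier with k = 5, a 25% test fraction, standardization as the scaling step, and Welch's t statistic. None of these settings, nor the names make_toy_patients, run_once and welch_t, come from the study.

```python
import math
import random
import statistics

def make_toy_patients(n, rng):
    """Synthetic (features, label) records. Assumption: NOT real patient data."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        small = rng.gauss(1.0 + 1.5 * label, 0.7)  # informative, small numeric range
        large = rng.gauss(250.0, 50.0)             # uninformative, large numeric range
        rows.append(([small, large], label))
    return rows

def standardize(train_x, test_x):
    """Scale features to zero mean and unit variance using training statistics only."""
    cols = list(zip(*train_x))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    return [scale(x) for x in train_x], [scale(x) for x in test_x]

def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def run_once(rows, rng, scaled, test_fraction=0.25):
    """One random train/test split; returns the accuracy on the test part."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_x, train_y = [r[0] for r in train], [r[1] for r in train]
    test_x, test_y = [r[0] for r in test], [r[1] for r in test]
    if scaled:
        train_x, test_x = standardize(train_x, test_x)
    hits = sum(knn_predict(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def welch_t(a, b):
    """Welch's t statistic for two lists of accuracies (assumed test variant)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))

rng = random.Random(0)
rows = make_toy_patients(303, rng)
acc_scaled = [run_once(rows, rng, scaled=True) for _ in range(50)]
acc_raw = [run_once(rows, rng, scaled=False) for _ in range(50)]
print(f"KNN with scaling:    {statistics.fmean(acc_scaled):.3f}")
print(f"KNN without scaling: {statistics.fmean(acc_raw):.3f}")
print(f"Welch t statistic:   {welch_t(acc_scaled, acc_raw):.2f}")
```

Because K-Nearest Neighbor relies on distances, an unscaled feature with a large numeric range can dominate the vote. This is one plausible reading of why its accuracy in the Results drops without scaling while the other two algorithms change little.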
INTRODUCTION
Heart disease is one of the most common causes of
death in developing countries. A major difficulty in
diagnosing heart disease is that the diagnosis depends
on interpreting myocardial perfusion single-photon
emission computed tomography (SPECT) and
electrocardiogram (ECG) results¹. The experience of
the specialist who interprets these results therefore
plays an important role. At this point, machine
learning, a sub-branch of artificial intelligence, can
learn similarities in the images or the features in
patient information that determine the diagnosis².
Unlike in other areas of technology, applications of
artificial intelligence and machine learning in
health are still developing³. Applications in the
field of cardiology are very limited⁴. The first uses
of artificial intelligence were basic estimation and
image analysis. As advances in hardware have made
it possible to work with big data, its use in health
has started to increase⁵.
Machine learning⁶ works iteratively: it tries to
learn common patterns in the data without any
prior assumptions. The types of machine learning
are summarized in Table 1.
Table 1: Types of machine learning

No  Method           Definition
1   Supervised       The most common form of learning. The data are divided into
                     training and test sets and are labeled in advance according
                     to the task to be performed. It is used for tasks such as
                     regression, estimation and classification. Artificial neural
                     networks, Random Forest and K-Nearest Neighbor (KNN) are
                     examples.
2   Unsupervised     The model is given no class or numerical information for
                     training and tries to produce results from common points in
                     the data. Hierarchical clustering is an example.
3   Semi-supervised  The data are divided into a part marked with result
                     information and a part without it, so data that are not
                     fully classified are used. Sound perception is an example.
4   Reinforcement    Based on the reward principle in behavioral psychology. The
                     decision-making mechanism learns the option that gives the
                     highest reward based on the result obtained. Today it is
                     used in areas such as robotics, personalization in
                     artificial intelligence and medical image processing.
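The difference between the first two rows of Table 1 can be made concrete with a small Python sketch. It is an illustration only: the one-feature values, the labels, and the helper names fit_centroids, predict and two_means are hypothetical, and the nearest-mean classifier and two-group clustering are simple stand-ins rather than the algorithms used in this study.

```python
import statistics

# Hypothetical one-feature readings; the labels are used only in the supervised case.
values = [1.0, 1.2, 0.8, 1.1, 4.9, 5.2, 5.0, 4.7]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_centroids(xs, ys):
    """Supervised: use the given labels to compute one mean per class."""
    return {c: statistics.fmean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

def two_means(xs, iterations=10):
    """Unsupervised: split the values into two groups without ever seeing labels."""
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            closer_to_first = abs(x - centers[0]) <= abs(x - centers[1])
            groups[0 if closer_to_first else 1].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centroids = fit_centroids(values, labels)
print(predict(centroids, 4.0))  # class predicted for a new, unseen value

centers, groups = two_means(values)
print(groups)                   # two groups found from the values alone
```

In the supervised case the labels decide what the two classes mean. In the unsupervised case the grouping emerges from the values, and it is left to the analyst to interpret what each group represents.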
In this study, Random Forest, Gaussian Bayes and
K-Nearest Neighbor (KNN) were chosen; all three
are used as supervised classification algorithms.
Random Forest: It is a model that uses decision
trees. The data set (...truncated)