Classification of individuals at risk of heart disease using machine learning

CMJ Original Research
Cumhuriyet Tıp Dergisi (Cumhuriyet Medical Journal), September 2020, Volume: 42, Number: 3, 283-289
http://dx.doi.org/10.7197/cmj.vi.742161

Makine öğrenmesi kullanarak kalp hastalığı riski olan bireylerin sınıflandırılması

Betül Akalın1, Ülkü Veranyurt1, Ozan Veranyurt2
1 Sağlık Bilimleri Üniversitesi, İstanbul, Türkiye
2 Bahçeşehir Üniversitesi, İstanbul, Türkiye
Corresponding author: Ülkü Veranyurt, PhD, Sağlık Bilimleri Üniversitesi, İstanbul, Türkiye
E-mail:
Received/Accepted: May 24, 2020 / August 17, 2020
Conflict of interest: There is no conflict of interest.

SUMMARY

Objective: The aim of this study is to determine whether individuals have heart disease by applying different machine learning algorithms to the data provided by the University of Cleveland.

Method: Data from 303 patients provided by the University of Cleveland were classified using the Gaussian Bayes, K-Nearest Neighbor and Random Forest algorithms, with and without feature scaling. For each algorithm, the data were randomly divided into training and test sets, and this process was repeated 50 times. The test results were subjected to a t-test to check whether the differences between the algorithms were statistically significant.

Results: With data scaling, accuracies of 80.52% with K-Nearest Neighbor, 80.52% with Gaussian Bayes and 82.50% with Random Forest were observed. The three algorithms produced similar values, and the differences between them were not statistically significant (p > 0.05). Without data scaling, accuracies of 65.28% with K-Nearest Neighbor, 80.52% with Gaussian Bayes and 82.19% with Random Forest were observed, and the differences between the three algorithms were statistically significant.

Conclusions: Although data from only 303 patients were available in the study, prediction accuracy above 80% was obtained.
The presence of outliers that distort the distribution of the data leads to differences between the results of the methods used. The study confirms that much closer estimates can be obtained on scaled patient data. This study is an example of the use of artificial intelligence in detecting cardiac diseases, which pose a risk all over the world. With more detailed patient data, much higher accuracy rates may be obtained, and such models could in future be included in health management processes for the preliminary diagnosis of heart disease.

Keywords: Illness classification, machine learning in health, heart disease, heart failure

ORCID IDs of the authors: Betül Akalın (B.A.) 0000-0003-0402-2461; Ülkü Veranyurt (Ü.V.) 0000-0003-4838-3373; Ozan Veranyurt (O.V.) 0000-0003-3652-2356

ÖZET

Objective: The aim of this study is to determine whether individuals have heart disease by using different machine learning algorithms on the data provided by the University of Cleveland.

Method: Data from 303 patients provided by the University of Cleveland were classified with the Gaussian Bayes, K-Nearest Neighbor and Random Forest algorithms, with and without feature scaling. For each algorithm, the data were divided randomly into training and test sets. This process was repeated 50 times for each algorithm. The test results were subjected to a t-test to check whether the differences between the algorithms were statistically significant.

Results: With data scaling, accuracies of 80.52% with the K-Nearest Neighbor algorithm, 80.52% with Gaussian Bayes and 82.50% with Random Forest were observed. The results of the three algorithms were similar, and the differences between them were not statistically significant (p > 0.05). Without data scaling, accuracies of 65.28% with the K-Nearest Neighbor algorithm, 80.52% with Gaussian Bayes and 82.19% with Random Forest were observed. The differences between the test results of the three algorithms were statistically significant.
Conclusion: Although data from only 303 patients were available in the study, prediction accuracy above 80% was obtained. Outliers that distort the distribution of the data cause differences between the results of the methods used. It was confirmed that much closer predictions can be obtained on scaled patient data. This study serves as an example of the use of artificial intelligence in detecting heart diseases, which pose a risk worldwide. With more detailed patient data, much higher accuracy rates can be obtained, and the approach can be included in health management processes for the preliminary diagnosis of heart disease in the future.

Keywords: Illness classification, machine learning in health, heart disease, heart failure

INTRODUCTION

Heart disease is one of the most common causes of death in developing countries. One of the main difficulties in diagnosing heart disease is that the diagnosis depends on interpreting myocardial perfusion single-photon emission computed tomography (SPECT) images and electrocardiograms (ECG) [1]. The experience of the specialist who interprets these examinations therefore plays an important role. Here machine learning, a sub-branch of artificial intelligence, can learn similarities between images or the features of patient information that are decisive for the diagnosis [2]. Unlike in other areas of technology, applications of artificial intelligence and machine learning in health are still developing [3]. Their applications in the field of cardiology are very limited [4]. The first uses of artificial intelligence were basic estimation and image analysis. As advances in hardware have made it possible to work with big data, the use of artificial intelligence in health has started to increase [5]. Machine learning [6] works iteratively: it tries to learn common patterns in data without any prior assumptions. Machine learning types are summarized in Table 1.
Table 1: Types of machine learning

No | Method | Definition
1 | Supervised | It is the most common form of learning. The data are divided into training and test sets and are labeled in advance according to the task to be performed. It is used in operations such as regression, estimation and classification. Algorithms such as artificial neural networks and Random Forest are examples.
2 | Unsupervised | In this form of learning, the model is not given any class or numerical information for training. The model tries to produce results from common points in the data. K-Nearest Neighbor (KNN) and hierarchical clustering are examples.
3 | Semi-supervised | The data contain both portions with labeled results and portions without them, so data that are not fully classified are used. Sound recognition can be given as an example.
4 | Reinforcement | It is based on the reward principle in behavioral psychology. The decision-making mechanism learns the option that gives the highest reward based on the result obtained. Today, it is used in areas such as robotics, personalization in artificial intelligence and medical image processing.

In this study, Random Forest and Gaussian Bayes were chosen as the supervised algorithms, and K-Nearest Neighbor, which Table 1 lists under unsupervised learning, as the third algorithm.

Random Forest: It is a model that uses decision trees. The data set (...truncated)
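
The evaluation procedure outlined in the Summary (random training and test splits repeated 50 times, with and without feature scaling, followed by a t-test) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic and hypothetical, not the patient data used in the study; the 25% test fraction, k = 5, the z-score scaling and the use of Welch's t statistic are assumptions, since the text above does not state these details. Only K-Nearest Neighbor is shown, because it is the algorithm whose accuracy the Summary reports as changing with scaling.

```python
import math
import random
import statistics


def make_synthetic_patients(n, rng):
    """Hypothetical stand-in for patient records (NOT the study's data).

    Feature 0 is informative but has a small numeric range; feature 1 is
    pure noise with a large range, so it dominates unscaled distances.
    """
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(0.2 * label, 0.1)
        noise = rng.gauss(250.0, 50.0)
        rows.append(([informative, noise], label))
    return rows


def standardize(train_x, test_x):
    """Scale features to zero mean and unit variance using training data only."""
    columns = list(zip(*train_x))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    return [scale(x) for x in train_x], [scale(x) for x in test_x]


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def accuracy_over_splits(data, scale, repeats=50, test_fraction=0.25, seed=0):
    """Test accuracy of KNN on `repeats` random training/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        train_x, train_y = [x for x, _ in train], [y for _, y in train]
        test_x, test_y = [x for x, _ in test], [y for _, y in test]
        if scale:
            train_x, test_x = standardize(train_x, test_x)
        hits = sum(knn_predict(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
        scores.append(hits / n_test)
    return scores


def welch_t(a, b):
    """Welch's t statistic for two samples of accuracy scores."""
    standard_error = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


data = make_synthetic_patients(303, random.Random(42))
scaled = accuracy_over_splits(data, scale=True)
unscaled = accuracy_over_splits(data, scale=False)
print(f"mean accuracy with scaling:    {statistics.fmean(scaled):.3f}")
print(f"mean accuracy without scaling: {statistics.fmean(unscaled):.3f}")
print(f"Welch t statistic:             {welch_t(scaled, unscaled):.2f}")
```

In this sketch the scaling statistics are computed from the training portion of each split only, so no information from the test set leaks into the model. Because K-Nearest Neighbor relies on distances, a feature with a large numeric range can dominate the vote when the data are left unscaled, which is one plausible reading of the gap between the scaled and unscaled K-Nearest Neighbor accuracies reported in the Summary.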