Comparison Of K-Means and K-Medoids Algorithm for Clustering Data UMKM in Pagar Alam City
Jurnal SISFOKOM (Sistem Informasi dan Komputer), Volume 13, Nomor 02, PP 193-199
Sendy Ariska [1]*, Desi Puspita [2], Inda Anggraini [3]
Institut Teknologi Pagar Alam
Pagar Alam, Indonesia
Abstract— The aim of this research is to cluster MSME data in
Pagar Alam City using the K-Means and K-Medoids algorithms.
The research is motivated by the lack of further management of
the collected MSME data, which can hinder the development and
improvement of MSMEs in Pagar Alam City. This data is needed by
agencies to develop and improve the city's MSMEs. Apart from
agencies, it is also useful for sub-districts and RT/RW to find out
what interests, talents and potential the community has in which
business fields. The MSME data is processed using RapidMiner and
Python. The research follows the Cross Industry Standard Process
for Data Mining (CRISP-DM), whose stages are Business
Understanding, Data Understanding, Data Preparation, Modeling,
Evaluation, and Deployment. Clustering quality is tested with the
Davies-Bouldin index (DBI); a DBI value close to 0 indicates good
clustering. The research obtained 3 clusters. In 2020, K-Means
assigned 1, 3 and 1 sub-districts to C0, C1 and C2, while K-Medoids
assigned 1, 1 and 3. In 2022, both K-Means and K-Medoids assigned
1, 3 and 1 sub-districts to C0, C1 and C2. For 2020 the DBI is 0.134
for K-Means and 0.523 for K-Medoids; for 2022 it is 0.277 for
K-Means and 0.496 for K-Medoids. It can be concluded that K-Means
performs better for grouping MSMEs in Pagar Alam City, because
its DBI value is closer to 0. The grouping results can give related
parties an overview for encouraging or assisting sub-districts that
fall in the low cluster.
Keywords— K-Means, K-Medoids, CRISP-DM, Davies-Bouldin index.
I. INTRODUCTION
Along with the development of the internet, the volume of
stored data, whether text, images, sound or video, has
increased quickly and significantly. Large volumes of data
become "garbage" in storage if they are not processed into
useful information, which requires a technique called data
mining [1].
Data mining, also called Knowledge Discovery in
Databases (KDD), is defined as the extraction of potentially
useful, implicit and previously unknown information from a
set of data. The KDD process extracts trends and patterns from
the data and then converts the results accurately into
information that is easy to understand. Data mining can serve
several roles, one of which is clustering [2].
Clustering is a data analysis method, often counted among
the data mining methods, whose aim is to group data so that
items with similar characteristics fall in the same "region" and
items with different characteristics fall in other "regions" [3].
Two methods for clustering are K-Means and K-Medoids. The
K-Means algorithm is a clustering algorithm that groups data
based on the cluster center point (centroid) closest to each
data item [4]. The K-Medoids algorithm, or Partitioning Around
Medoids (PAM), is a partitional clustering method for grouping
a set of n objects into k clusters [5].
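To make the difference between the two algorithms concrete, the following is a minimal Python sketch, not the implementation used in this research. It assumes one-dimensional data, and the five counts are made-up values rather than the Pagar Alam City data. The function names (assign, k_means, k_medoids, davies_bouldin) are hypothetical. For brevity, K-Medoids is solved here by exhaustive search over candidate medoids instead of the PAM swap heuristic, which is only feasible for very small inputs. The Davies-Bouldin index is computed from cluster means for both algorithms.

```python
from itertools import combinations


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            for p in points]


def k_means(points, init_centers, max_iter=100):
    """K-Means: each center moves to the mean of its cluster until stable."""
    centers = list(init_centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else old)
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def k_medoids(points, k):
    """K-Medoids by exhaustive search: centers must be actual data points.

    Assumption: brute force replaces the PAM swap step (tiny inputs only).
    """
    def total_distance(medoids):
        return sum(min(abs(p - m) for m in medoids) for p in points)

    medoids = list(min(combinations(sorted(set(points)), k),
                       key=total_distance))
    return medoids, assign(points, medoids)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (needs at least 2 clusters with distinct means)."""
    clusters = [[p for p, lab in zip(points, labels) if lab == j]
                for j in sorted(set(labels))]
    means = [sum(c) / len(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its mean.
    scatter = [sum(abs(p - m) for p in c) / len(c)
               for c, m in zip(clusters, means)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / abs(means[i] - means[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Hypothetical MSME counts for five sub-districts (illustration only).
counts = [120, 450, 470, 520, 1300]
km_centers, km_labels = k_means(counts, init_centers=[120, 470, 1300])
md_centers, md_labels = k_medoids(counts, k=3)
print(km_centers, km_labels, davies_bouldin(counts, km_labels))
print(md_centers, md_labels, davies_bouldin(counts, md_labels))
```

On this toy input both algorithms produce the same grouping, but the middle K-Means center is 480.0, a mean that is not itself a data point, while the middle medoid is 470, an actual observation. A smaller Davies-Bouldin value indicates more compact and better separated clusters.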
Based on observations and interviews in the cooperative
sector at the Department of Industry, Trade and UKM
Cooperatives in Pagar Alam City, MSMEs (UMKM) in Pagar
Alam City have increased in number and in the variety of
their business fields. The department collects MSME data
twice a year, directly through face-to-face meetings with
respondents consisting of MSME actors throughout Pagar
Alam City. The latest data, for 2022, records a total of 2,906
MSMEs from 5 sub-districts. The collected data is currently
used as accurate reference data in making government
policies to develop Micro, Small and Medium Enterprises
(MSMEs) in Pagar Alam City. The data shows various types
of MSME business sectors across the sub-districts, but after
collection there is no further management of the data, which
can hinder the development and improvement of MSMEs in
Pagar Alam City. This data is needed by agencies to develop
and improve the city's MSMEs. Apart from agencies, it is also
useful for sub-districts and RT/RW to find out what interests,
talents and potential the community has in the MSME
business sector, so that it can be used as a strategy to improve
and develop MSMEs and support the economy of MSME
business actors. Therefore, further data management is
needed so that the data can inform policy decisions on
improving and developing MSMEs. Given the large amount of
MSME data and the number of sub-districts, the MSME data
in Pagar Alam City needs to be grouped to determine the high
and low levels of the number of MSMEs by sub-district, and
the groups with the most MSME business sectors in Pagar
Alam City. Data mining is therefore needed to process the
data into useful information, using the clustering method with a
p-ISSN 2301-7988, e-ISSN 2581-0588
DOI : 10.32736/sisfokom.v13i2.2090, Copyright ©2024
Submitted : February 12, 2024, Revised : March 1, 2024, Accepted : March 19, 2024, Published : June 15, 2024
comparison of the K-Means and K-Medoids algorithms to
determine the grouping of MSME data based on the high and
low levels of the number of MSMEs in each sub-district and
the number of existing business fields. Grouping the data on
MSMEs in Pagar Alam City can help agencies focus on
sub-districts that still have low numbers of MSMEs and
provide assistance in improving and developing them. Apart
from agencies, it is also useful for sub-districts and RT/RW to
focus on the interests, talents and potential of the community
in the business fields with the highest numbers, and to provide
assistance so that they can be further improved and developed,
especially in sales results, in order to support the economy of
MSME business actors.
A clustering process using the RapidMiner application
produced 53 percent of the data in cluster 1 (a total of 8 data
items), 40 percent in cluster 2 and 7 percent in cluster 3 [6].
Clustering process using the RapidMiner application with the
K-Means and K-Medoids algorithms. Based on research
conducted by [7] with the title "Comparison of the K-Means
Algorithm and the K-Medoid Algorithm for
Grouping MSME (...truncated)