Asymmetric k-Means Clustering of the Asymmetric Self-Organizing Map
Dominik Olszewski
Faculty of Electrical Engineering, Warsaw University of Technology, Warsaw, Poland
An asymmetric approach to clustering of the asymmetric self-organizing map is proposed. The clustering is performed using an improved asymmetric version of the well-known k-means algorithm. The improved asymmetric k-means algorithm is the second proposal of this paper. As a result, we obtain a two-stage fully asymmetric data analysis technique. In this way, we maintain the methodological consistency of the two methods used, because both are formulated in asymmetric versions and, consequently, both properly adjust to asymmetric relationships in the analyzed data. The results of our experiments on real data confirm the effectiveness of the proposed approach.
Self-organizing map; Asymmetric self-organizing map; Clustering; k-means algorithm; Asymmetric k-means algorithm
1 Introduction
The self-organizing map (SOM) [1–7] is an example of an artificial neural network
architecture. It was introduced by T. Kohonen in [8] as a generalization and extension of the
concepts proposed in [9]. The approach can also be interpreted as a visualization technique,
since the algorithm can project data from a multidimensional space onto a 2-dimensional space,
thereby creating a map structure. The locations of points in the 2-dimensional grid are intended
to reflect the similarities between the corresponding objects in the multidimensional space.
The SOM algorithm therefore allows relationships between objects in a multidimensional space
to be visualized. The asymmetric version of the SOM algorithm was introduced in [10], and the
justification of the asymmetric approach was extended in [11].
The k-means clustering algorithm [12–17] is a well-known statistical data analysis tool
used to form an arbitrarily chosen number of clusters in an analyzed dataset. The algorithm
aims to separate clusters of objects that are as similar as possible. An object represented as
a vector of d features can be interpreted as a point in d-dimensional space. Hence, the k-means
algorithm can be formulated as follows: given n points in d-dimensional space and the
number k of desired clusters, the algorithm seeks a set of k clusters that minimizes the sum
of squared dissimilarities between each point and its cluster centroid. The name k-means
was introduced in [15]; the algorithm itself, however, was formulated by H. Steinhaus in [16].
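The formulation above can be illustrated with a minimal sketch of the standard (symmetric) k-means iteration. It assumes the squared Euclidean distance as the dissimilarity and a random choice of initial centroids from the data; the function name `kmeans` and its parameters are illustrative and are not taken from the cited works.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Standard k-means; squared Euclidean dissimilarity is an assumption."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct points drawn from the data.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared dissimilarity of every point to every centroid, shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Each centroid becomes the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and the minimised quantity: the sum of squared
    # dissimilarities between each point and its cluster centroid.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    objective = d2[np.arange(len(points)), labels].sum()
    return labels, centroids, objective

# Toy data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, objective = kmeans(points, k=2)
```

The returned `objective` is the quantity the algorithm seeks to minimize, so it can be used to compare runs started from different initial centroids.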
An asymmetric version of the k-means clustering algorithm was introduced in [18].
However, the asymmetry in the algorithm from [18] arises from the use of dissimilarities
that are asymmetric by definition (for example, the Kullback-Leibler divergence). On
the other hand, the paper [19] proposes an asymmetric k-means algorithm using
symmetric similarities, which are asymmetrized by employing the asymmetric coefficients. This
kind of approach provides a proper adjustment to asymmetric relationships in analyzed data
(explained in detail in Sect. 3). Therefore, in this paper, we take the asymmetric version of
the k-means algorithm introduced in [19], improve it, and employ it for cluster analysis
on the asymmetric SOM.
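As a small illustration of a dissimilarity that is asymmetric by definition, the following sketch evaluates the Kullback-Leibler divergence in both directions for two discrete distributions. The helper name `kl_divergence` and the example distributions are assumptions made for this illustration, and the sketch assumes strictly positive probabilities.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumes p and q are strictly positive and each sums to one.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Two illustrative distributions over two outcomes.
p = [0.9, 0.1]
q = [0.5, 0.5]

d_pq = kl_divergence(p, q)  # dissimilarity "from p to q"
d_qp = kl_divergence(q, p)  # dissimilarity "from q to p"
print(d_pq, d_qp)           # the two directions give different values
```

Because the value depends on the order of the arguments, a clustering algorithm built on such a dissimilarity is asymmetric without any additional coefficients, which is the situation described for [18].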
1.1 Our Proposal
The improvement of the asymmetric k-means algorithm introduced in this paper consists in
using the current number of objects in each cluster when computing the asymmetric
similarities in each cycle of the k-means clustering process. In this way, the algorithm can
successfully handle even datasets containing clusters with considerably different numbers of
objects. This goal is achieved by incorporating a mechanism that treats different clusters in a
dataset differently, in accordance with the theoretical fundamentals of the hierarchy-based
asymmetric approach in data analysis (explained in detail in Sects. 3, 4, and 6). To this end,
we introduce the cluster coefficients, which convey information about the
current number of objects in the clusters. The novel improved version of the asymmetric k-means
algorithm uses both kinds of coefficients: the asymmetric coefficients, as was done in [19], and
the cluster coefficients, which are the proposal of this paper.
Finally, we combine the asymmetric SOM visualization technique and the improved
asymmetric k-means algorithm to perform a two-stage asymmetric cluster analysis.
The general order of data analysis in our work is as follows: first, the asymmetric SOM
is generated, and then the neurons (processing units) in the grid of the asymmetric SOM are
clustered using the proposed asymmetric k-means algorithm. In other words, the clustering
process is carried out in the output space of the asymmetric SOM, i.e., in 2-dimensional
space.
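The two-stage order of analysis can be sketched as follows. This is only a structural illustration under loudly stated assumptions: it uses a standard (symmetric) SOM and standard k-means as stand-ins, not the asymmetric versions proposed in this paper, and it assumes that each object is mapped to its best-matching neuron and that the resulting 2-dimensional grid positions are clustered. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=500, lr0=0.5, seed=0):
    """Stage 1 stand-in: a standard online SOM with a Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of the neurons, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the neuron weights with randomly chosen data samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the neuron whose weight vector is closest to x.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return grid, weights

def project(data, grid, weights):
    """Map every object to the 2-D grid position of its best-matching neuron."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return grid[d2.argmin(axis=1)]

def kmeans(points, k, n_iter=100, seed=0):
    """Stage 2 stand-in: standard k-means on the 2-D output space."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # An empty cluster keeps its previous centroid.
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids

# Synthetic 4-dimensional data with two groups (illustrative only).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (30, 4)), rng.normal(3.0, 0.3, (30, 4))])

grid, weights = train_som(data, rows=5, cols=5)   # stage 1: build the map
coords = project(data, grid, weights)             # positions in the output space
labels, centroids = kmeans(coords, k=2)           # stage 2: cluster in 2-D
```

The point of the sketch is the data flow: the second stage never sees the original 4-dimensional data, only positions in the 2-dimensional output space of the map.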
In this way, we maintain the methodological consistency between the asymmetric SOM
and the asymmetric k-means, i.e., both employed methods are asymmetry-sensitive, and
therefore, both can effectively operate on asymmetric data. As a result, we obtain a fully
asymmetric two-stage data analysis approach.
To recapitulate, this paper proposes:
(i) the improvement of the asymmetric k-means algorithm,
(ii) the asymmetric k-means clustering of the asymmetric SOM.
It is worth mentioning that the asymmetric version of the k-means clustering algorithm
can be regarded as a generalization of this method, which makes it capable of handling data
regardless of whether they are symmetric or asymmetric.
A complete theoretical justification of both proposals of the present paper is provided
in Sect. 4.
1.2 Remainder of this Paper
The rest of this paper is organized as follows: In Sect. 2, the related work is discussed as
a background for our study; in Sect. 3, the phenomenon of asymmetry in data analysis is
discussed, and the cluster coefficients are introduced; in Sect. 4, a theoretical justification
of the two methods proposed in the present paper is provided; in Sect. 5, the asymmetric
version of the SOM technique is presented; in Sect. 6, the second proposal of the paper (i.e.,
the asymmetric k-means clustering of the asymmetric SOM) is described; in Sect. 7, the
results of the experimental study on real data in four different research fields are reported;
while in Sect. 9, the whole paper is summarized, and certain directions for future research
are given.
2 Related Work
This section presents the state of the art in the field of asymmetric data analysis. We note,
however, that the problem of asymmetry in data analysis has not received the attention it
deserves and has been studied relatively rarely in the literature.
One of the first researchers dealing with the asymmetric approach in data analysis was A.
Tversky, who questioned the geometric representation of similarity [20]. He argued that the
notion of similarity had been dominated by geometric models, which represent objects as
points in some coordinate space and that dissimilarities betwee (...truncated)