Asymmetric \(k\)-Means Clustering of the Asymmetric Self-Organizing Map

https://link.springer.com/content/pdf/10.1007%2Fs11063-015-9415-8.pdf

Dominik Olszewski
Faculty of Electrical Engineering, Warsaw University of Technology, Warsaw, Poland

An asymmetric approach to clustering of the asymmetric self-organizing map is proposed. The clustering is performed using an improved asymmetric version of the well-known k-means algorithm. The improved asymmetric k-means algorithm is the second proposal of this paper. As a result, we obtain a two-stage, fully asymmetric data analysis technique. In this way, we maintain the methodological consistency of the two methods used, because both are formulated in asymmetric versions and, consequently, both properly adjust to asymmetric relationships in the analyzed data. The results of our experiments on real data confirm the effectiveness of the proposed approach.

Keywords: Self-organizing map; Asymmetric self-organizing map; Clustering; k-means algorithm; Asymmetric k-means algorithm

1 Introduction

The self-organizing map (SOM) [1-7] is an example of an artificial neural network architecture. It was introduced by T. Kohonen in [8] as a generalization and extension of the concepts proposed in [9]. This approach can also be interpreted as a visualization technique, since the algorithm can project data from a multidimensional space onto a 2-dimensional space, thereby creating a map structure. The locations of points in the 2-dimensional grid aim to reflect the similarities between the corresponding objects in the multidimensional space. Therefore, the SOM algorithm allows the relationships between objects in a multidimensional space to be visualized. The asymmetric version of the SOM algorithm was introduced in [10], and the justification of the asymmetric approach was extended in [11].

The k-means clustering algorithm [12-17] is a well-known statistical data analysis tool used to form an arbitrarily chosen number of clusters in an analyzed dataset.
The algorithm aims to separate clusters of objects that are as similar to one another as possible. An object represented as a vector of d features can be interpreted as a point in d-dimensional space. Hence, the k-means algorithm can be formulated as follows: given n points in d-dimensional space and the number k of desired clusters, the algorithm seeks a set of k clusters that minimizes the sum of squared dissimilarities between each point and its cluster centroid. The name k-means was introduced in [15]; the algorithm itself, however, was formulated by H. Steinhaus in [16].

An asymmetric version of the k-means clustering algorithm was introduced in [18]. However, the asymmetry in the algorithm from [18] arises from the use of dissimilarities that are asymmetric by definition (for example, the Kullback-Leibler divergence). In contrast, the paper [19] proposes an asymmetric k-means algorithm using symmetric similarities, which are asymmetrized by means of asymmetric coefficients. This kind of approach provides a proper adjustment to asymmetric relationships in the analyzed data (explained in detail in Sect. 3). Therefore, in this paper, we take the asymmetric version of the k-means algorithm introduced in [19], improve it, and employ it for cluster analysis on the asymmetric SOM.

1.1 Our Proposal

The improvement of the asymmetric k-means algorithm introduced in this paper consists in using the current number of objects in each cluster when computing the asymmetric similarities in each cycle of the k-means clustering process. In this way, the algorithm can successfully handle even datasets containing clusters with considerably different numbers of objects. The goal is achieved by incorporating a mechanism that treats different clusters in a dataset differently, in accordance with the theoretical fundamentals of the hierarchy-based asymmetric approach in data analysis (explained in detail in Sects. 3, 4, and 6).
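The standard k-means formulation given in the Introduction, together with the two quantities referred to above (a dissimilarity that is asymmetric by definition, and the current number of objects in each cluster), can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: it implements only the ordinary symmetric k-means with squared Euclidean dissimilarity, and it does not define the asymmetric or cluster coefficients, whose exact form is given later in the paper. The function names (`kl_divergence`, `kmeans`, `objective`), the toy data, and the choice of initial centroids are assumptions made purely for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions
    with strictly positive entries. In general D(p || q) != D(q || p)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def squared_euclidean(a, b):
    """Symmetric squared Euclidean dissimilarity between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_cycles=100):
    """Standard (symmetric) k-means.

    Returns the centroids, the cluster label of every point, and the list of
    cluster sizes observed in each cycle. The per-cycle sizes are the quantity
    that the proposed improvement draws on; how they enter the similarity
    computation is NOT modeled here.
    """
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    size_history = []
    for _ in range(max_cycles):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        size_history.append([new_labels.count(j) for j in range(k)])
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels, size_history


def objective(points, centroids, labels):
    """Sum of squared dissimilarities between points and their centroids."""
    return sum(squared_euclidean(p, centroids[l]) for p, l in zip(points, labels))


# Toy data: two clusters with considerably different numbers of objects.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
centroids, labels, sizes = kmeans(data, initial_centroids=[(0, 0), (10, 10)])
print("cluster sizes per cycle:", sizes)
print("objective:", objective(data, centroids, labels))

# A dissimilarity that is asymmetric by definition.
p, q = [0.9, 0.1], [0.5, 0.5]
print("D(p||q) =", kl_divergence(p, q), " D(q||p) =", kl_divergence(q, p))
```

Passing the initial centroids explicitly keeps the sketch deterministic; in practice they are usually chosen at random or by a seeding heuristic.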
In order to accomplish this purpose, we introduce cluster coefficients, which convey information about the current number of objects in each cluster. The improved version of the asymmetric k-means algorithm uses both kinds of coefficients: the asymmetric coefficients, as was done in [19], and the cluster coefficients, which are proposed in this paper. Finally, we combine the asymmetric SOM visualization technique and the improved asymmetric k-means algorithm in order to perform a two-stage asymmetric cluster analysis. The general order of data analysis in our work is the following: first, the asymmetric SOM is generated, and then the neurons (processing units) in the grid of the asymmetric SOM are clustered using the proposed asymmetric k-means algorithm. In other words, the clustering process is carried out in the output space of the asymmetric SOM, i.e., in 2-dimensional space. In this way, we maintain the methodological consistency between the asymmetric SOM and the asymmetric k-means: both methods are asymmetry-sensitive, and therefore both can operate effectively on asymmetric data. As a result, we obtain a fully asymmetric two-stage data analysis approach.

Recapitulating, this paper proposes: the improvement of the asymmetric k-means algorithm, and the asymmetric k-means clustering of the asymmetric SOM. It is worth mentioning that the asymmetric version of the k-means clustering algorithm can be regarded as a generalization of the method, capable of handling data regardless of whether they are symmetric or asymmetric. A complete theoretical justification of both proposals of the present paper is provided in Sect. 4.

1.2 Remainder of this Paper

The rest of this paper is organized as follows: in Sect. 2, the related work is discussed as a background for our study; in Sect. 3, the phenomenon of asymmetry in data analysis is discussed, and the cluster coefficients are introduced; in Sect.
4, a theoretical justification of the two methods proposed in the present paper is provided; in Sect. 5, the asymmetric version of the SOM technique is presented; in Sect. 6, the second proposal of the paper (i.e., the asymmetric k-means clustering of the asymmetric SOM) is described; in Sect. 7, the results of the experimental study on real data in four different research fields are reported; while in Sect. 9, the whole paper is summarized, and certain directions for future research are given.

2 Related Work

This section presents the state of the art in the field of asymmetric data analysis. However, we claim that the problem of asymmetry in data analysis has not received the attention it deserves and has been studied relatively rarely in the literature. One of the first researchers to deal with the asymmetric approach in data analysis was A. Tversky, who questioned the geometric representation of similarity [20]. He argued that the notion of similarity had been dominated by geometric models, which represent objects as points in some coordinate space and that dissimilarities betwee (...truncated)