Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

PLOS ONE, Jul 2016

Overview

Notions of community quality underlie the clustering of networks. While studies of network clustering are increasingly common, the relationship between different cluster quality metrics is not precisely understood. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes.

Cluster Quality Metrics

We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information.

Network Clustering Algorithms

Smart local moving is the best-performing algorithm overall in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
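To make the three stand-alone metrics concrete, the following is a minimal pure-Python sketch, not the code used in this study. The function name `standalone_metrics` and the toy graph are hypothetical. The definitions are assumptions, since the paper's exact formulas are not shown in this excerpt: modularity in the Newman-Girvan form, coverage as the fraction of edges that fall inside clusters, and conductance per cluster as cut edges divided by the smaller of the cluster's volume and its complement's volume, averaged over clusters. The graph is assumed to be undirected and unweighted, with at least one edge and no self-loops.

```python
from collections import defaultdict


def standalone_metrics(edges, cluster_of):
    """Return (modularity, coverage, mean conductance) of a clustering.

    edges      -- list of (u, v) pairs of an undirected, unweighted graph
    cluster_of -- dict mapping every node to its cluster label
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    cut = defaultdict(int)       # edges with exactly one endpoint in the cluster
    volume = defaultdict(int)    # sum of the degrees of the cluster's nodes

    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    clusters = set(cluster_of.values())

    # Assumed Newman-Girvan form: sum over clusters of
    # (fraction of internal edges) - (fraction of total degree)^2.
    modularity = sum(
        internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters
    )

    # Assumed definition: share of all edges that stay inside a cluster.
    coverage = sum(internal[c] for c in clusters) / m

    # Assumed definition: cut / min(vol(C), vol(rest)), averaged over clusters.
    # A cluster with an empty side is given conductance 0 by convention here.
    conductances = []
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductances.append(cut[c] / denom if denom else 0.0)
    mean_conductance = sum(conductances) / len(conductances)

    return modularity, coverage, mean_conductance


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_big_cluster = {node: 0 for node in range(6)}

print(standalone_metrics(edges, two_triangles))
print(standalone_metrics(edges, one_big_cluster))
```

On this toy graph, placing every node in a single cluster yields perfect coverage but zero modularity, which gives a small-scale sense of how stand-alone metrics can disagree about the same clustering.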

RESEARCH ARTICLE

Scott Emmons1*, Stephen Kobourov2, Mike Gallant1, Katy Börner1,3

1 School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America
2 Department of Computer Science, University of Arizona, Tucson, Arizona, United States of America
3 Indiana University Network Science Institute, Indiana University, Bloomington, Indiana, United States of America

OPEN ACCESS

Citation: Emmons S, Kobourov S, Gallant M, Börner K (2016) Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale. PLoS ONE 11(7): e0159161. doi:10.1371/journal.pone.0159161

Editor: Constantine Dovrolis, Georgia Institute of Technology, UNITED STATES

Received: February 10, 2016
Accepted: June 28, 2016
Published: July 8, 2016

Copyright: © 2016 Emmons et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The code we developed to implement this study, including all scripts, statistics, and analyses, is available and documented at http://cns.iu.edu/2016ClusteringComp and on GitHub at https://github.com/scottemmons/STHClusterAnalysis.

Funding: This research was partially funded by the National Institutes of Health. This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU is also supported in part by Lilly Endowment, Inc. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: Lilly Endowment, Inc. is a commercial funder of this work through its support for the Indiana University Pervasive Technology Institute. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.

Introduction

Clustering is the task of assigning a set of objects to groups (also called classes or categories) so that the objects in the same cluster are more similar (according to a predefined property) to each other than to those in other clusters. This is a fundamental problem in many fields, including statistics, data analysis, bioinformatics, and image processing. Some of the classical clustering methods date back to the early 20th century, and they cover a wide spectrum: connectivity clustering, centroid clustering, density clustering, etc. The result of clustering may be a hierarchy or a partition with disjoint or overlapping clusters. Cluster attributes such as count (number of clusters), average size, minimum size, maximum size, etc., are often of interest.

To evaluate and compare network clustering algorithms, the literature has given much attention to algorithms’ performance on “benchmark graphs” [1–5]. Benchmark graphs are synthetic graphs into which a known clustering can be embedded by construction. The embedded clustering is treated as a “gold standard,” and clustering algorithms are judged on their ability to recover the information in the embedded clustering. In such synthetic graphs there is a clear definition of rank: the best clustering algorithm is the one that recovers the most information, and the worst clustering algorithm is the one that recovers the least information.
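As a rough illustration of what "recovering the information in the embedded clustering" can mean in practice, the sketch below scores a predicted clustering against a gold-standard labeling using the adjusted Rand index and normalized mutual information. It is a minimal pure-Python sketch under stated assumptions, not the code used in this study: the function names are hypothetical, labels are given as equal-length lists with at least two nodes, and NMI is normalized by the arithmetic mean of the two entropies, which is one common convention among several.

```python
from collections import Counter
from math import comb, log


def contingency(truth, pred):
    """Count nodes in each (true cluster, predicted cluster) pair."""
    return Counter(zip(truth, pred))


def adjusted_rand(truth, pred):
    """Adjusted Rand index: pair-counting agreement, corrected for chance."""
    n = len(truth)
    same_in_both = sum(comb(c, 2) for c in contingency(truth, pred).values())
    same_in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_in_truth * same_in_pred / comb(n, 2)
    maximum = (same_in_truth + same_in_pred) / 2
    if maximum == expected:
        # Both labelings are the same trivial partition
        # (all singletons or one cluster).
        return 1.0
    return (same_in_both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of the cluster size distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(truth, pred):
    """NMI, assuming normalization by the mean of the two entropies."""
    n = len(truth)
    size_t, size_p = Counter(truth), Counter(pred)
    mutual_info = sum(
        (c / n) * log(c * n / (size_t[t] * size_p[p]))
        for (t, p), c in contingency(truth, pred).items()
    )
    total_entropy = entropy(truth) + entropy(pred)
    if total_entropy == 0:
        return 1.0  # both labelings put every node in one cluster
    return 2 * mutual_info / total_entropy


gold = [0, 0, 0, 1, 1, 1]          # embedded "gold standard" clustering
relabeled = [1, 1, 1, 0, 0, 0]     # same partition, different label names
one_cluster = [0, 0, 0, 0, 0, 0]   # trivial clustering, recovers nothing

print(adjusted_rand(gold, relabeled), normalized_mutual_info(gold, relabeled))
print(adjusted_rand(gold, one_cluster), normalized_mutual_info(gold, one_cluster))
```

Both scores depend only on how nodes are grouped, not on the label names, so the relabeled copy of the gold standard scores 1 on both metrics while the single-cluster output scores 0 on both.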
However, judging clustering algorithms solely on their performance on benchmark graph tests assumes that the embedded clustering truly is a “gold standard” that captures the entirety of an algorithm’s performance. It ignores other properties of clustering, such as modularity, conductance, and coverage, to which the literature has given much attention in order to decide the best clustering algorithm to use in practice for a particular application [6–8]. Furthermore, previous papers that have evaluated clustering algorithms on benchmark graphs have used a single metric, such as normalized mutual information, to measure the amount of “gold standard” information recovered by each algorithm [3–5]. We have seen no studies that evaluate how the choice of information recovery metric affects the results of benchmark graph cluster analysis.

In this paper, we experimentally evaluate the robustness of clustering algorithms by their performance on small (1,000 nodes, 12,400 undirected edges) to large-scale (1M nodes, 13.3M undirected edges) benchmark graphs. We cluster these graphs using a variety of clustering algorithms and simultaneously measure both the information recovery of each clustering and the quality of each clustering with various metrics. Then, we test (...truncated)



Scott Emmons, Stephen Kobourov, Mike Gallant, Katy Börner. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale, PLOS ONE, 2016, Volume 11, Issue 7, DOI: 10.1371/journal.pone.0159161