Distance Matrix Pre-Caching and Distributed Computation of Internal Validation Indices in k-medoids Clustering | IEEE Conference Publication | IEEE Xplore


Abstract:

In this paper we discuss techniques for potential speedups in k-medoids clustering. Specifically, we address the advantages of pre-caching the pairwise distance matrix, the heart of the k-medoids clustering algorithm, not only to speed up the execution of the algorithm itself, but also to speed up the evaluation of the well-known Silhouette Index and Davies-Bouldin Index for cluster validation. A major disadvantage of such pre-caching is that it might not be suitable for large datasets, because the size of the matrix grows quadratically with the number of data points. To this end, a further contribution consists in proposing parallel and distributed implementations of both the Simplified Silhouette Index and the Davies-Bouldin Index for distributed k-clustering using the Apache Spark framework. Results on real-world pathway maps datasets show the robustness of such distributed implementations, also underlining their effectiveness for structured data.
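To make the pre-caching idea concrete, the following is a minimal single-machine sketch in plain Python. It is not the authors' implementation and does not use Apache Spark. All function names (`precompute_distances`, `assign`, `update_medoids`, `k_medoids`, `simplified_silhouette`) are hypothetical, the Euclidean metric is an assumption, and the loop is a simple alternating assign/update scheme rather than any specific variant from the paper. The point it illustrates is that every step after the first reads distances from the cached matrix instead of recomputing them.

```python
from itertools import combinations


def euclidean(p, q):
    """Assumed dissimilarity; any symmetric measure could be plugged in."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def precompute_distances(points, dist=euclidean):
    """Build the full n x n distance matrix once (O(n^2) memory)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dist(points[i], points[j])
    return d


def assign(d, medoids):
    """Label each point with the position (in `medoids`) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def update_medoids(d, labels, medoids):
    """Pick, per cluster, the member minimizing the sum of cached distances."""
    new_medoids = []
    for c, old in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:  # empty cluster: keep the previous medoid
            new_medoids.append(old)
            continue
        new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
    return new_medoids


def k_medoids(d, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        new_medoids = update_medoids(d, assign(d, medoids), medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


def simplified_silhouette(d, medoids, labels):
    """Mean of (b - a) / max(a, b), with a and b measured to medoids only.

    a: distance to the point's own medoid; b: distance to the nearest
    other medoid. Requires at least two medoids.
    """
    total = 0.0
    for i, lab in enumerate(labels):
        a = d[i][medoids[lab]]
        b = min(d[i][m] for c, m in enumerate(medoids) if c != lab)
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(labels)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    d = precompute_distances(points)
    medoids, labels = k_medoids(d, initial_medoids=[0, 1])
    print(medoids, labels, simplified_silhouette(d, medoids, labels))
```

Because both the clustering loop and the validation index only index into `d`, the cost of the dissimilarity measure is paid once. The price is the quadratic memory of the matrix, which is the limitation the abstract cites as motivation for the distributed implementations.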
Date of Conference: 08-13 July 2018
Date Added to IEEE Xplore: 14 October 2018
ISBN Information:
Electronic ISSN: 2161-4407
Conference Location: Rio de Janeiro, Brazil
