Clustering with Proximity Graphs: Exact and Efficient Algorithms

Michail Kazimianec, Nikolaus Augsten

Source Title: International Journal of Knowledge-Based Organizations (IJKBO) 3(4)

ISSN: 2155-6393 | EISSN: 2155-6407 | EISBN13: 9781466635920 | DOI: 10.4018/ijkbo.2013100105

MLA

Kazimianec, Michail, and Nikolaus Augsten. "Clustering with Proximity Graphs: Exact and Efficient Algorithms." IJKBO, vol. 3, no. 4, 2013, pp. 84-104. http://doi.org/10.4018/ijkbo.2013100105

APA

Kazimianec, M. & Augsten, N. (2013). Clustering with Proximity Graphs: Exact and Efficient Algorithms. International Journal of Knowledge-Based Organizations (IJKBO), 3(4), 84-104. http://doi.org/10.4018/ijkbo.2013100105

Chicago

Kazimianec, Michail, and Nikolaus Augsten. "Clustering with Proximity Graphs: Exact and Efficient Algorithms." International Journal of Knowledge-Based Organizations (IJKBO) 3, no. 4 (2013): 84-104. http://doi.org/10.4018/ijkbo.2013100105

Abstract

Graph Proximity Cleansing (GPC) is a string clustering algorithm that automatically detects cluster borders and has been successfully used for string cleansing. For each potential cluster, a so-called proximity graph is computed, and the cluster border is detected based on this graph. However, computing the proximity graph is expensive, and the state-of-the-art GPC algorithms only approximate it using a sampling technique. Further, the quality of GPC clusters has never been compared to that of standard clustering techniques such as k-means, density-based, or hierarchical clustering. In this article, the authors propose two efficient algorithms, PG-DS and PG-SM, for the exact computation of proximity graphs. The authors show experimentally that these algorithms are faster than the sampling-based algorithms even when the latter use very small sample sizes. The authors also provide a thorough experimental evaluation of GPC and conclude that it is very efficient and shows good clustering quality in comparison to the standard techniques. These results open a new perspective on string clustering in settings where no knowledge about the input data is available.
