poster

C-Affinity: A Novel Similarity Measure for Effective Data Clustering

Authors:

Jiwon Hong,

Sang-Wook Kim

WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023

Pages 41 - 44

https://doi.org/10.1145/3543873.3587307

Published: 30 April 2023

Abstract

Clustering is one of the most useful data mining techniques and is widely employed in various applications. In clustering, the similarity measure, which defines how similar a pair of data objects is, plays an important role. A similarity measure is typically chosen according to the characteristics of the target dataset. However, current similarity measures (or distances) do not reflect the distribution of data objects in a dataset at all, which may limit clustering accuracy. In this paper, we propose c-affinity, a new similarity measure that reflects the distribution of objects in the given dataset from a clustering point of view. We design c-affinity so that, by learning the data distribution, it assigns a higher value to two objects the more likely they are to belong to the same cluster. We use random walk with restart (RWR) on the k-nearest neighbor graph of the given dataset to measure (1) how similar a pair of objects is and (2) how densely other objects are distributed between them. Via extensive experiments on sixteen synthetic and real-world datasets, we verify that replacing the existing similarity measure with c-affinity significantly improves clustering accuracy.
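The abstract does not give the exact formula for c-affinity, so the following is only a minimal sketch of the general idea it describes: running RWR on a k-nearest neighbor graph to obtain a pairwise score that depends on the data distribution. The function names, the Euclidean base distance, the restart probability of 0.15, and the symmetrization step are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def knn_transition_matrix(X, k):
    """Row-stochastic transition matrix of a symmetrized k-NN graph.

    Assumption: Euclidean distance is used to pick neighbors.
    """
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # an object is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]    # indices of the k closest objects
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                 # make the graph undirected
    return A / A.sum(axis=1, keepdims=True)

def rwr_scores(P, restart=0.15):
    """RWR scores for every start node, solved in closed form.

    Row i is the stationary distribution of a walk that follows P and
    jumps back to node i with probability `restart`:
        R = restart * (I - (1 - restart) * P)^(-1)
    """
    n = P.shape[0]
    I = np.eye(n)
    return np.linalg.solve(I - (1.0 - restart) * P, restart * I)

def rwr_affinity(X, k=5, restart=0.15):
    """Symmetric RWR-based affinity (hypothetical stand-in for c-affinity)."""
    R = rwr_scores(knn_transition_matrix(X, k), restart)
    return (R + R.T) / 2.0                 # assumption: average both directions

# Usage: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(10.0, 0.5, (20, 2))])
S = rwr_affinity(X, k=5)
print("mean affinity within first group:", S[:20, :20].mean())
print("mean affinity across groups:     ", S[:20, 20:].mean())
```

Because the walk moves only along k-nearest neighbor edges, a pair of objects connected through a densely populated region tends to receive a higher score than a pair separated by a sparse gap. A matrix such as `S` could then be passed to any clustering algorithm that accepts a precomputed similarity.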

Cited By

Szederjesi-Dragomir A. (2024). A Comprehensive Evaluation of Rough Sets Clustering in Uncertainty Driven Contexts. Studia Universitatis Babeș-Bolyai Informatica 69:1 (41-56). https://doi.org/10.24193/subbi.2024.1.03. Online publication date: 10-Jun-2024.

Index Terms

C-Affinity: A Novel Similarity Measure for Effective Data Clustering
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Unsupervised learning and clustering

Information

Published In

WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023

April 2023

1567 pages

ISBN:9781450394192

DOI:10.1145/3543873

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster
Research
Refereed limited

Conference

WWW '23

Sponsor:

SIGWEB

WWW '23: The ACM Web Conference 2023

April 30 - May 4, 2023

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 1
Total Downloads: 102
Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Jan 2025
