DOI: 10.1145/1601966.1601984
research-article

A novel measure for validating clustering results applied to road traffic

Published: 28 June 2009

Abstract

Clustering validation and clustering interpretation are the last two steps of the clustering process. The validation step evaluates the quality of clustering results using validity measures. Valid results are then generally interpreted and used in cluster analysis. Validity measures fall into three categories: unsupervised measures, supervised measures, and relative measures. Several supervised measures have been proposed, such as entropy, purity, F-measure, the Jaccard coefficient, and the Rand statistic. Generally, these measures evaluate results against class labels. However, they cannot always distinguish interpretable clusters, because most of them depend on the number of labels. This paper proposes a new supervised evaluation measure, called "homogeneity degree", that merges the validation and interpretation steps. The measure is applied to a real traffic data set and used to interpret several traffic situations. A comparison with other evaluation measures illustrates the performance of the proposal.
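To make the supervised measures named in the abstract concrete, the following is a minimal Python sketch of three of them (purity, entropy, and the Rand statistic) in their common textbook forms. It does not implement the "homogeneity degree", which the abstract does not define. The function names, the cluster assignments, and the traffic-style labels ("free", "jam") are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _group_labels(clusters, labels):
    """Collect the class labels that fall into each cluster."""
    members = {}
    for c, y in zip(clusters, labels):
        members.setdefault(c, []).append(y)
    return members


def purity(clusters, labels):
    """Fraction of objects that carry the majority class label of their cluster."""
    members = _group_labels(clusters, labels)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted average Shannon entropy (in bits) of the labels in each cluster."""
    members = _group_labels(clusters, labels)
    total = 0.0
    for ys in members.values():
        n = len(ys)
        h = -sum((k / n) * log2(k / n) for k in Counter(ys).values())
        total += (n / len(labels)) * h
    return total


def rand_statistic(clusters, labels):
    """Fraction of object pairs on which the clustering and the labels agree."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum(
        (clusters[i] == clusters[j]) == (labels[i] == labels[j]) for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical example: six observations, two clusters, two class labels.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["free", "free", "jam", "jam", "jam", "jam"]

print(purity(clusters, labels))
print(entropy(clusters, labels))
print(rand_statistic(clusters, labels))
```

Higher purity and Rand values and lower entropy indicate closer agreement with the class labels. All three scores are computed from label counts alone, which is one reason such measures can be sensitive to how many labels the data set has.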

Cited By

  • (2012) Interpretability-based validity methods for clustering results evaluation. Journal of Intelligent Information Systems 39:1 (109-139). DOI: 10.1007/s10844-011-0185-0. Online publication date: 1-Aug-2012
  • (2011) HS-measure. Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies and Tools (274-280). DOI: 10.5555/2151688.2151719. Online publication date: 16-May-2011

Published In

SensorKDD '09: Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
June 2009
150 pages
ISBN:9781605586687
DOI:10.1145/1601966

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. external criteria
  3. supervised measure
  4. validity

Qualifiers

  • Research-article

Conference

KDD09