Abstract
When using arbitrarily oriented subspace clustering algorithms, one obtains a partitioning of a given data set and, for each partition, its individual subspace. Since clustering is an unsupervised machine learning task, we may not have “ground truth” labels at our disposal, or we may not wish to rely on them. What is needed in such cases are internal measures that permit a label-free analysis of the obtained subspace clustering. In this work, we propose methods for revising clusters obtained from arbitrarily oriented correlation clustering algorithms. Initial experiments reveal improvements in the clustering results compared to the original clustering outcome. Our proposed approach is simple and can be applied as a post-processing step to arbitrarily oriented correlation clusterings.
Notes
Not to be confused with linear discriminant analysis.
Additional information
Daniyal Kazempour conceptualized and wrote this work during his time at the Ludwig-Maximilians-University Munich.
About this article
Cite this article
Kazempour, D., Winter, J., Kröger, P. et al. On Methods and Measures for the Inspection of Arbitrarily Oriented Subspace Clusters. Datenbank Spektrum 21, 213–223 (2021). https://doi.org/10.1007/s13222-021-00388-6
DOI: https://doi.org/10.1007/s13222-021-00388-6