Abstract
A comprehensive understanding of complex data requires multiple different views. Subspace clustering methods open up multiple interesting views because they allow a data object to be assigned to different clusters in different subspaces. Conventional subspace clustering methods either yield many redundant clusters or control redundancy through parameters that are difficult to set. In this paper, we employ concepts from information theory to naturally trade off the two major properties of a subspace cluster: the quality of a cluster and its redundancy with respect to the other clusters. Our novel algorithm NORD (for NOn-ReDundant) efficiently discovers the truly relevant clusters in complex data sets without requiring any kind of threshold on their redundancy. NORD also exploits the concept of microclusters to support the detection of arbitrarily shaped clusters. Our comprehensive experimental evaluation shows the effectiveness and efficiency of NORD on both synthetic and real-world data sets and provides a meaningful visualization of both the quality and the degree of redundancy of the clustering result at first glance.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Hubig, N., Plant, C. (2017). Information-Theoretic Non-redundant Subspace Clustering. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_16
DOI: https://doi.org/10.1007/978-3-319-57454-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57453-0
Online ISBN: 978-3-319-57454-7
eBook Packages: Computer Science (R0)