Synchronization-based scalable subspace clustering of high-dimensional data

Shao, Junming; Wang, Xinzuo; Yang, Qinli; Plant, Claudia; Böhm, Christian

doi:10.1007/s10115-016-1013-1

Regular Paper
Published: 02 December 2016

Volume 52, pages 83–111 (2017)
Knowledge and Information Systems

Junming Shao¹,
Xinzuo Wang¹,
Qinli Yang¹,
Claudia Plant² &
Christian Böhm³

Abstract

How can the challenges of the “curse of dimensionality” and “scalability” in clustering be addressed simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel, effective, and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. This control is achieved through simple operations based on interactions between objects. Relying on a weighted interaction model and iterative dynamic clustering, ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters. Our approach is (b) robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters. Instead, all interesting subspaces are detected fully automatically. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments demonstrate the effectiveness and efficiency of our approach.
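To make the idea of clustering by synchronization concrete, the following is a minimal sketch of a generic Kuramoto-style interaction model in the full data space. It is not the ORSC algorithm: the weighted interaction model and the subspace detection described in the abstract are not reproduced here. The function names, the neighborhood radius `eps`, the tolerances, and the toy data are all assumptions made for illustration.

```python
import math


def sync_step(points, eps):
    """One synchronous update of a Kuramoto-style interaction model.

    Each point is pulled toward the points within distance eps of it;
    the pull in every dimension is the mean of sin(difference).
    """
    updated = []
    for p in points:
        # p is its own neighbor (distance 0), so this list is never empty.
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        new_p = [
            p[d] + sum(math.sin(q[d] - p[d]) for q in neighbors) / len(neighbors)
            for d in range(len(p))
        ]
        updated.append(new_p)
    return updated


def synchronize(points, eps, max_iter=50, tol=1e-6):
    """Repeat sync_step until no point moves more than tol."""
    current = [list(p) for p in points]
    for _ in range(max_iter):
        nxt = sync_step(current, eps)
        shift = max(math.dist(a, b) for a, b in zip(current, nxt))
        current = nxt
        if shift < tol:
            break
    return current


def group_by_position(points, tol=1e-3):
    """Give the same label to points that ended up at the same location."""
    labels, centers = [], []
    for p in points:
        for k, c in enumerate(centers):
            if math.dist(p, c) <= tol:
                labels.append(k)
                break
        else:
            # No existing center is close enough: start a new cluster.
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels


# Hypothetical toy data: two tight groups of three points each.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
labels = group_by_position(synchronize(data, eps=0.3))
print(labels)
```

In this sketch, points that interact locally collapse onto a common location, and cluster membership is read off from the final positions. Isolated points have no neighbors to pull them and therefore stay where they are, which hints at why such models can separate noise from clusters.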

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 61403062, 61433014, 41601025), the China Postdoctoral Science Foundation (Grant Nos. 2014M552344, 2015M580786), the Science-Technology Foundation for Young Scientist of Sichuan Province (Grant No. 2016JQ0007), and the Fundamental Research Funds for the Central Universities (Grant Nos. ZYGX2014J053, ZYGX2014J091).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, China
Junming Shao, Xinzuo Wang & Qinli Yang
Institute for Computer Science, University of Vienna, 1090, Vienna, Austria
Claudia Plant
Institute for Computer Science, University of Munich, 80538, Munich, Germany
Christian Böhm

Corresponding author

Correspondence to Junming Shao.

About this article

Cite this article

Shao, J., Wang, X., Yang, Q. et al. Synchronization-based scalable subspace clustering of high-dimensional data. Knowl Inf Syst 52, 83–111 (2017). https://doi.org/10.1007/s10115-016-1013-1

Received: 16 June 2015
Revised: 11 October 2016
Accepted: 18 November 2016
Published: 02 December 2016
Issue Date: July 2017
DOI: https://doi.org/10.1007/s10115-016-1013-1
