A Generic Summary Structure for Arbitrarily Oriented Subspace Clustering in Data Streams

  • Conference paper
Similarity Search and Applications (SISAP 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11807)

Abstract

As data is increasingly gathered in large volumes and at high velocity, developing algorithms that can handle complex data streams in (near) real time is a major challenge. In this work, we present the algorithm CorrStream, which tackles the problem of detecting arbitrarily oriented subspace clusters in high-dimensional data streams. The proposed method follows a two-phase approach: the continuous online phase aggregates data points within a microcluster structure that stores all information necessary to define a microcluster’s subspace and is generic enough to support a variety of offline procedures. Given several such microclusters, the offline phase builds a final clustering model that reveals the arbitrarily oriented subspaces in which the data tend to cluster. In our experimental evaluation, we show that CorrStream not only achieves an acceptable throughput but also outperforms its static counterpart algorithms by orders of magnitude in runtime, while the loss of accuracy is quite small.
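
The abstract does not spell out what the microcluster structure stores. As a rough illustration only, the sketch below assumes a summary made of a point count, a linear sum, and a sum of outer products, from which a covariance matrix and its principal directions can be recovered without keeping the raw points. The class name `MicroCluster`, its methods, and the `var_threshold` parameter are hypothetical and are not taken from the paper; CorrStream's actual structure may differ.

```python
import numpy as np


class MicroCluster:
    """Hypothetical additive summary (not the paper's definition):
    point count, linear sum, and sum of outer products."""

    def __init__(self, dim):
        self.n = 0                      # number of absorbed points
        self.ls = np.zeros(dim)         # linear sum of points
        self.ss = np.zeros((dim, dim))  # sum of outer products x x^T

    def insert(self, x):
        """Absorb one stream point in O(dim^2) time."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += np.outer(x, x)

    def covariance(self):
        """Population covariance recovered from the stored sums."""
        mu = self.ls / self.n
        return self.ss / self.n - np.outer(mu, mu)

    def subspace(self, var_threshold=0.85):
        """Eigenvectors covering at least var_threshold of the variance
        (var_threshold is an assumed, illustrative parameter)."""
        eigvals, eigvecs = np.linalg.eigh(self.covariance())
        order = np.argsort(eigvals)[::-1]           # largest variance first
        eigvals = np.clip(eigvals[order], 0, None)  # guard against round-off
        eigvecs = eigvecs[:, order]
        total = eigvals.sum()
        if total == 0:
            return eigvecs[:, :0]                   # degenerate: no spread
        explained = np.cumsum(eigvals) / total
        k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(eigvals))
        return eigvecs[:, :k]


# Usage: points on the line y = 2x yield a one-dimensional subspace.
mc = MicroCluster(dim=2)
for t in range(1, 11):
    mc.insert([t, 2 * t])
basis = mc.subspace()
```

Because all three quantities are plain sums, two such summaries could be merged by adding their fields, which is one reason additive statistics of this kind are a common choice for stream summaries.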

Acknowledgement

This work was partially funded by Siemens AG and has been developed in cooperation with the Munich Center for Machine Learning (MCML), funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content.

Author information

Corresponding authors

Correspondence to Felix Borutta or Peer Kröger.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Borutta, F., Kröger, P., Hubauer, T. (2019). A Generic Summary Structure for Arbitrarily Oriented Subspace Clustering in Data Streams. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_18

  • DOI: https://doi.org/10.1007/978-3-030-32047-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32046-1

  • Online ISBN: 978-3-030-32047-8

  • eBook Packages: Computer Science (R0)
