
An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13935)


Abstract

Advances in 5G and the Internet of Things enable more devices and sensors to be interconnected. Unlike traditional data, the large volume of data generated by these sensors and devices requires real-time analysis. Data objects in a stream change over time and can be accessed only once. Thus, traditional methods no longer meet the needs of fast exploratory data analysis for continuously generated data. Cluster tendency assessment is an effective way to estimate the number of potential clusters. Recently, methods based on Visual Assessment of cluster Tendency (VAT) have been proposed for visualising cluster structures in streaming data using cluster heat maps. However, those heat maps rely on Euclidean distance, which does not consider the characteristics of the data distribution. Consequently, it is difficult to separate adjacent clusters of varied densities. In this paper, we discuss this issue for the latest inc-siVAT method and propose a data-dependent kernel method to overcome it when clustering streaming data. Extensive evaluation on 7 large synthetic and real-world datasets shows the superiority of kernel-based inc-siVAT over 4 recently published state-of-the-art online and offline clustering algorithms.
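
To make the idea of a cluster heat map concrete, the following is a minimal sketch of the classic batch VAT reordering on a Euclidean dissimilarity matrix. It is not the incremental inc-siVAT procedure or the kernel-based method proposed in the paper; the function names `vat_order` and `vat_image` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def vat_order(D):
    """Return a VAT ordering for a symmetric dissimilarity matrix D.

    Hypothetical helper: a Prim-style pass that starts at one endpoint of
    the largest dissimilarity and repeatedly appends the unvisited point
    closest to any already-ordered point.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # min_dist[q]: smallest dissimilarity from q to any ordered point so far
    min_dist = D[first].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        nxt = int(candidates[np.argmin(min_dist[candidates])])
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)

def vat_image(D):
    """Reorder rows and columns of D by the VAT ordering (the heat-map matrix)."""
    D = np.asarray(D, dtype=float)
    order = vat_order(D)
    return D[np.ix_(order, order)], order

# Toy data (assumed for illustration): two well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(30, 2))])
labels = np.array([0] * 20 + [1] * 30)
perm = rng.permutation(len(X))          # shuffle so input order hides the clusters
X, labels = X[perm], labels[perm]

# Pairwise Euclidean distances, the dissimilarity the abstract refers to.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
R, order = vat_image(D)
```

Displaying `R` as an image (for example with a grey-scale colour map) would show dark diagonal blocks, one per apparent cluster. Any other symmetric dissimilarity matrix, including one derived from a data-dependent kernel, could be passed to `vat_image` in place of `D`.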


Notes

  1. The code of inc-IKiVAT is available at https://github.com/charles-cao/inc-IKiVAT.


Acknowledgements

This work is supported by the Natural Science Foundation of Heilongjiang Province under grant number LH2021F015 and the National Foreign Cultural and Educational Expert Project under grant number G2021180008L.

Author information


Corresponding author

Correspondence to Yang Cao.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, B. et al. (2023). An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science (LNAI), vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_16


  • DOI: https://doi.org/10.1007/978-3-031-33374-3_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33373-6

  • Online ISBN: 978-3-031-33374-3

  • eBook Packages: Computer Science, Computer Science (R0)
