An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering

Zhang, Baojie; Cao, Yang; Zhu, Ye; Rajasegarar, Sutharshan; Liu, Gang; Li, Hong Xian; Angelova, Maia; Li, Gang

doi:10.1007/978-3-031-33374-3_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13935)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Advances in 5G and the Internet of Things allow more devices and sensors to be interconnected. Unlike traditional data, the large volume of data generated by these sensors and devices requires real-time analysis. Data objects in a stream change over time and can be accessed only once, so traditional methods no longer meet the need for fast exploratory analysis of continuously generated data. Cluster tendency assessment is an effective way to determine the number of potential clusters. Recently, methods based on Visual Assessment of cluster Tendency (VAT) have been proposed for visualising cluster structures in streaming data using cluster heat maps. However, those heat maps rely on Euclidean distance, which does not take the characteristics of the data distribution into account. Consequently, it is difficult to separate adjacent clusters of varied densities. In this paper, we discuss this issue for the latest inc-siVAT method and propose a data-dependent kernel method to overcome it when clustering streaming data. Extensive evaluation on 7 large synthetic and real-world datasets shows the superiority of kernel-based inc-siVAT over 4 recently published state-of-the-art online and offline clustering algorithms.
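
For readers unfamiliar with the Euclidean-distance heat maps mentioned in the abstract, the following is a minimal sketch of the classic batch VAT reordering, not of inc-siVAT or the kernel-based method proposed in this paper. The function names (`pairwise_euclidean`, `vat_order`, `reorder`) and the toy two-cluster data are illustrative assumptions. The sketch reorders a dissimilarity matrix so that well-separated clusters appear as dark diagonal blocks when the matrix is drawn as a heat map.

```python
import math
import random


def pairwise_euclidean(points):
    """Full Euclidean dissimilarity matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(D):
    """Return a VAT ordering of the objects described by the square matrix D.

    The ordering starts at one endpoint of the largest dissimilarity and then
    repeatedly appends the unordered object closest to the ordered set
    (the same greedy growth as Prim's minimum spanning tree algorithm).
    """
    n = len(D)
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # nearest[j]: smallest dissimilarity from j to any already-ordered object.
    nearest = list(D[first])
    while remaining:
        nxt = min(remaining, key=lambda j: nearest[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            nearest[j] = min(nearest[j], D[nxt][j])
    return order


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]


# Toy data (an assumption for illustration): two tight, well-separated clusters.
random.seed(0)
cluster_a = [(random.gauss(0.0, 0.1), random.gauss(0.0, 0.1)) for _ in range(20)]
cluster_b = [(random.gauss(5.0, 0.1), random.gauss(5.0, 0.1)) for _ in range(20)]
labelled = [(p, 0) for p in cluster_a] + [(p, 1) for p in cluster_b]
random.shuffle(labelled)  # hide the cluster structure in the input order
points = [p for p, _ in labelled]
labels = [c for _, c in labelled]

D = pairwise_euclidean(points)
order = vat_order(D)
D_star = reorder(D, order)  # matrix that would be rendered as the heat map
ordered_labels = [labels[i] for i in order]
```

In this toy case `ordered_labels` lists all members of one cluster before any member of the other, which is why the reordered matrix shows two diagonal blocks. With adjacent clusters of differing density, a plain Euclidean matrix gives less distinct blocks, which is the limitation the abstract describes.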

Notes

1. The code of inc-IKiVAT is available at https://github.com/charles-cao/inc-IKiVAT.

Acknowledgements

This work is supported by the Natural Science Foundation of Heilongjiang Province under grant number LH2021F015 and the National Foreign Cultural and Educational Expert Project under grant number G2021180008L.

Author information

Authors and Affiliations

Xi’an Shiyou University, Shaanxi, 710065, China
Baojie Zhang
Deakin University, Burwood, VIC, 3125, Australia
Yang Cao, Ye Zhu, Sutharshan Rajasegarar, Hong Xian Li, Maia Angelova & Gang Li
Harbin Engineering University, Harbin, 150001, China
Gang Liu

Corresponding author

Correspondence to Yang Cao.

Editor information

Editors and Affiliations

Kyoto University, Kyoto, Japan
Hisashi Kashima
IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Tsuyoshi Ide
National Chiao Tung University, Hsinchu, Taiwan
Wen-Chih Peng

About this paper

Cite this paper

Zhang, B. et al. (2023). An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_16

DOI: https://doi.org/10.1007/978-3-031-33374-3_16
Published: 27 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3
eBook Packages: Computer Science, Computer Science (R0)
