Dynamic Micro-cluster-Based Streaming Data Clustering Method for Anomaly Detection

  • Conference paper
Soft Computing in Data Science (SCDS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1771)

Abstract

Identifying anomalies in a data stream is a challenge for real-time decision-making. Because massive volumes of streaming data arrive continuously and their characteristics change over time, real-time and efficient anomaly detection requires a memory-constrained online detection system that can quickly detect concept drift. In this study, a novel anomaly detection model based on a dynamic micro-cluster scheme is developed. Macro-clusters are generated from a network of connected micro-clusters. As new data items arrive, the normal patterns formed in the macro-clusters are updated incrementally, in tandem with the dynamic micro-clusters. An outlier can be interpreted from both a global and a local perspective by examining the global and local densities, respectively. The effectiveness of the proposed approach was evaluated on three different datasets. The experimental results show that the proposed method outperforms earlier algorithms in both detection accuracy and computational complexity.
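
The abstract does not give the update rules, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' algorithm. The names `MicroCluster` and `MicroClusterStream`, the fixed `radius`, the rule that two micro-clusters are connected when their centers lie within `2 * radius`, and the count thresholds `min_local` and `min_global` are all hypothetical. Local density is approximated here by the number of points in a micro-cluster, and global density by the number of points in its macro-cluster.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list      # running mean of the absorbed points
    count: int = 1    # number of absorbed points (assumed local density)

    def absorb(self, point):
        # Incremental mean update: no past points need to be stored.
        self.count += 1
        self.center = [c + (x - c) / self.count
                       for c, x in zip(self.center, point)]


class MicroClusterStream:
    def __init__(self, radius, min_local=3, min_global=6):
        self.radius = radius          # assumed fixed micro-cluster radius
        self.min_local = min_local    # assumed local density threshold
        self.min_global = min_global  # assumed global density threshold
        self.clusters = []

    def insert(self, point):
        """Absorb the point into the nearest micro-cluster within radius,
        or create a new micro-cluster. Returns the micro-cluster index."""
        best, best_d = None, self.radius
        for i, mc in enumerate(self.clusters):
            d = math.dist(mc.center, point)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            self.clusters.append(MicroCluster(list(point)))
            return len(self.clusters) - 1
        self.clusters[best].absorb(point)
        return best

    def macro_clusters(self):
        """Macro-clusters as connected components of the micro-cluster
        graph (edge when centers are within 2 * radius, an assumption)."""
        n = len(self.clusters)
        seen, groups = set(), []
        for start in range(n):
            if start in seen:
                continue
            seen.add(start)
            stack, group = [start], []
            while stack:
                i = stack.pop()
                group.append(i)
                for j in range(n):
                    if j in seen:
                        continue
                    gap = math.dist(self.clusters[i].center,
                                    self.clusters[j].center)
                    if gap <= 2 * self.radius:
                        seen.add(j)
                        stack.append(j)
            groups.append(group)
        return groups

    def is_anomaly(self, idx):
        """Flag a micro-cluster that is sparse locally or globally."""
        local = self.clusters[idx].count
        group = next(g for g in self.macro_clusters() if idx in g)
        global_density = sum(self.clusters[i].count for i in group)
        return local < self.min_local or global_density < self.min_global
```

A caller would pass each arriving point to `insert` and then query `is_anomaly` with the returned index. Recomputing the connected components on every query keeps the sketch short; an incremental implementation would instead maintain the micro-cluster graph as points arrive.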

Funding

This work is supported by the Youth Science and Technology New Star Plan of Shaanxi Province (2021KJXX-50) and Technology New Star Plan of Shaanxi Province (No. 20JS09).

Author information

Corresponding author

Correspondence to Mohd Nizam Husen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, X., Ahmed, M.M., Husen, M.N., Tao, H., Zhao, Q. (2023). Dynamic Micro-cluster-Based Streaming Data Clustering Method for Anomaly Detection. In: Yusoff, M., Hai, T., Kassim, M., Mohamed, A., Kita, E. (eds) Soft Computing in Data Science. SCDS 2023. Communications in Computer and Information Science, vol 1771. Springer, Singapore. https://doi.org/10.1007/978-981-99-0405-1_5

  • DOI: https://doi.org/10.1007/978-981-99-0405-1_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0404-4

  • Online ISBN: 978-981-99-0405-1

  • eBook Packages: Computer Science; Computer Science (R0)
