Self-organising Approach to Anomaly Mitigation in the Cloud-to-Edge Continuum

Conference paper in Cooperative Information Systems (CoopIS 2024)

Abstract

The cloud-to-edge continuum paradigm has permeated various application domains, including critical urban safety systems. In these contexts, anomalies can compromise public safety, for example by disrupting the communication between smart city infrastructure and vehicles that is meant to prevent accidents at pedestrian crossings. Given the heterogeneous, large-scale nature of these environments, manual recovery from anomalies is not feasible. Machine Learning techniques have emerged as an alternative, supporting a zero-touch approach that enables self-organising and self-healing solutions for anomaly prediction, detection, and mitigation. This paper proposes an Artificial Intelligence-driven, self-organising approach for anomaly management in the cloud-to-edge continuum that integrates both reactive and proactive mechanisms. We evaluate different Machine Learning models, including Random Forest Classifiers, Neural Networks, and Convolutional Neural Networks, to predict node performance anomalies. Simulation results obtained with the COSCO framework demonstrate the effectiveness of our method: it achieves an F1 score of 73% for multiclass classification, which predicts different levels of anomaly severity, and 87% for binary classification, which distinguishes between normal and abnormal states.
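The abstract reports F1 separately for multiclass classification (severity levels) and binary classification (normal vs abnormal). The following stdlib-only Python sketch illustrates how the two scores relate: collapsing all severity levels into a single "abnormal" class removes confusions between neighbouring levels, which can raise F1. The four-level severity scale, the toy labels, and macro averaging are all assumptions for illustration; they are not the paper's data or evaluation code.

```python
# Hypothetical severity scale, assumed for illustration only:
# 0 = normal, 1 = low, 2 = medium, 3 = high anomaly severity.
SEVERITIES = [0, 1, 2, 3]


def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 score, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def to_binary(severities):
    """Collapse severity levels into normal (0) vs abnormal (1)."""
    return [0 if s == 0 else 1 for s in severities]


# Toy ground truth and predictions (made up, not results from the paper).
y_true = [0, 0, 1, 2, 3, 3, 1, 0]
y_pred = [0, 1, 2, 2, 3, 2, 1, 0]

multiclass = macro_f1(y_true, y_pred, SEVERITIES)
binary = macro_f1(to_binary(y_true), to_binary(y_pred), [0, 1])
print(f"multiclass macro-F1: {multiclass:.3f}")  # 37/60, printed as 0.617
print(f"binary macro-F1:     {binary:.3f}")      # 47/55, printed as 0.855
```

On these toy labels, predicting "medium" for a "low" or "high" sample is penalised in the multiclass score but counts as a correct "abnormal" prediction once the labels are collapsed; only the normal sample predicted as abnormal still lowers the binary score.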

Notes

  1. Node's Million Instructions Per Second (MIPS) maximum capacity. It was removed because, under local normalisation, it would always equal 1.

  2. https://github.com/brunofaria1322/COSCO.

  3. http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains.

  4. https://manpages.ubuntu.com/manpages/xenial/man1/stress-ng.1.html.
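Note 1 explains why the node's maximum MIPS capacity was dropped as a feature. A minimal stdlib-only Python sketch of that effect follows, assuming "local normalisation" means dividing each feature by its maximum over the same node's own samples; the feature names `cpu_mips_used` and `max_mips` and all sample values are hypothetical. Because a node's capacity is fixed, it always scales to 1.0 and carries no information for the classifier.

```python
# Hypothetical monitoring samples per node; feature names and values
# are assumptions for illustration, not the paper's dataset.
node_samples = {
    "node-a": [
        {"cpu_mips_used": 800, "max_mips": 2000},
        {"cpu_mips_used": 1500, "max_mips": 2000},
    ],
    "node-b": [
        {"cpu_mips_used": 300, "max_mips": 4000},
        {"cpu_mips_used": 3600, "max_mips": 4000},
    ],
}


def normalise_locally(samples):
    """Divide each feature by its maximum over this node's own samples.

    Per-node max-scaling is an assumed reading of "local normalisation".
    """
    keys = list(samples[0])
    maxima = {k: max(s[k] for s in samples) for k in keys}
    return [
        {k: (s[k] / maxima[k]) if maxima[k] else 0.0 for k in keys}
        for s in samples
    ]


def constant_features(samples):
    """Return the names of features whose value never changes."""
    return sorted(k for k in samples[0] if len({s[k] for s in samples}) == 1)


for node, samples in node_samples.items():
    normalised = normalise_locally(samples)
    # max_mips is the node's fixed capacity, so it always scales to 1.0
    print(node, constant_features(normalised))  # ['max_mips']
```

A constant column cannot help separate normal from anomalous states, so detecting and dropping such features before training is a common preprocessing step.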

Acknowledgments

This work is funded by the FCT - Foundation for Science and Technology, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit - UIDB/00326/2020 or project code UIDP/00326/2020.

Content produced within the scope of the Agenda “NEXUS - Pacto de Inovação - Transição Verde e Digital para Transportes, Logística e Mobilidade” (Innovation Pact - Green and Digital Transition for Transport, Logistics and Mobility), financed by the Portuguese Recovery and Resilience Plan (PRR), with no. C645112083-00000059 (investment project no. 53).

Author information

Corresponding author

Correspondence to Bruno Faria.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Faria, B., Abreu, D.P., Velasquez, K., Curado, M. (2025). Self-organising Approach to Anomaly Mitigation in the Cloud-to-Edge Continuum. In: Comuzzi, M., Grigori, D., Sellami, M., Zhou, Z. (eds) Cooperative Information Systems. CoopIS 2024. Lecture Notes in Computer Science, vol 15506. Springer, Cham. https://doi.org/10.1007/978-3-031-81375-7_15

  • DOI: https://doi.org/10.1007/978-3-031-81375-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-81374-0

  • Online ISBN: 978-3-031-81375-7

  • eBook Packages: Computer Science, Computer Science (R0)
