Comparing Anomaly Detection and Classification Algorithms: A Case Study in Two Domains

  • Conference paper
Software Quality: Higher Software Quality through Zero Waste Development (SWQD 2023)

Abstract

Utilizing large data sets in practical scenarios usually requires identifying, annotating and classifying rare events or anomalies. Although several methods exist, they fall into two classes: anomaly detection algorithms and classification algorithms. The two classes have different characteristics, and in this paper we set out to compare them on two cases. We use data from a neurointensive care unit and from microwave radio transmissions. We apply Isolation Forest and Random Forest algorithms to find events in the data that occur with a frequency of ca. 1%. The results show that classification algorithms (Random Forest) perform better and can achieve up to 100% accuracy, while anomaly detection algorithms (Isolation Forest) achieve only 73% at best. Based on the results, we conclude that it is better to invest in annotating data a priori and use classification algorithms, despite the lower costs of using anomaly detection algorithms.
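The kind of comparison described in the abstract can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' pipeline: the data is synthetic (generated with `make_classification` so that about 1% of rows are events), and the function name `compare_detectors`, the feature counts, and all hyperparameters are assumptions chosen for the example. F1 is reported next to accuracy because, at a 1% event rate, a model that never flags anything already reaches about 99% accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def compare_detectors(n_samples=20000, event_rate=0.01, seed=0):
    """Train both models on synthetic data and return their test scores."""
    # Synthetic stand-in data (assumption): class 1 is the rare event.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=10,
        n_informative=5,
        n_redundant=0,
        weights=[1.0 - event_rate, event_rate],
        flip_y=0.0,
        class_sep=1.5,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Classification route: requires annotated training data (y_train).
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    rf_pred = rf.predict(X_test)

    # Anomaly detection route: the labels are never shown to the model.
    iso = IsolationForest(
        n_estimators=100, contamination=event_rate, random_state=seed
    )
    iso.fit(X_train)
    # predict() returns -1 for outliers and +1 for inliers; map to 1/0.
    iso_pred = (iso.predict(X_test) == -1).astype(int)

    def scores(pred):
        return {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred, zero_division=0),
        }

    return {"random_forest": scores(rf_pred), "isolation_forest": scores(iso_pred)}


if __name__ == "__main__":
    for name, result in compare_detectors().items():
        print(f"{name}: accuracy={result['accuracy']:.3f}, f1={result['f1']:.3f}")
```

The only structural difference between the two routes is whether `y_train` is passed to `fit`, which mirrors the annotation cost trade-off the abstract discusses. Scores on this synthetic data say nothing about the figures reported for the two case studies.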

Author information

Correspondence to Miroslaw Staron.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Staron, M., Hergés, H.O., Block, L., Sjödin, M. (2023). Comparing Anomaly Detection and Classification Algorithms: A Case Study in Two Domains. In: Mendez, D., Winkler, D., Kross, J., Biffl, S., Bergsmann, J. (eds) Software Quality: Higher Software Quality through Zero Waste Development. SWQD 2023. Lecture Notes in Business Information Processing, vol 472. Springer, Cham. https://doi.org/10.1007/978-3-031-31488-9_7

  • DOI: https://doi.org/10.1007/978-3-031-31488-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31487-2

  • Online ISBN: 978-3-031-31488-9

  • eBook Packages: Computer Science, Computer Science (R0)
