Abstract
Using large data sets in practice usually requires identifying, annotating and classifying rare events or anomalies. Although several methods exist, they fall into two classes: anomaly detection algorithms and classification algorithms. The two classes have different characteristics, and in this paper we compare them on two cases: data from a neurointensive care unit and data from microwave radio transmissions. We apply the Isolation Forest and Random Forest algorithms to find events that occur in the data with a frequency of ca. 1%. The results show that the classification algorithm (Random Forest) performs better and can achieve up to 100% accuracy, while the anomaly detection algorithm (Isolation Forest) achieves only 73% at best. Based on the results, we conclude that it is better to invest in annotating data a priori and use classification algorithms, despite the lower cost of using anomaly detection algorithms.
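The kind of comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and purely synthetic data. It is not the authors' pipeline: the data generator, the feature count, the hyperparameters, and the helper names `make_rare_event_data` and `compare` are hypothetical, and balanced accuracy is used here as an illustrative metric because plain accuracy is dominated by the majority class when events make up about 1% of the data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def make_rare_event_data(n_samples=5000, event_rate=0.01, seed=0):
    """Synthetic stand-in for the real data: about 1% shifted 'event' points."""
    rng = np.random.default_rng(seed)
    n_events = round(n_samples * event_rate)
    normal = rng.normal(0.0, 1.0, size=(n_samples - n_events, 4))
    events = rng.normal(3.0, 1.0, size=(n_events, 4))
    X = np.vstack([normal, events])
    y = np.concatenate([np.zeros(len(normal), dtype=int),
                        np.ones(n_events, dtype=int)])
    return X, y


def compare(seed=0):
    """Fit both model types on the same split and score them on held-out data."""
    X, y = make_rare_event_data(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # Anomaly detection: Isolation Forest is fitted without labels.
    # contamination=0.01 is an assumption matching the ~1% event frequency.
    iso = IsolationForest(contamination=0.01, random_state=seed).fit(X_train)
    iso_pred = (iso.predict(X_test) == -1).astype(int)  # -1 marks an anomaly

    # Classification: Random Forest needs annotated data (y_train).
    rf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                random_state=seed).fit(X_train, y_train)
    rf_pred = rf.predict(X_test)

    return {
        "isolation_forest": balanced_accuracy_score(y_test, iso_pred),
        "random_forest": balanced_accuracy_score(y_test, rf_pred),
    }


if __name__ == "__main__":
    for name, score in compare().items():
        print(f"{name}: balanced accuracy = {score:.3f}")
```

The sketch makes the cost trade-off from the abstract concrete: only the Random Forest call consumes `y_train`, so it is the only step that depends on prior annotation, whereas the Isolation Forest needs just an assumed event rate. Scores on synthetic data say nothing about the results reported in the paper.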
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Staron, M., Hergés, H.O., Block, L., Sjödin, M. (2023). Comparing Anomaly Detection and Classification Algorithms: A Case Study in Two Domains. In: Mendez, D., Winkler, D., Kross, J., Biffl, S., Bergsmann, J. (eds) Software Quality: Higher Software Quality through Zero Waste Development. SWQD 2023. Lecture Notes in Business Information Processing, vol 472. Springer, Cham. https://doi.org/10.1007/978-3-031-31488-9_7
DOI: https://doi.org/10.1007/978-3-031-31488-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31487-2
Online ISBN: 978-3-031-31488-9
eBook Packages: Computer Science, Computer Science (R0)