Application of Anomaly Detection Models to Malware Detection in the Presence of Concept Drift

Escudero García, David; DeCastro-García, Noemí

doi:10.1007/978-3-031-40725-3_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14001)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

Machine learning is one of the main approaches to malware detection in the literature, since machine learning models are more adaptive than signature-based solutions. One of the main challenges in applying machine learning to malware detection is concept drift, a change in the data distribution over time. To tackle drift, online models are applied that can be updated dynamically, either passively or by actively detecting change. However, these models require newly arriving instances to be labelled before the model can be updated. In practice, labels are scarce and cannot be obtained immediately, and class imbalance in the data makes it difficult to build an effective model. Previous studies indicate that concept drift has a lower impact on benign instances, so we test the effectiveness of anomaly detection models for detecting malware in the presence of concept drift. Anomaly detection models need only benign instances for training and may therefore be less affected by the scarcity of labelled malicious instances. The results show that anomaly detection models achieve better results than supervised online models under heavy data imbalance and label scarcity.
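
The idea of training only on benign instances can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which anomaly detection models or features were used, so the `BenignProfileDetector` class, the Gaussian synthetic data, the five features and the 0.99 quantile are all assumptions made for illustration. The detector learns a per-feature profile from benign samples alone and flags anything that falls far from it, so it needs no malware labels and keeps working when the simulated malware distribution shifts while benign behaviour stays stable.

```python
import random
import statistics


class BenignProfileDetector:
    """Toy one-class detector (hypothetical name): fits benign data only."""

    def __init__(self, quantile=0.99):
        # Share of benign training scores that should fall at or below the threshold.
        self.quantile = quantile

    def fit(self, benign_rows):
        columns = list(zip(*benign_rows))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 for constant features so score() never divides by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        scores = sorted(self.score(row) for row in benign_rows)
        cut = min(len(scores) - 1, int(self.quantile * len(scores)))
        self.threshold = scores[cut]
        return self

    def score(self, row):
        # Mean squared z-score: larger means further from the benign profile.
        return sum(((x - m) / s) ** 2
                   for x, m, s in zip(row, self.means, self.stds)) / len(row)

    def is_anomalous(self, row):
        return self.score(row) > self.threshold


def sample(rng, center, n, dims=5):
    """Draw n synthetic feature vectors from a unit-variance Gaussian (assumed data)."""
    return [[rng.gauss(center, 1.0) for _ in range(dims)] for _ in range(n)]


def flagged_rate(detector, rows):
    """Fraction of rows the detector marks as anomalous."""
    return sum(detector.is_anomalous(r) for r in rows) / len(rows)


rng = random.Random(0)
# Training uses benign samples only; no malware labels are required.
detector = BenignProfileDetector().fit(sample(rng, 0.0, 500))

benign_later = sample(rng, 0.0, 200)      # benign behaviour assumed stable over time
malware_early = sample(rng, 4.0, 20)      # minority class, never seen in training
malware_drifted = sample(rng, -4.0, 20)   # malware after a simulated drift

false_alarm_rate = flagged_rate(detector, benign_later)
early_detection_rate = flagged_rate(detector, malware_early)
drifted_detection_rate = flagged_rate(detector, malware_drifted)
print(false_alarm_rate, early_detection_rate, drifted_detection_rate)
```

In practice the hand-written profile would be replaced by an established one-class model, for example `OneClassSVM` in scikit-learn or `HalfSpaceTrees` in River. The sketch only shows why heavy imbalance and missing malware labels do not affect training in this setting.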

Supported by Spanish National Cybersecurity Institute (INCIBE).

Author information

Authors and Affiliations

Research Institute of Applied Science in Cybersecurity, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
David Escudero García
Department of Mathematics, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
Noemí DeCastro-García

Corresponding author

Correspondence to David Escudero García.

Editor information

Editors and Affiliations

University of Deusto, Bilbao, Spain
Pablo García Bringas
University of Leon, León, Spain
Hilde Pérez García
University of La Rioja, Logroño, La Rioja, Spain
Francisco Javier Martínez de Pisón
Pablo de Olavide University, Seville, Spain
Francisco Martínez Álvarez
Pablo de Olavide University, Seville, Spain
Alicia Troncoso Lora
University of Burgos, Burgos, Spain
Álvaro Herrero
University of A Coruña, Ferrol - Coruña, Spain
José Luis Calvo Rolle
University of A Coruña, Ferrol - Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Escudero García, D., DeCastro-García, N. (2023). Application of Anomaly Detection Models to Malware Detection in the Presence of Concept Drift. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_2

DOI: https://doi.org/10.1007/978-3-031-40725-3_2
Published: 29 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40724-6
Online ISBN: 978-3-031-40725-3
eBook Packages: Computer Science, Computer Science (R0)
