Machine learning pipelines for IoT botnet detection and behavior characterization in heavily imbalanced settings

Jovanović, Djordje D.; Vuletić, Pavle V.

doi:10.1007/s11760-025-03813-5

Original Paper
Published: 28 January 2025

Signal, Image and Video Processing, Volume 19, article number 254 (2025)

Djordje D. Jovanović^1,2 & Pavle V. Vuletić^2

Abstract

This paper presents a new methodology for IoT botnet detection based on time series analysis of network intra-flow parameters and supervised machine learning classification. The study focuses on time series feature extraction, improvements to the machine learning pipeline, and methods for handling heavily imbalanced datasets, which are characteristic of many information security use cases. A secondary result is the inference of the key distinguishing features of malware behavior that make malware detectable with high precision. The research is based on dynamic behavior analysis of real-world IoT malware. The samples were collected over 4 years (2019–2023), making this one of the most recent IoT malware datasets and a unique long-term analysis of malware behavior. The analysis indicates the type and rate of change in IoT botnet malware behavior, as well as invariant features that can be used to reliably detect even previously unseen malware samples (so-called zero-day cases). The experimental results show that the synthetic sample generation methods used in this study do not cause the classifiers to overfit: the classifiers detect zero-day malware samples with an accuracy of 0.9706 and an F1 score of 0.9041.
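The abstract names three technical ingredients: per-flow time series feature extraction, synthetic minority-sample generation for heavily imbalanced data, and evaluation by accuracy and F1 score. The Python sketch below illustrates each under stated assumptions; it is not the authors' pipeline. The function names (`flow_features`, `smote`, `accuracy_f1`) are hypothetical, the five summary statistics stand in for a much richer time series feature extractor, and basic SMOTE (interpolation between a minority sample and one of its k nearest minority neighbours) is used only as one well-known example of synthetic sample generation, since the abstract does not name the specific methods.

```python
import math
import random


def flow_features(series):
    """Summarize one flow's intra-flow time series (e.g. packet sizes or
    inter-arrival times) as [count, mean, std, min, max].
    Hypothetical stand-in for a full time series feature extractor."""
    if not series:
        raise ValueError("flow time series must be non-empty")
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)  # population std
    return [float(n), mean, std, float(min(series)), float(max(series))]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE (illustrative choice): create n_new synthetic minority
    vectors, each at a random point on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return synthetic


def accuracy_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 score of the positive, i.e. malicious, class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy minority class: three malicious flows (packet-size series, made up).
    flows = ([60, 60, 1500], [60, 70, 1400], [64, 64, 1300])
    minority = [flow_features(s) for s in flows]
    balanced_minority = minority + smote(minority, 6, k=2)
    print(len(balanced_minority))  # → 9

    # Toy imbalanced labels: 95 benign flows (0), 5 malicious flows (1).
    y_true = [0] * 95 + [1] * 5
    # A degenerate classifier that always predicts "benign".
    print(accuracy_f1(y_true, [0] * 100))  # → (0.95, 0.0)
```

The last example shows why accuracy alone is misleading in imbalanced settings: a classifier that labels every flow benign reaches 0.95 accuracy on a 95:5 split, yet its F1 score for the malicious class is 0. Synthetic samples should be generated from the training split only; oversampling before the train/test split would leak information into the evaluation, including any held-out zero-day samples.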

Acknowledgements

This research was partially financially supported by the Ministry of Science, Technological Development, and Innovation of the Republic of Serbia (Contract No. 451-03-68/2024-03/200103).

Author information

Authors and Affiliations

Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, Belgrade, 11000, Serbia
Djordje D. Jovanović
University of Belgrade, School of Electrical Engineering, Bulevar Kralja Aleksandra 73, Belgrade, 11000, Serbia
Djordje D. Jovanović & Pavle V. Vuletić

Corresponding author

Correspondence to Djordje D. Jovanović.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jovanović, D.D., Vuletić, P.V. Machine learning pipelines for IoT botnet detection and behavior characterization in heavily imbalanced settings. SIViP 19, 254 (2025). https://doi.org/10.1007/s11760-025-03813-5

Received: 04 October 2024
Revised: 13 December 2024
Accepted: 03 January 2025
Published: 28 January 2025
DOI: https://doi.org/10.1007/s11760-025-03813-5
