Machine learning pipelines for IoT botnet detection and behavior characterization in heavily imbalanced settings

  • Original Paper
  • Signal, Image and Video Processing

Abstract

This paper presents a new methodology for IoT botnet detection based on time series analysis of network intra-flow parameters and supervised machine learning classification. The study focuses on time series feature extraction, machine learning pipeline improvements, and methods for handling heavily imbalanced datasets, a characteristic of many information security use cases. A further result is the inference of key distinguishing malware behavior features that make malware detectable with high precision. The research is based on dynamic behavior analysis of real-world IoT malware. The samples were collected over 4 years (2019–2023), yielding one of the most recent IoT malware datasets and a unique long-term malware behavior analysis. The analysis indicates the type and rate of changes in IoT botnet malware behavior, as well as some invariant features that can be used to reliably detect even previously unseen malware samples (so-called zero-day cases). The experimental results show that the synthetic sample generation methodologies used in this study do not cause the classifiers to overfit: the classifiers detect zero-day malware samples with 0.9706 accuracy and a 0.9041 F1 score.
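As a rough illustration of the ideas the abstract names, the sketch below summarises an intra-flow time series as a feature vector, generates synthetic minority samples by interpolation, and shows why accuracy and F1 score are reported together on imbalanced data. It is not the authors' pipeline: the function names, the choice of packet sizes as the flow parameter, the four summary statistics, and the SMOTE-style nearest-neighbour interpolation are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def flow_features(packet_sizes):
    """Summarise one flow's packet-size series as a fixed-length vector.

    Hypothetical feature set: mean, population std, min, max.
    """
    return [mean(packet_sizes), pstdev(packet_sizes),
            min(packet_sizes), max(packet_sizes)]


def smote_like(minority, n_new, rng):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and its nearest minority neighbour.

    This is a simplified SMOTE-style scheme (one neighbour only),
    assumed here for illustration.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = min(others, key=lambda m: dist2(base, m))
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # 95 benign flows, 5 malicious: predicting "benign" for everything
    # scores high accuracy but an F1 of zero on the malicious class.
    y_true = [0] * 95 + [1] * 5
    print(accuracy_and_f1(y_true, [0] * 100))

    rng = random.Random(0)
    minority = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
    print(smote_like(minority, 3, rng))
```

The trivial all-benign predictor in the usage example is the reason a single accuracy figure says little in heavily imbalanced settings, and why an F1 score for the minority class is a useful companion metric.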

Acknowledgements

This research was partially financially supported by the Ministry of Science, Technological Development, and Innovation of the Republic of Serbia (Contract No. 451-03-68/2024-03/200103).

Author information

Corresponding author

Correspondence to Djordje D. Jovanović.

About this article

Cite this article

Jovanović, D.D., Vuletić, P.V. Machine learning pipelines for IoT botnet detection and behavior characterization in heavily imbalanced settings. SIViP 19, 254 (2025). https://doi.org/10.1007/s11760-025-03813-5

  • DOI: https://doi.org/10.1007/s11760-025-03813-5
