Abstract
This paper presents a new methodology for IoT botnet detection based on time series analysis of network intra-flow parameters and supervised machine learning classification. The study focuses on time series feature extraction, improvements to the machine learning pipeline, and methods for handling heavily imbalanced datasets, which are characteristic of many information security use cases. A side result is the inference of the key distinguishing features of malware behavior that make it detectable with high precision. The research is based on dynamic behavior analysis of real-world IoT malware. The samples were collected over 4 years (2019–2023), yielding one of the most recent IoT malware datasets and a unique long-term analysis of malware behavior. The analysis indicates the type and rate of changes in IoT botnet malware behavior, as well as some invariant features that can be used to reliably detect even previously unseen malware samples (so-called zero-day cases). The experimental results show that the synthetic sample generation methodologies used in this study do not overfit the classifiers, and that the resulting classifiers detect zero-day malware samples with 0.9706 accuracy and a 0.9041 F1 score.
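To make two ideas from the abstract concrete, the following minimal sketch shows (a) why accuracy alone is misleading on a heavily imbalanced dataset, which is why an F1 score is reported next to it, and (b) how a synthetic minority sample can be generated by interpolation. It is an illustration under stated assumptions and not the authors' pipeline: the function names `accuracy_and_f1` and `smote_like`, the 95:5 class split, and the toy feature vectors are all hypothetical, and the interpolation is a simplified SMOTE-style scheme chosen for illustration.

```python
import random


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for a binary task; `positive` marks the malware class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 defined as 0 when nothing is positive
    return accuracy, f1


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative assumption):
    each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical imbalanced labels: 5 malware flows (1) and 95 benign flows (0).
y_true = [1] * 5 + [0] * 95
always_benign = [0] * 100  # a trivial classifier that never raises an alert
acc, f1 = accuracy_and_f1(y_true, always_benign)

# Hypothetical 2-D feature vectors for the minority (malware) class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote_like(minority, n_new=6)
```

On this toy split the trivial classifier is correct on all 95 benign flows and wrong on all 5 malware flows, so its accuracy is high while its F1 score is zero. Oversampling should be applied to the training split only, so that the held-out samples used for evaluation remain real rather than synthetic.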


Acknowledgements
This research was partially financially supported by the Ministry of Science, Technological Development, and Innovation of the Republic of Serbia (Contract No. 451-03-68/2024-03/200103).
About this article
Cite this article
Jovanović, D.D., Vuletić, P.V. Machine learning pipelines for IoT botnet detection and behavior characterization in heavily imbalanced settings. SIViP 19, 254 (2025). https://doi.org/10.1007/s11760-025-03813-5
DOI: https://doi.org/10.1007/s11760-025-03813-5