Abstract
Although much of the work in behavioral malware detection lies in collecting the most explanatory data and choosing the most effective machine learning models, the processing of the data can sometimes prove to be the most important step in the data pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed the processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams, ordered by the timestamps at which the calls were recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack and an Advanced Persistent Threat (APT) malware. We examine the effects of lightweight NLP on detecting both the DoS attack and the stealthy APT malware. For stealthier malware, such as the APT, more advanced but far more resource-intensive NLP techniques will likely increase detection capability; we leave this for future work.
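The n-gram grouping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the trace format, the value of n, or the system call names, so the `(timestamp, name)` tuples, the helper names `syscall_ngrams` and `ngram_counts`, and the sample trace below are all hypothetical. The sketch sorts the calls by timestamp, slides a window of length n over the ordered names, and counts each n-gram, which gives the kind of feature vector a classifier such as LR could consume.

```python
from collections import Counter

def syscall_ngrams(events, n):
    """Return the n-grams of system call names, ordered by timestamp.

    `events` is an iterable of (timestamp, syscall_name) pairs; this
    input format is an assumption made for illustration only.
    """
    # Order the calls by the time at which they were recorded.
    ordered = [name for _, name in sorted(events, key=lambda e: e[0])]
    # Slide a window of length n over the ordered call names.
    return [tuple(ordered[i:i + n]) for i in range(len(ordered) - n + 1)]

def ngram_counts(events, n):
    """Count how often each n-gram occurs in the trace."""
    return Counter(syscall_ngrams(events, n))

# Hypothetical trace, deliberately out of order to show the sorting step.
trace = [
    (0.003, "read"),
    (0.001, "open"),
    (0.007, "close"),
    (0.005, "write"),
    (0.009, "open"),
    (0.011, "read"),
]

bigrams = ngram_counts(trace, 2)
for gram, count in bigrams.items():
    print(gram, count)
```

In this toy trace the bigram `("open", "read")` occurs twice and every other bigram once. Counts like these would typically be normalized or weighted before classification; the abstract does not say which scheme the authors use.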
Acknowledgments
The work was funded in part by Spiros Mancoridis’ Auerbach Berger Chair in Cybersecurity.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Carter, J., Mancoridis, S., Nkomo, M., Weber, S., Dandekar, K.R. (2023). System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_7
DOI: https://doi.org/10.1007/978-981-99-0272-9_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)