System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection

Carter, John; Mancoridis, Spiros; Nkomo, Malvin; Weber, Steven; Dandekar, Kapil R.

doi:10.1007/978-981-99-0272-9_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1768)

Included in the following conference series:

International Conference on Ubiquitous Security

Abstract

Although much of the work in behavioral malware detection lies in collecting the most explanatory data and using the most effective machine learning models, the processing of the data can prove to be the most important step in the data pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed the processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams that are sorted by the timestamp at which they are recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack and an Advanced Persistent Threat (APT) malware. We examine the effects of using lightweight NLP on detecting both the DoS attack and the stealthier APT malware. For stealthier malware, such as the APT, more advanced but far more resource-intensive NLP techniques will likely increase detection capability; we leave this for future work.
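
The n-gram grouping described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the trace format, the value of n, or how n-grams are encoded as features, so the (timestamp, name) event tuples, the helper names `syscall_ngrams` and `ngram_counts`, and the choice of n = 2 below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical trace format (an assumption): (timestamp_in_seconds, system_call_name).
Event = Tuple[float, str]


def syscall_ngrams(events: Iterable[Event], n: int) -> List[Tuple[str, ...]]:
    """Order events by timestamp, then slide a window of n calls over them."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sort by timestamp so that n-grams reflect the order in which calls were recorded.
    ordered = [name for _, name in sorted(events, key=lambda e: e[0])]
    # A trace shorter than n yields no n-grams.
    return [tuple(ordered[i:i + n]) for i in range(len(ordered) - n + 1)]


def ngram_counts(events: Iterable[Event], n: int) -> Counter:
    """Count n-gram occurrences; the counts could serve as classifier features."""
    return Counter(syscall_ngrams(events, n))


# Illustrative, made-up trace, deliberately out of timestamp order.
trace = [
    (0.03, "read"),
    (0.01, "open"),
    (0.04, "close"),
    (0.02, "read"),
]

bigrams = syscall_ngrams(trace, 2)
counts = ngram_counts(trace, 2)
print(bigrams)
print(counts)
```

Count vectors of this kind are one plausible input to simple classifiers such as logistic regression; whether the paper uses raw counts or a weighting scheme is not stated in the abstract.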

Acknowledgments

The work was funded in part by Spiros Mancoridis’ Auerbach Berger Chair in Cybersecurity.

Author information

Authors and Affiliations

Drexel University, Philadelphia, PA, 19104, USA
John Carter, Spiros Mancoridis, Malvin Nkomo, Steven Weber & Kapil R. Dandekar


Corresponding author

Correspondence to John Carter.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
Temple University, Philadelphia, PA, USA
Jie Wu
Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Ernesto Damiani

About this paper

Cite this paper

Carter, J., Mancoridis, S., Nkomo, M., Weber, S., Dandekar, K.R. (2023). System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_7

DOI: https://doi.org/10.1007/978-981-99-0272-9_7
Published: 16 February 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)
