System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection

  • Conference paper
Ubiquitous Security (UbiSec 2022)

Abstract

Although much of the work in behavioral malware detection lies in collecting the most explanatory data and choosing the most effective machine learning models, data processing can sometimes prove to be the most important step in the pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed the processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams, ordered by the timestamps at which the calls were recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack and Advanced Persistent Threat (APT) malware. We examine how lightweight NLP affects the detection of both the DoS attack and the stealthier APT malware. For stealthier malware, such as the APT, more advanced but far more resource-intensive NLP techniques will likely increase detection capability; we leave this for future work.
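The n-gram grouping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `(timestamp, name)` event format, the function names `syscall_ngrams` and `ngram_counts`, and the sample trace are all assumptions made for illustration. It shows one plausible reading of the step, in which calls are first ordered by timestamp and then grouped with a sliding window of width n.

```python
from collections import Counter


def syscall_ngrams(events, n):
    """Return the n-grams of system call names, in timestamp order.

    `events` is an iterable of (timestamp, syscall_name) pairs.
    This event format is an assumption, not taken from the paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Order the calls by timestamp so each n-gram reflects recorded order.
    names = [name for _, name in sorted(events, key=lambda e: e[0])]
    # Slide a window of width n over the ordered call names.
    return [tuple(names[i:i + n]) for i in range(len(names) - n + 1)]


def ngram_counts(events, n):
    """Count how often each n-gram occurs in one trace."""
    return Counter(syscall_ngrams(events, n))


# Hypothetical trace, deliberately out of order to show the sorting step.
trace = [
    (0.003, "read"),
    (0.001, "open"),
    (0.004, "write"),
    (0.002, "read"),
    (0.005, "close"),
]

bigrams = syscall_ngrams(trace, 2)
counts = ngram_counts(trace, 2)
```

Counts of this kind, computed per observation window, are one way to build the fixed-length feature vectors that a classifier such as LR or a small NN expects; how the paper actually vectorizes its n-grams is not stated in the abstract.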

Acknowledgments

The work was funded in part by Spiros Mancoridis’ Auerbach Berger Chair in Cybersecurity.

Author information

Corresponding author

Correspondence to John Carter.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Carter, J., Mancoridis, S., Nkomo, M., Weber, S., Dandekar, K.R. (2023). System Call Processing Using Lightweight NLP for IoT Behavioral Malware Detection. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_7

  • DOI: https://doi.org/10.1007/978-981-99-0272-9_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0271-2

  • Online ISBN: 978-981-99-0272-9

  • eBook Packages: Computer Science, Computer Science (R0)
