Abstract
Over the years, artificial neural networks have been applied successfully in many areas including IT security. Yet, neural networks can only process continuous input data. This is particularly challenging for security-related non-continuous data like system calls. This work focuses on four different options to preprocess sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and learning word2vec and GloVe representations of system calls. As an additional option, we analyze if the mapping of system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enhancing system call data with additional information regarding their context. However, when performing such preprocessing steps it is important to ensure that no relevant information is lost during the process. The overall objective of system call based intrusion detection is to categorize sequences of system calls as benign or malicious behavior. Therefore, this scenario is used to evaluate the different input options as a classification task. The results show that each of the four different methods is valid when preprocessing input data, but the use of kernel modules only is not recommended because too much information is being lost during the mapping process.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Creech, G., Hu, J.: A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns. IEEE Trans. Comput. 63(4), 807–819 (2014)
Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for unix processes. In: IEEE Symposium on Security and Privacy, pp. 120–128. IEEE (1996)
Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)
Kim, G., Yi, H., Lee, J., Paek, Y., Yoon, S.: LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems. arXiv preprint arXiv:1611.01726, pp. 1–12 (2016)
Kolosnjaji, B., Zarras, A., Webster, G., Eckert, C.: Deep learning for classification of malware system call sequences. In: Australasian Joint Conference on Artificial Intelligence (AI), pp. 137–149. Springer (2016)
Sharma, A., Pujari, A.K., Paliwal, K.K.: Intrusion detection using text processing techniques with a kernel based similarity measure. Comput. Secur. 26(7–8), 488–495 (2007)
Murtaza, S.S., Khreich, W., Hamou-Lhadj, A., Gagnon, S.: A trace abstraction approach for host-based anomaly detection. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), pp. 170–177. IEEE (2015)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Creech, G., Hu, J.: Generation of a new IDS test dataset: time to retire the KDD collection. In: IEEE Wireless Communications and Networking Conference (WCNC), pp. 4487–4492. IEEE (2013)
Eskin, E., Lee, W., Stolfo, S.J.: Modeling system calls for intrusion detection with dynamic window sizes. In: DARPA Information Survivability Conference & Exposition II (DISCEX), vol. 1, pp. 165–175. IEEE (2001)
Hoang, X.D., Hu, J.: An efficient hidden markov model training scheme for anomaly intrusion detection of server applications based on system calls. In: IEEE International Conference on Networks (ICon), vol. 2, pp. 470–474. IEEE (2004)
Kosoresow, A.P., Hofmeyer, S.: Intrusion detection via system call traces. IEEE Softw. 14(5), 35–42 (1997)
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S.: A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. Appl. Data Min. Comput. Secur. 6, 77–102 (2002)
Wang, Y., Wong, J., Miner, A.: Anomaly intrusion detection using one class SVM. In: IEEE SMC Information Assurance Workshop, pp. 358–364. IEEE (2004)
Chawla, A., Lee, B., Fallon, S., Jacob, P.: Host based intrusion detection system with combined CNN/RNN model. In: International Workshop on AI in Security, pp. 9–18 (2018)
Xie, M., Hu, J., Yu, X., Chang, E.: Evaluating host-based anomaly detection systems: application of the frequency-based algorithms to ADFA-LD. In: International Conference on Network and System Security, pp. 542–549. Springer (2014)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Ring, M., Landes, D., Dallmann, A., Hotho, A.: IP2Vec: learning similarities between IP adresses. In: Workshop on Data Mining for Cyber Security (DMCS), International Conference on Data Mining Workshops (ICDMW), pp. 657–666. IEEE (2017)
[Online] DARPA 1998/1999 Dataset. https://www.ll.mit.edu/r-d/datasets. Accessed 14 Mar 2019
[Online] UNM Dataset. https://www.cs.unm.edu/~immsec/systemcalls.htm. Accessed 14 Nov 2018
Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: IEEE Symposium on Security and Privacy, pp. 133–145. IEEE (1999)
McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratory. ACM Trans. Inform. Syst. Secur. (TISSEC) 3(4), 262–294 (2000)
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2002)
Acknowledgements
S.W. is funded by the Bavarian State Ministry of Science and Arts in the framework of the Centre Digitization.Bavaria (ZD.B). S.W. and M.R. are further supported by the BayWISS Consortium Digitization. Last but not least, we gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wunderlich, S., Ring, M., Landes, D., Hotho, A. (2020). Comparison of System Call Representations for Intrusion Detection. In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds) International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019). CISIS ICEUTE 2019 2019. Advances in Intelligent Systems and Computing, vol 951. Springer, Cham. https://doi.org/10.1007/978-3-030-20005-3_2
Download citation
DOI: https://doi.org/10.1007/978-3-030-20005-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20004-6
Online ISBN: 978-3-030-20005-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)