Abstract
The current progress in computer technology is matched by an increase in malware and cyber-attacks, resulting in a nearly constant battle between efforts to establish a complete malware detection technique and newly evolving, smart malicious code. The analysis of malware is made difficult by the fact that, to a large extent, malware and benign code use the same instructions. This suggests that the difference in behavior might be due not to the instructions used, but to how they are used. In particular, the context in which instructions are used seems to play an important role in distinguishing malicious from benign code. This work describes progress towards defining and extracting the context of API calls from Portable Executable (PE) files of the Windows operating system. It is suggested that the context can be used as a feature in a machine learning algorithm for identifying attempts to corrupt the system and to elude antivirus scanners through code obfuscation.
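The abstract does not spell out how context is defined, so the following is only a minimal sketch under a stated assumption: that the context of an API call can be approximated by the other API calls appearing within a fixed window around it, in the spirit of the skip-gram setting. The function name `api_contexts`, the `window` parameter, and the call sequence are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict


def api_contexts(calls, window=2):
    """Map each API name to a Counter of the API names seen
    within `window` positions of it in the call sequence.

    Assumption: context is a symmetric window of neighbouring calls.
    """
    contexts = defaultdict(Counter)
    for i, api in enumerate(calls):
        lo = max(0, i - window)
        hi = min(len(calls), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the call itself
                contexts[api][calls[j]] += 1
    return contexts


# Hypothetical, hand-written call sequence (not extracted from a real file).
calls = [
    "CreateFileW", "WriteFile", "CloseHandle",
    "CreateFileW", "ReadFile", "CloseHandle",
]
ctx = api_contexts(calls, window=1)

# Neighbours observed around each occurrence of CreateFileW.
print(dict(ctx["CreateFileW"]))
```

Under this assumption, the same API name can yield different neighbour counts in different programs, and it is such counts (or vectors derived from them) that could be passed to a classifier as features.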
Notes
1. This example is used here because most people have an intuitive understanding of the words used in it. By contrast, in the domain of API calls, such intuition may be present only in some very experienced domain experts.
Acknowledgment
This research was partially supported by the AFRL Award #FA8650 to the University of Cincinnati.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chandrasekaran, M., Ralescu, A., Kapp, D., Kebede, T.M. (2021). Context for API Calls in Malware vs Benign Programs. In: Simian, D., Stoica, L.F. (eds) Modelling and Development of Intelligent Systems. MDIS 2020. Communications in Computer and Information Science, vol 1341. Springer, Cham. https://doi.org/10.1007/978-3-030-68527-0_14
DOI: https://doi.org/10.1007/978-3-030-68527-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68526-3
Online ISBN: 978-3-030-68527-0
eBook Packages: Computer Science, Computer Science (R0)