Abstract
Application Programming Interface (API) calls in the Windows operating system (OS) are an attractive feature for malware analysis and detection, as they accurately reflect the actions of portable executable (PE) files. In this paper, we propose an approach based on sequential pattern mining (SPM) to analyze malware behavior during execution. A dataset containing sequences of API calls made by different malware on Windows OS is first transformed into a suitable format (sequences of integers). SPM algorithms are then applied to this corpus to find frequent API calls and their patterns. In addition, sequential rules between API call patterns, as well as maximal and closed frequent API call patterns, are discovered. Preliminary results suggest that the discovered frequent API call patterns and the sequential rules between them can be used to develop malware detection and classification techniques.
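The pipeline summarized above (encode API call traces as integer sequences, mine frequent sequential patterns, keep the closed and maximal ones, and derive sequential rules) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: API names are numbered in order of first appearance; each API call forms a single-item itemset; the support of a pattern is the number of traces that contain it in order with gaps allowed; and the PrefixSpan-style miner, the closed/maximal filter, and the rule definition (confidence = sup(X+Y) / sup(X)) are simplified stand-ins for the dedicated SPM algorithms used in the paper. The function names and toy traces are hypothetical. The `to_spmf` helper writes the sequence text format of the SPMF library, in which `-1` closes an itemset and `-2` ends a sequence.

```python
from collections import defaultdict


def encode(traces):
    """Map each distinct API name to a positive integer, in order of first appearance."""
    ids = {}
    encoded = []
    for trace in traces:
        row = []
        for call in trace:
            if call not in ids:
                ids[call] = len(ids) + 1
            row.append(ids[call])
        encoded.append(row)
    return encoded, ids


def to_spmf(seq):
    """One sequence per line; -1 closes each (single-call) itemset, -2 ends the sequence."""
    return " ".join(f"{item} -1" for item in seq) + " -2"


def prefixspan(db, minsup):
    """Map each frequent sequential pattern (a tuple) to its support.

    Support = number of sequences containing the pattern in order, gaps allowed.
    Because every itemset holds one call, projecting a sequence on an item only
    needs the first occurrence of that item after the current position.
    """
    results = {}

    def mine(prefix, projected):
        # projected: (sequence id, start of the remaining suffix) pairs
        support = defaultdict(set)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                support[item].add(sid)
        for item in sorted(support):
            if len(support[item]) < minsup:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(support[item])
            next_proj = [(sid, db[sid].index(item, start) + 1)
                         for sid, start in projected if item in db[sid][start:]]
            mine(pattern, next_proj)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


def is_subsequence(a, b):
    """True if a occurs in b in the same order, gaps allowed."""
    it = iter(b)
    return all(x in it for x in a)


def closed_and_maximal(patterns):
    closed, maximal = {}, {}
    for p, sup in patterns.items():
        supers = [q for q in patterns if len(q) > len(p) and is_subsequence(p, q)]
        if not supers:                               # no frequent super-pattern
            maximal[p] = sup
        if all(patterns[q] != sup for q in supers):  # no super-pattern with equal support
            closed[p] = sup
    return closed, maximal


def sequential_rules(patterns, minconf):
    """Simplified rules X -> Y obtained by splitting a frequent pattern X+Y.

    Every prefix X of a frequent pattern was itself mined, so sup(X) is available.
    """
    rules = []
    for p, sup in patterns.items():
        for k in range(1, len(p)):
            conf = sup / patterns[p[:k]]
            if conf >= minconf:
                rules.append((p[:k], p[k:], sup, conf))
    return rules


# Toy traces (hypothetical, not taken from the paper's dataset)
traces = [
    ["LoadLibraryA", "GetProcAddress", "CreateFileW", "WriteFile"],
    ["LoadLibraryA", "GetProcAddress", "RegOpenKeyExW", "WriteFile"],
    ["GetProcAddress", "CreateFileW", "WriteFile"],
]
db, ids = encode(traces)
names = {v: k for k, v in ids.items()}
patterns = prefixspan(db, minsup=2)
closed, maximal = closed_and_maximal(patterns)
for p in maximal:
    print(" -> ".join(names[i] for i in p), maximal[p])
# LoadLibraryA -> GetProcAddress -> WriteFile 2
# GetProcAddress -> CreateFileW -> WriteFile 2
```

With a minimum support of 2, the toy data yields 11 frequent patterns but only two maximal ones. `GetProcAddress -> WriteFile` (support 3) is closed but not maximal: it has frequent super-patterns, yet none of them reaches its support, which is why closed patterns keep support information that maximal patterns discard.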
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Nawaz, M.S., Fournier-Viger, P., Nawaz, M.Z., Chen, G., Wu, Y. (2021). Metamorphic Malware Behavior Analysis Using Sequential Pattern Mining. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_6
DOI: https://doi.org/10.1007/978-3-030-93733-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93732-4
Online ISBN: 978-3-030-93733-1
eBook Packages: Computer Science, Computer Science (R0)