Metamorphic Malware Behavior Analysis Using Sequential Pattern Mining

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

Application Programming Interface (API) calls in the Windows operating system (OS) are an attractive feature for malware analysis and detection because they reflect the actions of portable executable (PE) files. In this paper, we present an approach based on sequential pattern mining (SPM) for analyzing malware behavior during execution. A dataset containing sequences of API calls made by different malware on Windows OS is first abstracted into a suitable format (sequences of integers). SPM algorithms are then applied to the corpus to find frequent API calls and their patterns. Moreover, sequential rules between API call patterns, as well as maximal and closed frequent API call patterns, are discovered. The preliminary results suggest that the discovered frequent patterns of API calls, and the sequential rules between them, can be used in the development of malware detection and classification techniques.
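
The abstraction and mining steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the line format produced by `to_spmf_line` (each item followed by `-1`, each sequence terminated by `-2`) is an assumption based on the sequence-database input format of the SPMF library. The sketch maps API names to integers, serializes a sequence, and counts how many sequences contain a given pattern as a (not necessarily contiguous) subsequence.

```python
from collections import Counter


def encode_sequences(sequences):
    """Map each distinct API name to a positive integer (order of first appearance)."""
    vocab = {}
    encoded = []
    for seq in sequences:
        # len(vocab) + 1 is evaluated before setdefault inserts a new name
        encoded.append([vocab.setdefault(name, len(vocab) + 1) for name in seq])
    return encoded, vocab


def to_spmf_line(seq):
    """Serialize one integer sequence: '-1' closes each item, '-2' closes the sequence."""
    return " ".join(f"{item} -1" for item in seq) + " -2"


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_items(sequences, minsup):
    """Items that appear in at least minsup sequences (each sequence counted once)."""
    counts = Counter(item for s in sequences for item in set(s))
    return {item: c for item, c in counts.items() if c >= minsup}


# Hypothetical API call traces, for illustration only (not taken from the dataset).
TRACES = [
    ["LdrLoadDll", "NtCreateFile", "NtWriteFile", "NtClose"],
    ["LdrLoadDll", "RegOpenKeyExW", "NtCreateFile", "NtClose"],
    ["NtCreateFile", "NtWriteFile", "NtClose"],
]

if __name__ == "__main__":
    encoded, vocab = encode_sequences(TRACES)
    for seq in encoded:
        print(to_spmf_line(seq))
    pattern = [vocab["NtCreateFile"], vocab["NtClose"]]
    print("support:", support(pattern, encoded))
    print("frequent items:", frequent_items(encoded, minsup=2))
```

Dedicated SPM algorithms avoid this brute-force support counting by pruning the pattern search space. The sketch is only meant to make the notions of integer encoding and pattern support concrete.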

Notes

  1. http://www.legaljobs.io/blog/malware-statistics

  2. http://mcafee.com/enterprise/en-us/lp/threats-reports/apr-2021.html

  3. http://gs.statcounter.com/os-market-share/desktop/worldwide

Author information

Corresponding author

Correspondence to Philippe Fournier-Viger.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nawaz, M.S., Fournier-Viger, P., Nawaz, M.Z., Chen, G., Wu, Y. (2021). Metamorphic Malware Behavior Analysis Using Sequential Pattern Mining. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_6

  • DOI: https://doi.org/10.1007/978-3-030-93733-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93732-4

  • Online ISBN: 978-3-030-93733-1

  • eBook Packages: Computer Science, Computer Science (R0)
