Skip to main content
Log in

Graph embedding as a new approach for unknown malware detection

  • Original Paper
  • Published:
Journal of Computer Virology and Hacking Techniques Aims and scope Submit manuscript

Abstract

Malware is any type of computer program which is developed to harm computers, networks, and information. Noticeable growth of malware development has made computer and network security a significant and challenging area in recent years. There is an intensive competition between malwares and antiviruses. Malware authors make every effort to develop new harmful codes using various programming tricks and exploits which are unseen for detection techniques. On the other hand, antivirus developers upgrade their methods and algorithms to recognize unknown malware. Therefore, an accurate and rapid detection method is an irrefutable demand in computer security area. This paper proposes a new malware detection method based on the OpCodes within an executable file. Proposed method generates a graph of operational codes (OpCode) within an executable file and then embeds this graph into eigenspace using “Power Iteration” method. This will help us represent an executable file as a linear combination of eigenvectors proportionate to their eigenvalues, which is beneficial to train machine learning classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The main advantages of our proposed method are high detection rate despite utilizing simple classifiers like KNN, acceptable computational complexity even in large scale datasets against rival methods, and low false positive rate.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Notes

  1. http://vxheaven.org.

  2. http://www.eset.com.

  3. http://www.kaspersky.com.

  4. In order to access send an email to h-hashemi@cse.shirazu.ac.ir.

  5. https://www.aldeid.com/wiki/PEiD.

References

  1. Elhadi, A.A.E., Maarof, M.A., Osman, A.H.: Information Assurance and Security Research Group, Faculty of Computer Science and Infor.: Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9(3), 283–288 (2012)

    Article  Google Scholar 

  2. Bazrafshan, Z., Hashemi, H., Mehdi, S., Fard, H., Hamzeh, A., Fard, S.M.H., Hamzeh, A.: A survey on heuristic malware detection techniques. Inf. Knowl. Technol. (IKT) 2, 113–120 (2013)

    Google Scholar 

  3. Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.P.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. (Ny) 231, 64–82 (2013)

    Article  MathSciNet  Google Scholar 

  4. Vinod, P., Jaipur, R., Laxmi, V., Gaur, M.: Survey on malware detection methods. In: Proceedings of the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), pp. 74–79 (2009)

  5. Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. 1(2), 156–168 (2007)

    Google Scholar 

  6. Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–342. IEEE (2014)

  7. Uppal, D., Sinha, R.: Exploring behavioral aspects of API calls for malware identification and categorization. In: Networks (CICN), vol. 2014 (2014)

  8. Sundarkumar, G., Ravi, V.: Malware detection via API calls, topic models and machine learning. In: (CASE), 2015 IEEE (2015)

  9. Fan, C.I., Hsiao, H.W., Chou, C.H., Tseng, Y.F.: Malware detection systems based on API log data mining. In: 2015 IEEE 39th Annual Computer Software and Application Conference (COMPSAC), pp. 255–60. IEEE (2015)

  10. Alam, S., Traore, I., Sogukpinar, I.: Annotated control flow graph for metamorphic malware detection. Comput. J. (2014)

  11. Cesare S., Xiang Y., Zhou W.: Control flow-based malware variant detection. In: IEEE Transactions on Dependable and Secure Computing, pp. 307–317. IEEE (2014)

  12. Wüchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proc. 9th ACM Symp. Information, Comput. Commun. Secur. - ASIA CCS ’14, pp. 271–282 (2014)

  13. Wüchner, T., Ochoa, M., Pretschner, A.: Robust and effective malware detection through quantitative data flow graph metrics. (2015). arXiv:1502.01609

  14. Abou-assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)

  15. Canfora, G., Lorenzo, A.: Effectiveness of opcode ngrams for detection of multi family android malware. In: 10th International Conference on Availability, Reliability and Security (ARES), pp. 333–340. IEEE (2015)

  16. Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G.: Opcode-sequence-based semi-supervised unknown malware detection. Comput. Intell. Secur. Inf. Syst., pp. 50–57 (2011)

  17. Santos, I., Brezo, F., Nieves, J., Penya, Y.Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: Opcode-sequence-based malware detection. Eng. Secur. Softw. Syst., pp. 35–43 (2010)

  18. Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Inf. Secur. 5(4), 220 (2011)

    Article  Google Scholar 

  19. Santos, I., Laorden, C., Bringas, P.G.P.: Collective classification for unknown malware detection. SECRYPT, pp. 251–256 (2011)

  20. Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T.: Graph-based malware detection using dynamic analysis. J. Comput. Virol. 7(4), 247–258 (2011)

    Article  Google Scholar 

  21. Jalote, P., Jalote, P., Jalote, P.: An integrated approach to software engineering. Springer, NewYork (2005)

    MATH  Google Scholar 

  22. Mccabe, T.J.: A complexity measure. Softw. Eng. IEEE Trans. 4, 308–320 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  23. Wilhelm, R., Engblom, J., Ermedahl, A.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., vol. V, pp. 1–47 (2008)

  24. Allen, F.E.: Control flow analysis. ACM SIGPLAN Not. 5(7), 1–19 (1970)

    Article  Google Scholar 

  25. Zhao, Z.: A virus detection scheme based on features of Control Flow Graph. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), pp. 943–947 (2011)

  26. Mitchell, T.M.: Machine learning and data mining over the past. vol. 42, no. 11 (1999)

  27. Breiman, L.: Bagging predictors. Mach. Learn. 140, 123–140 (1996)

    MATH  Google Scholar 

  28. Breu, F., Guggenbichler, S., Wollmann, J.: Random forests. Vasa, pp. 1–35 (2008)

  29. Tesauro, G.J., Kephart, J.O., Sorkin, G.B.: Neural networks for computer virus recognition. IEEE Expert 11(4), 5–6 (1996)

  30. Arnold, W., Tesauro, G., Heights, Y.: Automatically generated Win32 heuristic virus detection. In: Proc. 2000 Int. virus Bull. Conf., no. September, pp. 51–60 (2000)

  31. Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)

  32. Ye, Y., Li, T., Huang, K., Jiang, Q., Chen, Y.: Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. J. Intell. Inf. 35(1), 1–20 (2010)

    Article  Google Scholar 

  33. Peng, H., Long, F., Ding C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)

  34. Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on OpCode patterns. Secur. Inf. 1(1), 1 (2012)

    Article  Google Scholar 

  35. Runwal, N., Low, R.M., Stamp, M.: Opcode graph similarity and metamorphic detection. J. Comput. Virol. 8(1–2), 37–52 (2012)

    Article  Google Scholar 

  36. Shanmugam, G., Low, R.M., Stamp M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hack. Tech. 9(3), 159–170 (2013)

  37. Toderici, A.H., Stamp, M.: Chi-squared distance and metamorphic virus detection. J. Comput. Virol. Hacking Tech. 9(1), 1–14 (2012)

    Google Scholar 

  38. Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3(2), 65–74 (2007)

    Article  Google Scholar 

  39. Dhavare, A., Low, R.M., Stamp, M.: Efficient cryptanalysis of homophonic substitution ciphers. Cryptologia 37(3), 250–281 (2013)

    Article  Google Scholar 

  40. Jidigam, R.K., Austin, T.H., Stamp, M.: Singular value decomposition and metamorphic detection. J. Comput. Virol. Hack. Tech. 11(4), 203–216 (2015)

  41. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)

    Article  Google Scholar 

  42. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. Wiley (2001)

  43. Riesen, K., Bunke, H.: Graph classification and clustering based on vector space embedding. World Scientific, Singapore (2010)

  44. Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Brain Cogn. 52, 262 (2007)

  45. Chung, F.R.K.: Spectral Graph Theory, vol. 30. AMS Bookstore (1999)

  46. Hancock, E.R.: Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1120–1136 (2001)

    Article  Google Scholar 

  47. Wilson, R.C., Hancock, E.R.: Levenshtein distance for graph spectral features. In: ICPR (2), no. C, pp. 489–492 (2004)

  48. Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding. Pattern Recognit. 40(3), 1042–1056 (2007)

    Article  MATH  Google Scholar 

  49. Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. Pattern Anal. Mach. Intell. IEEE 10(5), 695–703 (1988)

    Article  MATH  Google Scholar 

  50. Luo, B., Wilson, R.C., Hancock, E.R., Wilson, R.C.: Spectral embedding of graphs. Pattern Recognit. 36(10), 2213–2230 (2003)

    Article  MATH  Google Scholar 

  51. Lin, F., Cohen, W.W.: Power iteration clustering. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 655–662 (2010)

  52. Harrington, P.: Machine Learning in Action, vol. 37, no. 3 (2012)

  53. Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013)

  54. Kohavi, R.: A study of cross-validation and bootstrap for accuracy esti-mation and model selection. In: Proceedings of the 1995 International Joint Conference on Artificial Intelligence, vol. 14, no. 2, pp. 1137–1145 (1995)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ali Hamzeh.

Appendix

Appendix

In this part, in order to complete our experiments, we introduce some additional experimental results such as comparison on False Negative Ratio (FNR), False Positive Ratio (FPR), Precision and Recall. The comparisons illustrate the proposed method outperformed some previously develop methods based on OpCode patterns in [3, 53]. Figures 4 and 7 illustrate that the proposed method and Santos method follow similar trends in FNR and Recall metrics respectively. Moreover, the proposed method achieves approximately median result among rival methods. Figures 5 and 6 depict the ability of the proposed method to obtain best FPR and Precision respectively (Fig. 8). Also in order to see details of evaluation metric refer to Sect. 4D.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hashemi, H., Azmoodeh, A., Hamzeh, A. et al. Graph embedding as a new approach for unknown malware detection. J Comput Virol Hack Tech 13, 153–166 (2017). https://doi.org/10.1007/s11416-016-0278-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11416-016-0278-y

Keywords

Navigation