Graph embedding as a new approach for unknown malware detection

Hashemi, Hashem; Azmoodeh, Amin; Hamzeh, Ali; Hashemi, Sattar

doi:10.1007/s11416-016-0278-y

Graph embedding as a new approach for unknown malware detection

Original Paper
Published: 19 May 2016

Volume 13, pages 153–166, (2017)
Cite this article

Journal of Computer Virology and Hacking Techniques Aims and scope Submit manuscript

Hashem Hashemi¹,
Amin Azmoodeh¹,
Ali Hamzeh¹ &
…
Sattar Hashemi¹

1785 Accesses
72 Citations
Explore all metrics

Abstract

Malware is any type of computer program which is developed to harm computers, networks, and information. Noticeable growth of malware development has made computer and network security a significant and challenging area in recent years. There is an intensive competition between malwares and antiviruses. Malware authors make every effort to develop new harmful codes using various programming tricks and exploits which are unseen for detection techniques. On the other hand, antivirus developers upgrade their methods and algorithms to recognize unknown malware. Therefore, an accurate and rapid detection method is an irrefutable demand in computer security area. This paper proposes a new malware detection method based on the OpCodes within an executable file. Proposed method generates a graph of operational codes (OpCode) within an executable file and then embeds this graph into eigenspace using “Power Iteration” method. This will help us represent an executable file as a linear combination of eigenvectors proportionate to their eigenvalues, which is beneficial to train machine learning classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The main advantages of our proposed method are high detection rate despite utilizing simple classifiers like KNN, acceptable computational complexity even in large scale datasets against rival methods, and low false positive rate.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Applying NLP techniques to malware detection in a practical environment

Article Open access 06 June 2021

Large language models and unsupervised feature learning: implications for log analysis

Article 04 April 2024

An overview of vulnerability assessment and penetration testing techniques

Article 28 November 2014

Notes

http://vxheaven.org.
http://www.eset.com.
http://www.kaspersky.com.
In order to access send an email to h-hashemi@cse.shirazu.ac.ir.
https://www.aldeid.com/wiki/PEiD.

References

Elhadi, A.A.E., Maarof, M.A., Osman, A.H.: Information Assurance and Security Research Group, Faculty of Computer Science and Infor.: Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9(3), 283–288 (2012)
Article Google Scholar
Bazrafshan, Z., Hashemi, H., Mehdi, S., Fard, H., Hamzeh, A., Fard, S.M.H., Hamzeh, A.: A survey on heuristic malware detection techniques. Inf. Knowl. Technol. (IKT) 2, 113–120 (2013)
Google Scholar
Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.P.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. (Ny) 231, 64–82 (2013)
Article MathSciNet Google Scholar
Vinod, P., Jaipur, R., Laxmi, V., Gaur, M.: Survey on malware detection methods. In: Proceedings of the 3rd Hackers’ Workshop on computer and internet security (IITKHACK’09), pp. 74–79 (2009)
Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. 1(2), 156–168 (2007)
Google Scholar
Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2337–342. IEEE (2014)
Uppal, D., Sinha, R.: Exploring behavioral aspects of API calls for malware identification and categorization. In: Networks (CICN), vol. 2014 (2014)
Sundarkumar, G., Ravi, V.: Malware detection via API calls, topic models and machine learning. In: (CASE), 2015 IEEE (2015)
Fan, C.I., Hsiao, H.W., Chou, C.H., Tseng, Y.F.: Malware detection systems based on API log data mining. In: 2015 IEEE 39th Annual Computer Software and Application Conference (COMPSAC), pp. 255–60. IEEE (2015)
Alam, S., Traore, I., Sogukpinar, I.: Annotated control flow graph for metamorphic malware detection. Comput. J. (2014)
Cesare S., Xiang Y., Zhou W.: Control flow-based malware variant detection. In: IEEE Transactions on Dependable and Secure Computing, pp. 307–317. IEEE (2014)
Wüchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proc. 9th ACM Symp. Information, Comput. Commun. Secur. - ASIA CCS ’14, pp. 271–282 (2014)
Wüchner, T., Ochoa, M., Pretschner, A.: Robust and effective malware detection through quantitative data flow graph metrics. (2015). arXiv:1502.01609
Abou-assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
Canfora, G., Lorenzo, A.: Effectiveness of opcode ngrams for detection of multi family android malware. In: 10th International Conference on Availability, Reliability and Security (ARES), pp. 333–340. IEEE (2015)
Santos, I., Sanz, B., Laorden, C., Brezo, F., Bringas, P.G.: Opcode-sequence-based semi-supervised unknown malware detection. Comput. Intell. Secur. Inf. Syst., pp. 50–57 (2011)
Santos, I., Brezo, F., Nieves, J., Penya, Y.Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: Opcode-sequence-based malware detection. Eng. Secur. Softw. Syst., pp. 35–43 (2010)
Santos, I., Brezo, F., Sanz, B., Laorden, C., Bringas, P.P.G.: Using opcode sequences in single-class learning to detect unknown malware. IET Inf. Secur. 5(4), 220 (2011)
Article Google Scholar
Santos, I., Laorden, C., Bringas, P.G.P.: Collective classification for unknown malware detection. SECRYPT, pp. 251–256 (2011)
Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T.: Graph-based malware detection using dynamic analysis. J. Comput. Virol. 7(4), 247–258 (2011)
Article Google Scholar
Jalote, P., Jalote, P., Jalote, P.: An integrated approach to software engineering. Springer, NewYork (2005)
MATH Google Scholar
Mccabe, T.J.: A complexity measure. Softw. Eng. IEEE Trans. 4, 308–320 (1976)
Article MathSciNet MATH Google Scholar
Wilhelm, R., Engblom, J., Ermedahl, A.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., vol. V, pp. 1–47 (2008)
Allen, F.E.: Control flow analysis. ACM SIGPLAN Not. 5(7), 1–19 (1970)
Article Google Scholar
Zhao, Z.: A virus detection scheme based on features of Control Flow Graph. In: 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), pp. 943–947 (2011)
Mitchell, T.M.: Machine learning and data mining over the past. vol. 42, no. 11 (1999)
Breiman, L.: Bagging predictors. Mach. Learn. 140, 123–140 (1996)
MATH Google Scholar
Breu, F., Guggenbichler, S., Wollmann, J.: Random forests. Vasa, pp. 1–35 (2008)
Tesauro, G.J., Kephart, J.O., Sorkin, G.B.: Neural networks for computer virus recognition. IEEE Expert 11(4), 5–6 (1996)
Arnold, W., Tesauro, G., Heights, Y.: Automatically generated Win32 heuristic virus detection. In: Proc. 2000 Int. virus Bull. Conf., no. September, pp. 51–60 (2000)
Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: Computer Software and Applications Conference, 2004. COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, no. 1, pp. 41–42 (2004)
Ye, Y., Li, T., Huang, K., Jiang, Q., Chen, Y.: Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list. J. Intell. Inf. 35(1), 1–20 (2010)
Article Google Scholar
Peng, H., Long, F., Ding C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious code by applying classification techniques on OpCode patterns. Secur. Inf. 1(1), 1 (2012)
Article Google Scholar
Runwal, N., Low, R.M., Stamp, M.: Opcode graph similarity and metamorphic detection. J. Comput. Virol. 8(1–2), 37–52 (2012)
Article Google Scholar
Shanmugam, G., Low, R.M., Stamp M.: Simple substitution distance and metamorphic detection. J. Comput. Virol. Hack. Tech. 9(3), 159–170 (2013)
Toderici, A.H., Stamp, M.: Chi-squared distance and metamorphic virus detection. J. Comput. Virol. Hacking Tech. 9(1), 1–14 (2012)
Google Scholar
Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3(2), 65–74 (2007)
Article Google Scholar
Dhavare, A., Low, R.M., Stamp, M.: Efficient cryptanalysis of homophonic substitution ciphers. Cryptologia 37(3), 250–281 (2013)
Article Google Scholar
Jidigam, R.K., Austin, T.H., Stamp, M.: Singular value decomposition and metamorphic detection. J. Comput. Virol. Hack. Tech. 11(4), 203–216 (2015)
Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
Article Google Scholar
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. Wiley (2001)
Riesen, K., Bunke, H.: Graph classification and clustering based on vector space embedding. World Scientific, Singapore (2010)
Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Brain Cogn. 52, 262 (2007)
Chung, F.R.K.: Spectral Graph Theory, vol. 30. AMS Bookstore (1999)
Hancock, E.R.: Structural graph matching using the EM algorithm and singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1120–1136 (2001)
Article Google Scholar
Wilson, R.C., Hancock, E.R.: Levenshtein distance for graph spectral features. In: ICPR (2), no. C, pp. 489–492 (2004)
Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding. Pattern Recognit. 40(3), 1042–1056 (2007)
Article MATH Google Scholar
Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. Pattern Anal. Mach. Intell. IEEE 10(5), 695–703 (1988)
Article MATH Google Scholar
Luo, B., Wilson, R.C., Hancock, E.R., Wilson, R.C.: Spectral embedding of graphs. Pattern Recognit. 36(10), 2213–2230 (2003)
Article MATH Google Scholar
Lin, F., Cohen, W.W.: Power iteration clustering. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 655–662 (2010)
Harrington, P.: Machine Learning in Action, vol. 37, no. 3 (2012)
Eskandari, M., Khorshidpour, Z., Hashemi, S.: HDM-analyser: a hybrid analysis approach based on data mining techniques for malware detection. J. Comput. Virol. Hacking Tech. 9(2), 77–93 (2013)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy esti-mation and model selection. In: Proceedings of the 1995 International Joint Conference on Artificial Intelligence, vol. 14, no. 2, pp. 1137–1145 (1995)

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran
Hashem Hashemi, Amin Azmoodeh, Ali Hamzeh & Sattar Hashemi

Authors

Hashem Hashemi
View author publications
You can also search for this author in PubMed Google Scholar
Amin Azmoodeh
View author publications
You can also search for this author in PubMed Google Scholar
Ali Hamzeh
View author publications
You can also search for this author in PubMed Google Scholar
Sattar Hashemi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ali Hamzeh.

Appendix

In this part, in order to complete our experiments, we introduce some additional experimental results such as comparison on False Negative Ratio (FNR), False Positive Ratio (FPR), Precision and Recall. The comparisons illustrate the proposed method outperformed some previously develop methods based on OpCode patterns in [3, 53]. Figures 4 and 7 illustrate that the proposed method and Santos method follow similar trends in FNR and Recall metrics respectively. Moreover, the proposed method achieves approximately median result among rival methods. Figures 5 and 6 depict the ability of the proposed method to obtain best FPR and Precision respectively (Fig. 8). Also in order to see details of evaluation metric refer to Sect. 4D.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hashemi, H., Azmoodeh, A., Hamzeh, A. et al. Graph embedding as a new approach for unknown malware detection. J Comput Virol Hack Tech 13, 153–166 (2017). https://doi.org/10.1007/s11416-016-0278-y

Download citation

Received: 12 June 2015
Accepted: 03 May 2016
Published: 19 May 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11416-016-0278-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Graph embedding as a new approach for unknown malware detection

Abstract

Access this article

Similar content being viewed by others

Applying NLP techniques to malware detection in a practical environment

Large language models and unsupervised feature learning: implications for log analysis

An overview of vulnerability assessment and penetration testing techniques

Notes

References

Author information

Authors and Affiliations

Corresponding author

Appendix

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Graph embedding as a new approach for unknown malware detection

Abstract

Access this article

Similar content being viewed by others

Applying NLP techniques to malware detection in a practical environment

Large language models and unsupervised feature learning: implications for log analysis

An overview of vulnerability assessment and penetration testing techniques

Notes

References

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation