
Graph-based malware detection using dynamic analysis

  • Original paper
Journal in Computer Virology

Abstract

We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. Each graph represents a Markov chain in which the vertices are instructions and the transition probabilities are estimated from the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs, so that the resulting kernel measures similarity between graphs at both local and global levels. Finally, the similarity matrix is passed to a support vector machine to perform classification. Our method is particularly appealing because it does not base its classifications on the raw n-gram data; instead, it uses the graph representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
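The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the instruction vocabulary, the toy traces, the function names, and the `sigma` parameter are all hypothetical, and a single Gaussian kernel on the transition matrices stands in for the combination of local and global graph kernels, which the abstract does not specify.

```python
import math

def transition_matrix(trace, vocab):
    """Estimate Markov transition probabilities from one instruction trace.

    Rows are source instructions, columns are destination instructions.
    Rows for instructions with no observed successor stay all-zero.
    """
    index = {op: i for i, op in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    # Count each observed transition between consecutive instructions.
    for src, dst in zip(trace, trace[1:]):
        counts[index[src]][index[dst]] += 1.0
    # Normalise each row so it sums to 1 (a probability distribution).
    for row in counts:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return counts

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel on the entrywise difference of two transition matrices.

    Assumption: a stand-in for the paper's kernels, chosen for brevity.
    """
    sq_dist = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def similarity_matrix(traces, vocab, sigma=1.0):
    """Pairwise kernel values between the graphs built from each trace."""
    graphs = [transition_matrix(t, vocab) for t in traces]
    return [[gaussian_kernel(g, h, sigma) for h in graphs] for g in graphs]

# Hypothetical vocabulary and toy traces, for illustration only.
VOCAB = ["mov", "push", "call", "pop"]
traces = [
    ["mov", "push", "call", "pop", "mov", "push"],
    ["mov", "push", "call", "pop", "mov", "push"],
    ["pop", "pop", "mov", "mov", "call", "call"],
]
K = similarity_matrix(traces, VOCAB)
```

A matrix such as `K` could then be handed to any support vector machine that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, together with one class label per trace.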

Author information

Corresponding author

Correspondence to Blake Anderson.

About this article

Cite this article

Anderson, B., Quist, D., Neil, J. et al. Graph-based malware detection using dynamic analysis. J Comput Virol 7, 247–258 (2011). https://doi.org/10.1007/s11416-011-0152-x

  • DOI: https://doi.org/10.1007/s11416-011-0152-x
