Detecting machine-morphed malware variants via engine attribution

  • Original Paper
Journal of Computer Virology and Hacking Techniques

Abstract

One method malware authors use to defeat detection of their programs is to use morphing engines to rapidly generate a large number of variants. Inspired by previous work on author attribution of natural language text, we investigate the problem of attributing a malware sample to a morphing engine. Specifically, we present the malware engine attribution problem and formally define three variations of it, MVRP, DENSITY and GEN, which reflect the challenges malware analysts face today. We design and implement heuristics to address these problems and show their effectiveness on a set of well-known malware morphing engines and a real-world malware collection, reaching detection accuracies of 96% and higher. Our experiments confirm the applicability of the proposed approach in practice and indicate that engine attribution may offer a viable enhancement of current defenses against malware.

Notes

  1. A detailed definition of an NDTM and of a polynomial time NDTM can be found in [63].

  2. The problem of finding a \(\boldsymbol{\delta}_{1,\alpha} \in S^{\alpha}\) such that \(\|\boldsymbol{\delta}_{1,\alpha}\|_1 = \beta\) is NP-complete. In fact, computing any \(\alpha\)-tuple \(x\) whose \(\|\cdot\|_1\) equals a fixed \(\beta\) is an instance of the Subset Sum problem and is hence NP-complete [17]. In practice, one may choose to use a polynomial-time approximation scheme to compute each of the \(G_{i,k}(\alpha, \beta)\).

  3. The Euclidean norm gives the magnitude of a vector and, in this context, is used to measure the difference between two vectors.
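
As a small illustration of Note 3, the sketch below computes the Euclidean norm of a vector and uses the norm of the difference of two vectors as a distance. The function names and the sample vectors are assumptions made for this example; they are not taken from the article, and the article's actual feature vectors are not shown here.

```python
import math


def euclidean_norm(v):
    """Return the Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def euclidean_distance(u, v):
    """Return the Euclidean norm of the difference u - v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return euclidean_norm([a - b for a, b in zip(u, v)])


# Hypothetical vectors, chosen only so the arithmetic is easy to follow.
vector_a = [3.0, 0.0, 4.0]
vector_b = [0.0, 0.0, 0.0]

print(euclidean_norm(vector_a))                # magnitude of vector_a
print(euclidean_distance(vector_a, vector_b))  # difference between the two
```

A smaller distance means the two vectors are closer to each other; how that distance is turned into an attribution decision is defined in the full article, not in this sketch.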

References

  1. Abou-Assaleh, T., Cercone, N., Kešelj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: 28th Annual IEEE International Computer Software and Applications Conference, pp. 41–42 (2004)

  2. Argamon, S., Koppel, M., Pennebaker, J.W., Schler, J.: Automatically profiling the author of an anonymous text. Commun. ACM 52(2), 119–123 (2009)

  3. Babić, D., Reynaud, D., Song, D.: Malware analysis with tree automata inference. In: Proceedings of the 23rd International Conference on Computer Aided Verification (CAV), pp. 116–131. Snowbird, UT (2011)

  4. Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. Forensics 1(2), 156–168 (2007)

  5. Bonfante, G., Kaczmarek, M., Marion, J.Y.: Architecture of a morphological malware detector. J. Comput. Virol. 5(3), 263–270 (2009)

  6. Borello, J.M., Me, L.: Code obfuscation techniques for metamorphic viruses. J. Comput. Virol. 4, 211–220 (2008)

  7. Bruschi, D., Martignoni, L., Monga, M.: Using code normalization for fighting self-mutating malware. In: Proceedings of International Symposium on Secure Software Engineering. IEEE (2006)

  8. Chouchane, M.R., Lakhotia, A.: Using engine signature to detect metamorphic malware. In: 4th Workshop on Recurring Malcode (WORM) (2006)

  9. Chouchane, M.R., Walenstein, A., Lakhotia, A.: Statistical signatures for fast filtering of instruction-substituting metamorphic malware. In: 5th Workshop on Recurring Malcode (WORM) (2007)

  10. Chouchane, M.R., Walenstein, A., Lakhotia, A.: Using Markov chains to filter machine-morphed variants of malicious programs. In: Proceedings of the 3rd International Conference on Malicious and Unwanted Software (Malware’08) (2008)

  11. Christodorescu, M., Jha, S., Seshia, S.A., Song, D., Bryant, R.E.: Semantics-aware malware detection. In: Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P), pp. 32–46 (2005)

  12. Christodorescu, M., Kinder, J., Jha, S., Katzenbeisser, S., Veith, H.: Malware normalization. Department of Computer Science, The University of Wisconsin, Technical Report (2005)

  13. Detristan, T., Ulenspiegel, T., Malcom, Y., Underduk, M.S.V.: Polymorphic shellcode engine using spectrum analysis. Phrack 61 (2003)

  14. Egele, M., Wurzinger, P., Kruegel, C., Kirda, E.: Defending browsers against drive-by downloads: mitigating heap-spraying code injection attacks. In: Proceedings of the 6th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA ’09, pp. 88–106. Springer, Berlin (2009)

  15. Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: Proceedings of the 15th USENIX Security Symposium, pp. 241–256 (2006)

  16. Frantzeskou, G., Gritzalis, S., Macdonell, S.G.: Source code authorship analysis for supporting the cybercrime investigation process. In: Proceedings of 1st International Conference on e-Business and Telecommunications Networks (ICETE04), vol. 2, pp. 85–92 (2004)

  17. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco (1979)

  18. Gavrilova, M.L., Yampolskiy, R.V.: Applying biometric principles to avatar recognition. In: Proceedings of the 2010 International Conference on Cyberworlds, CW ’10, pp. 179–186. IEEE Computer Society, Washington, DC, USA (2010)

  19. Griffin, K., Schneider, S., Hu, X., Chiueh, T.-c.: Automatic generation of string signatures for malware detection. In: Kirda, E., Jha, S., Balzarotti, D. (eds.) Recent Advances in Intrusion Detection. Lecture Notes in Computer Science, pp. 101–120. Springer, Berlin (2009)

  20. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. 11 (2009)

  21. Han, E.H., Karypis, G.: Centroid-based document classification: analysis and experimental results. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD ’00, pp. 424–431. Springer, London, UK (2000)

  22. Hayes, J.H., Offutt, J.: Recognizing authors: an examination of the consistent programmer hypothesis. Softw. Test. Verif. Reliab. (2009)

  23. Holmes, D.: Authorship attribution. Comput. Humanit. 28, 87–106 (1994). doi:10.1007/BF01830689

  24. Holzer, A., Kinder, J., Veith, H.: Using verification technology to specify and detect malware. In: 11th International Conference on Computer Aided Systems Theory (2007)

  25. Jacob, G., Debar, H., Filiol, E.: Behavioral detection of malware: from a survey towards an established taxonomy. J. Comput. Virol. 4(3), 251–266 (2008)

  26. K2: Admmutate. http://www.pestpatrol.com/zks/pestinfo/a/admmutate.asp (2005)

  27. Karim, M.E., Walenstein, A., Lakhotia, A., Parida, L.: Malware phylogeny generation using permutations of code. Eur. Res. J. Comput. Virol. 1(1–2), 13–23 (2005)

  28. Kennedy, D., O’Gorman, J., Kearns, D., Aharoni, M.: Metasploit: The Penetration Tester’s Guide. No Starch Press, USA (2011)

  29. Kephart, J.O., Arnold, W.C.: Automatic extraction of computer virus signatures. Virus Bull (1994)

  30. Kešelj, V., Peng, F., Cercone, N., Thomas, C.: N-gram-based author profiles for authorship attribution. In: 6th Conference of the Pacific Association for Computational Linguistics, pp. 256–264 (2003)

  31. Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)

  32. Koppel, M., Schler, J., Bonchek-Dokow, E.: Measuring differentiability: unmasking pseudonymous authors. J. Mach. Learn. Res. 8, 1261–1276 (2007)

  33. Krsul, I., Spafford, E.H.: Authorship analysis: identifying the author of a program. Comput. Secur. (1996)

  34. Kruegel, C., Kirda, E., Mutz, D., Robertson, W., Vigna, G.: Polymorphic worm detection using structural information of executables. In: Proceedings of the 8th Symposium on Recent Advances in Intrusion Detection (RAID’2005). Lecture Notes in Computer Science. Springer, Berlin (2005)

  35. Kruegel, C., Kirda, E., Mutz, D., Robertson, W., Vigna, G.: Polymorphic worm detection using structural information of executables. In: Proceedings of the 8th International Conference on Recent Advances in Intrusion Detection, RAID’05, pp. 207–226. Springer, Berlin (2006)

  36. Lakhotia, A., Kumar, E.U., Venable, M.: A method for detecting obfuscated calls in malicious binaries. IEEE Trans. Softw. Eng. 31(11), 955–968 (2005)

  37. Lakhotia, A., Mohammed, M.: Imposing order on program statements to assist anti-virus scanners. In: Proceedings of the 11th Working Conference on Reverse Engineering (2004)

  38. Lakhotia, A., Singh, P.K.: Challenges in getting ’formal’ with viruses. Virus Bull. (2003)

  39. Layton, R., Watters, P., Dazeley, R.: Unsupervised authorship analysis of phishing webpages. In: 2012 International Symposium on Communications and Information Technologies (ISCIT), pp. 1104–1109 (2012)

  40. Leder, F., Steinbock, B., Martini, P.: Classification and detection of metamorphic malware using value set analysis. In: 2009 4th International Conference on Malicious and Unwanted Software MALWARE, pp. 39–46. IEEE (2009)

  41. Li, W.J., Wang, K., Stolfo, S.J., Herzog, B.: Fileprints: identifying file types by n-gram analysis. In: Information Assurance Workshop (2005)

  42. Li, Z., Sanghi, M., Chen, Y., Kao, M.Y., Chavez, B.: Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience. In: 2006 IEEE Symposium on Security and Privacy, pp. 15–47 (2006)

  43. Lin, D., Stamp, M.: Hunting for undetectable metamorphic viruses. J. Comput. Virol. 7(3), 201–214 (2011)

  44. Lo, R.W., Levitt, K.N., Olsson, R.A.: Mcf: A malicious code filter. Comput. Secur. 14, 541–566 (1995)

  45. Lyda, R., Hamrock, J.: Using entropy analysis to find encrypted and packed malware. IEEE Secur. Priv. 5(2), 40–45 (2007)

  46. Mathur, R., Maida, A., Palmer, C.E.: Normalizing metamorphic malware using term rewriting. In: Proceedings of the 6th IEEE International Workshop on Source Code Analysis and Manipulation (SCAM ’06), pp. 75–84. Hill (2006)

  47. Menahem, E., Shabtai, A., Rokach, L., Elovici, Y.: Improving malware detection by applying multi-inducer ensemble. Comput. Stat. Data Anal. 53(4), 1483–1494 (2009)

  48. Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer, London (1993)

  49. Mitchell, T.M.: Machine Learning. McGraw-Hill, USA (1997)

  50. Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: 23rd Annual Computer Security Applications Conference (2007)

  51. NGVCK: Ngvck download page. VXheavens-Virus eXchange Website. http://vx.netlux.org/vx.php?id=tn02

  52. Paleari, R., Martignoni, L., Fresi, G., Bruschi, R.D.: A fistful of red-pills: how to automatically generate procedures to detect cpu emulators. In: Proceedings of the USENIX Workshop on Offensive Technologies (WOOT) (2009)

  53. Payer, U., Teufl, P., Lamberger, M.: Hybrid engine for polymorphic shellcode detection. In: Proceedings of the Second International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA’05, pp. 19–31. Springer, Berlin (2005)

  54. Polychronakis, M., Anagnostakis, K.G., Markatos, E.P.: Network-level polymorphic shellcode detection using emulation. In: Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), pp. 54–73 (2006)

  55. Polychronakis, M., Anagnostakis, K.G., Markatos, E.P.: Comprehensive shellcode detection using runtime heuristics. In: Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pp. 287–296. ACM, New York, NY, USA (2010)

  56. Preda, M.D., Christodorescu, M., Jha, S., Debray, S.: A semantics-based approach to malware detection. ACM Trans. Program. Lang. Syst. 30(5) (2008)

  57. Raffetseder, T., Kruegel, C., Kirda, E.: Detecting System Emulators. In: 10th Information Security Conference (ISC) (2007)

  58. Rocchio, J.J.: Relevance feedback in information retrieval. In: Salton, G. (ed.) The smart retrieval system: experiments in automatic document processing, pp. 313–323. Prentice-Hall, Englewood Cliffs (1971)

  59. Rosenblum, N., Zhu, X., Miller, B.P.: Who wrote this code? Identifying the authors of program binaries. In: Proceedings of the 16th European Conference on Research in Computer Security, ESORICS’11, pp. 172–189. Springer, Berlin (2011). http://dl.acm.org/citation.cfm?id=2041225.2041239

  60. Shafiq, Z., Khayam, S.A., Farooq, M.: Embedded malware detection using Markov n-grams. Lect. Notes Comput. Sci. 5137, 88–107 (2008)

  61. Shaner, R.A.: Patent 5991714: method of identifying data type and locating in a file (1999)

  62. Singh, P., Lakhotia, A.: Static verification of worm and virus behaviour in binary executables using model checking. In: Proceedings of the 4th IEEE Information Assurance Workshop, pp. 298–300. IEEE Computer Society, Los Alamitos, CA, USA (2003)

  63. Sipser, M.: Introduction to the theory of computation. PWS (1997)

  64. Song, Y., Locasto, M.E., Stavrou, A., Keromytis, A.D., Stolfo, S.J.: On the infeasibility of modeling polymorphic shellcode. Mach. Learn. 81, 179–205 (2010)

  65. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol., pp. 538–556 (2009)

  66. Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Lang. Resour. Eval. 45(1), 63–82 (2011)

  67. Symantec: Global Internet Security Threat Report (2009)

  68. Ször, P.: The Art of Computer Virus Research and Defense, 1st edn. Symantec Press, Addison Wesley Professional, Reading (2005)

  69. Tabish, M., Shafiq, Z., Farooq, M.: Malware detection using statistical analysis of byte-level file content. In: Proceedings of the ACM SIGKDD Workshop on Cyber Security and Intelligence Informatics, pp. 23–31 (2009)

  70. Tang, Y., Chen, S.: Defending against internet worms: a signature-based approach. In: INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 2, pp. 1384–1394 (2005)

  71. Tang, Y., Xiao, B., Lu, X.: Signature tree generation for polymorphic worms. IEEE Trans. Comput. 60(4), 565–579 (2011)

  72. Team, M.D.: Metasploit Project. http://www.metasploit.com (2006)

  73. Toth, T., Kruegel, C.: Accurate buffer overflow detection via abstract payload execution. In: Proceedings of the Recent Advances in Intrusion Detection, RAID, pp. 274–291 (2002)

  74. Triumfant, Inc.: The world-wide malware signature counter (2010). http://www.triumfant.com/Signature_Counter.asp

  75. VCL: Vcl download page. VXheavens: Virus eXchange Website. http://vx.netlux.org/vx.php?id=tv03

  76. VX heavens. http://vx.netlux.org

  77. Walenstein, A., Mathur, R., Chouchane, M.R., Lakhotia, A.: Constructing malware normalizers using term rewriting. J. Comput. Virol. (2008). doi:10.1007/s11416-008-0081-5

  78. Walenstein, A., Venable, M., Hayes, M., Thompson, C., Lakhotia, A.: Exploiting similarity between variants to defeat malware. In: Proceedings of Black Hat Briefings. Black Hat (2007)

  79. Wang, X., Chan Jhi, Y., Zhu, S., Liu, P.: Still: Exploit code detection via static taint and initialization analyses. In: Proceedings of the Computer Security Applications Conference, ACSAC, pp. 289–298. IEEE Computer Society (2008)

  80. Wang, X., Pan, C.C., Liu, P., Zhu, S.: Sigfree: a signature-free buffer overflow attack blocker. In: Proceedings of the 15th Conference on USENIX Security Symposium, vol. 15. USENIX Association, Berkeley, CA, USA (2006)

  81. Wong, W., Stamp, M.: Hunting for metamorphic engines. J. Comput. Virol. 2(3), 211–229 (2006)

  82. Z0mbie: some ideas about metamorphism. http://vx.netlux.org/lib/vzo20.html

  83. Zhou, Y., Inge, M.: Malware detection using adaptive data compression. In: AISec ’08: Proceedings of the 1st ACM Workshop on Workshop on AISec, pp. 53–60 (2008)

Acknowledgments

This material is based upon work supported by the Air Force Office of Scientific Research under Award No. FA9550-09-1-0715. The authors would like to thank Edna Milgo and Sushma Vallabhaneni for their assistance in conducting the experiments.

Author information

Corresponding author

Correspondence to Natalia Stakhanova.

About this article

Cite this article

Chouchane, R., Stakhanova, N., Walenstein, A. et al. Detecting machine-morphed malware variants via engine attribution. J Comput Virol Hack Tech 9, 137–157 (2013). https://doi.org/10.1007/s11416-013-0183-6

  • DOI: https://doi.org/10.1007/s11416-013-0183-6
