Abstract
One method malware authors use to defeat detection of their programs is to use morphing engines to rapidly generate a large number of variants. Inspired by previous work on author attribution of natural language text, we investigate the problem of attributing a malware sample to a morphing engine. Specifically, we present the malware engine attribution problem and formally define three variations of it, MVRP, DENSITY and GEN, which reflect the challenges malware analysts currently face. We design and implement heuristics to address these problems and show their effectiveness on a set of well-known malware morphing engines and a real-world malware collection, reaching detection accuracies of 96% and higher. Our experiments confirm the applicability of the proposed approach in practice and indicate that engine attribution may offer a viable enhancement of current defenses against malware.
Notes
A detailed definition of an NDTM and of a polynomial time NDTM can be found in [63].
The problem of finding a \({\varvec{\delta }}_{1,\alpha } \in S^{\alpha }\) such that \(||{\varvec{\delta }}_{1,\alpha }||_1=\beta \) is NP-complete. In fact, computing any \(\alpha \)-tuple \(x\) whose \(||\cdot ||_1\) equals a fixed \(\beta \) is an instance of the Subset Sum problem and is hence NP-complete [17]. In practice, one may choose to use a polynomial time approximation scheme to compute each of the \(G_{i,k}(\alpha , \beta )\).
The Euclidean norm gives the magnitude of a vector and, in this context, is used to measure the difference between two vectors.
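As a brief illustration of the last note, the following Python sketch computes the Euclidean norm of a vector and uses the norm of a difference as a distance between two vectors. The function names and the sample vectors are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import math


def euclidean_norm(v):
    """Return the Euclidean (L2) norm of a vector given as a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def euclidean_distance(u, v):
    """Return the Euclidean norm of the difference u - v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return euclidean_norm([a - b for a, b in zip(u, v)])


if __name__ == "__main__":
    # Hypothetical feature vectors, used only to exercise the functions.
    vec_a = [3.0, 4.0, 0.0]
    vec_b = [0.0, 0.0, 0.0]
    print(euclidean_norm(vec_a))             # 5.0
    print(euclidean_distance(vec_a, vec_b))  # 5.0
```

A distance of zero means the two vectors coincide; larger values indicate a larger difference under this measure.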
References
Abou-Assaleh, T., Cercone, N., Kešelj, V., Sweidan, R.: N-gram-based detection of new malicious code. In: 28th Annual IEEE International Computer Software and Applications Conference, pp. 41–42 (2004)
Argamon, S., Koppel, M., Pennebaker, J.W., Schler, J.: Automatically profiling the author of an anonymous text. Commun. ACM 52(2), 119–123 (2009)
Babić, D., Reynaud, D., Song, D.: Malware analysis with tree automata inference. In: Proceedings of the 23rd International Conference on Computer Aided Verification (CAV), pp. 116–131. Snowbird, UT (2011)
Bilar, D.: Opcodes as predictor for malware. Int. J. Electron. Secur. Digit. Forensics 1(2), 156–168 (2007)
Bonfante, G., Kaczmarek, M., Marion, J.Y.: Architecture of a morphological malware detector. J. Comput. Virol. 5(3), 263–270 (2009)
Borello, J.M., Me, L.: Code obfuscation techniques for metamorphic viruses. J. Comput. Virol. 4, 211–220 (2008)
Bruschi, D., Martignoni, L., Monga, M.: Using code normalization for fighting self-mutating malware. In: Proceedings of International Symposium on Secure Software Engineering. IEEE (2006)
Chouchane, M.R., Lakhotia, A.: Using engine signature to detect metamorphic malware. In: 4th Workshop on Recurring Malcode (WORM) (2006)
Chouchane, M.R., Walenstein, A., Lakhotia, A.: Statistical signatures for fast filtering of instruction-substituting metamorphic malware. In: 5th Workshop on Recurring Malcode (WORM) (2007)
Chouchane, M.R., Walenstein, A., Lakhotia, A.: Using Markov chains to filter machine-morphed variants of malicious programs. In: Proceedings of the 3rd International Conference on Malicious and Unwanted Software (Malware’08) (2008)
Christodorescu, M., Jha, S., Seshia, S.A., Song, D., Bryant, R.E.: Semantics-aware malware detection. In: Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P), pp. 32–46 (2005)
Christodorescu, M., Kinder, J., Jha, S., Katzenbeisser, S., Veith, H.: Malware normalization. Department of Computer Science, The University of Wisconsin, Technical Report (2005)
Detristan, T., Ulenspiegel, T., Malcom, Y., Underduk, M.S.V.: Polymorphic shellcode engine using spectrum analysis. Phrack 61 (2003)
Egele, M., Wurzinger, P., Kruegel, C., Kirda, E.: Defending browsers against drive-by downloads: mitigating heap-spraying code injection attacks. In: Proceedings of the 6th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA ’09, pp. 88–106. Springer, Berlin (2009)
Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: Proceedings of the 15th USENIX Security Symposium, pp. 241–256 (2006)
Frantzeskou, G., Gritzalis, S., Macdonell, S.G.: Source code authorship analysis for supporting the cybercrime investigation process. In: Proceedings of 1st International Conference on e-Business and Telecommunications Networks (ICETE04), vol. 2, pp. 85–92 (2004)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco (1979)
Gavrilova, M.L., Yampolskiy, R.V.: Applying biometric principles to avatar recognition. In: Proceedings of the 2010 International Conference on Cyberworlds, CW ’10, pp. 179–186. IEEE Computer Society, Washington, DC, USA (2010)
Griffin, K., Schneider, S., Hu, X., Chiueh, T.-c.: Automatic generation of string signatures for malware detection. In: Kirda, E., Jha, S., Balzarotti, D. (eds.) Recent Advances in Intrusion Detection. Lecture Notes in Computer Science, pp. 101–120. Springer, Berlin (2009)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11 (2009)
Han, E.H., Karypis, G.: Centroid-based document classification: analysis and experimental results. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD ’00, pp. 424–431. Springer, London, UK (2000)
Hayes, J.H., Offutt, J.: Recognizing authors: an examination of the consistent programmer hypothesis. Softw. Test. Verif. Reliab. (2009)
Holmes, D.: Authorship attribution. Comput. Humanit. 28, 87–106 (1994). doi:10.1007/BF01830689
Holzer, A., Kinder, J., Veith, H.: Using verification technology to specify and detect malware. In: 11th International Conference on Computer Aided Systems Theory (2007)
Jacob, G., Debar, H., Filiol, E.: Behavioral detection of malware: from a survey towards an established taxonomy. J. Comput. Virol. 4(3), 251–266 (2008)
K2: Admmutate. http://www.pestpatrol.com/zks/pestinfo/a/admmutate.asp (2005)
Karim, M.E., Walenstein, A., Lakhotia, A., Parida, L.: Malware phylogeny generation using permutations of code. Eur. Res. J. Comput. Virol. 1(1–2), 13–23 (2005)
Kennedy, D., O’Gorman, J., Kearns, D., Aharoni, M.: Metasploit: The Penetration Tester’s Guide. No Starch Press, USA (2011)
Kephart, J.O., Arnold, W.C.: Automatic extraction of computer virus signatures. Virus Bull (1994)
Kešelj, V., Peng, F., Cercone, N., Thomas, C.: N-gram-based author profiles for authorship attribution. In: 6th Conference of the Pacific Association for Computational Linguistics, pp. 256–264 (2003)
Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)
Koppel, M., Schler, J., Bonchek-Dokow, E.: Measuring differentiability: unmasking pseudonymous authors. J. Mach. Learn. Res. 8, 1261–1276 (2007)
Krsul, I., Spafford, E.H.: Authorship analysis: identifying the author of a program. Comput. Secur. (1996)
Kruegel, C., Kirda, E., Mutz, D., Robertson, W., Vigna, G.: Polymorphic worm detection using structural information of executables. In: Proceedings of the 8th Symposium on Recent Advances in Intrusion Detection (RAID’2005). Lecture Notes in Computer Science. Springer, Berlin (2005)
Kruegel, C., Kirda, E., Mutz, D., Robertson, W., Vigna, G.: Polymorphic worm detection using structural information of executables. In: Proceedings of the 8th International Conference on Recent Advances in Intrusion Detection, RAID’05, pp. 207–226. Springer, Berlin (2006)
Lakhotia, A., Kumar, E.U., Venable, M.: A method for detecting obfuscated calls in malicious binaries. IEEE Trans. Softw. Eng. 31(11), 955–968 (2005)
Lakhotia, A., Mohammed, M.: Imposing order on program statements to assist anti-virus scanners. In: Proceedings of the 11th Working Conference on Reverse Engineering (2004)
Lakhotia, A., Singh, P.K.: Challenges in getting ’formal’ with viruses. Virus Bull. (2003)
Layton, R., Watters, P., Dazeley, R.: Unsupervised authorship analysis of phishing webpages. In: 2012 International Symposium on Communications and Information Technologies (ISCIT), pp. 1104–1109 (2012)
Leder, F., Steinbock, B., Martini, P.: Classification and detection of metamorphic malware using value set analysis. In: 2009 4th International Conference on Malicious and Unwanted Software MALWARE, pp. 39–46. IEEE (2009)
Li, W.J., Wang, K., Stolfo, S.J., Herzog, B.: Fileprints: identifying file types by n-gram analysis. In: Information Assurance Workshop (2005)
Li, Z., Sanghi, M., Chen, Y., Kao, M.Y., Chavez, B.: Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience. In: 2006 IEEE Symposium on Security and Privacy, pp. 15–47 (2006)
Lin, D., Stamp, M.: Hunting for undetectable metamorphic viruses. J. Comput. Virol. 7(3), 201–214 (2011)
Lo, R.W., Levitt, K.N., Olsson, R.A.: MCF: a malicious code filter. Comput. Secur. 14, 541–566 (1995)
Lyda, R., Hamrock, J.: Using entropy analysis to find encrypted and packed malware. IEEE Secur. Priv. 5(2), 40–45 (2007)
Mathur, R., Maida, A., Palmer, C.E.: Normalizing metamorphic malware using term rewriting. In: Proceedings of the 6th IEEE International Workshop on Source Code Analysis and Manipulation (SCAM ’06), pp. 75–84. Hill (2006)
Menahem, E., Shabtai, A., Rokach, L., Elovici, Y.: Improving malware detection by applying multi-inducer ensemble. Comput. Stat. Data Anal. 53(4), 1483–1494 (2009)
Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer, London (1993)
Mitchell, T.M.: Machine Learning. McGraw-Hill, USA (1997)
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: 23rd Annual Computer Security Applications Conference (2007)
NGVCK: Ngvck download page. VXheavens-Virus eXchange Website. http://vx.netlux.org/vx.php?id=tn02
Paleari, R., Martignoni, L., Fresi Roglia, G., Bruschi, D.: A fistful of red-pills: how to automatically generate procedures to detect CPU emulators. In: Proceedings of the USENIX Workshop on Offensive Technologies (WOOT) (2009)
Payer, U., Teufl, P., Lamberger, M.: Hybrid engine for polymorphic shellcode detection. In: Proceedings of the Second International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA’05, pp. 19–31. Springer, Berlin (2005)
Polychronakis, M., Anagnostakis, K.G., Markatos, E.P.: Network-level polymorphic shellcode detection using emulation. In: Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), pp. 54–73 (2006)
Polychronakis, M., Anagnostakis, K.G., Markatos, E.P.: Comprehensive shellcode detection using runtime heuristics. In: Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pp. 287–296. ACM, New York, NY, USA (2010)
Preda, M.D., Christodorescu, M., Jha, S., Debray, S.: A semantics-based approach to malware detection. ACM Trans. Program. Lang. Syst. 30(5) (2008)
Raffetseder, T., Kruegel, C., Kirda, E.: Detecting System Emulators. In: 10th Information Security Conference (ISC) (2007)
Rocchio, J.J.: Relevance feedback in information retrieval. In: Salton, G. (ed.) The smart retrieval system: experiments in automatic document processing, pp. 313–323. Prentice-Hall, Englewood Cliffs (1971)
Rosenblum, N., Zhu, X., Miller, B.P.: Who wrote this code? Identifying the authors of program binaries. In: Proceedings of the 16th European Conference on Research in Computer Security, ESORICS’11, pp. 172–189. Springer, Berlin (2011). http://dl.acm.org/citation.cfm?id=2041225.2041239
Shafiq, Z., Khayam, S.A., Farooq, M.: Embedded malware detection using Markov n-grams. Lect. Notes Comput. Sci. 5137, 88–107 (2008)
Shaner, R.A.: Patent 5991714: method of identifying data type and locating in a file (1999)
Singh, P., Lakhotia, A.: Static verification of worm and virus behaviour in binary executables using model checking. In: Proceedings of the 4th IEEE Information Assurance Workshop, pp. 298–300. IEEE Computer Society, Los Alamitos, CA, USA (2003)
Sipser, M.: Introduction to the theory of computation. PWS (1997)
Song, Y., Locasto, M.E., Stavrou, A., Keromytis, A.D., Stolfo, S.J.: On the infeasibility of modeling polymorphic shellcode. Mach. Learn. 81, 179–205 (2010)
Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol., pp. 538–556 (2009)
Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Lang. Resour. Eval. 45(1), 63–82 (2011)
Symantec: Global Internet Security Threat Report (2009)
Ször, P.: The Art of Computer Virus Research and Defense, 1st edn. Symantec Press, Addison Wesley Professional, Reading (2005)
Tabish, M., Shafiq, Z., Farooq, M.: Malware detection using statistical analysis of byte-level file content. In: Proceedings of the ACM SIGKDD Workshop on Cyber Security and Intelligence Informatics, pp. 23–31 (2009)
Tang, Y., Chen, S.: Defending against internet worms: a signature-based approach. In: INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 2, pp. 1384–1394 (2005)
Tang, Y., Xiao, B., Lu, X.: Signature tree generation for polymorphic worms. IEEE Trans. Comput. 60(4), 565–579 (2011)
Team, M.D.: Metasploit Project. http://www.metasploit.com (2006)
Toth, T., Kruegel, C.: Accurate buffer overflow detection via abstract payload execution. In: Proceedings of the Recent Advances in Intrusion Detection, RAID, pp. 274–291 (2002)
Triumfant, Inc.: The world-wide malware signature counter (2010). http://www.triumfant.com/Signature_Counter.asp
VCL: Vcl download page. VXheavens: Virus eXchange Website. http://vx.netlux.org/vx.php?id=tv03
VX heavens. http://vx.netlux.org
Walenstein, A., Mathur, R., Chouchane, M.R., Lakhotia, A.: Constructing malware normalizers using term rewriting. J. Comput. Virol. (2008). doi:10.1007/s11416-008-0081-5
Walenstein, A., Venable, M., Hayes, M., Thompson, C., Lakhotia, A.: Exploiting similarity between variants to defeat malware. In: Proceedings of Black Hat Briefings. Black Hat (2007)
Wang, X., Chan Jhi, Y., Zhu, S., Liu, P.: Still: Exploit code detection via static taint and initialization analyses. In: Proceedings of the Computer Security Applications Conference, ACSAC, pp. 289–298. IEEE Computer Society (2008)
Wang, X., Pan, C.C., Liu, P., Zhu, S.: Sigfree: a signature-free buffer overflow attack blocker. In: Proceedings of the 15th Conference on USENIX Security Symposium, vol. 15. USENIX Association, Berkeley, CA, USA (2006)
Wong, W., Stamp, M.: Hunting for metamorphic engines. J. Comput. Virol. 2(3), 211–229 (2006)
Z0mbie: some ideas about metamorphism. http://vx.netlux.org/lib/vzo20.html
Zhou, Y., Inge, M.: Malware detection using adaptive data compression. In: AISec ’08: Proceedings of the 1st ACM Workshop on AISec, pp. 53–60 (2008)
Acknowledgments
This material is based upon work supported by the Air Force Office of Scientific Research under Award No. FA9550-09-1-0715. The authors would like to thank Edna Milgo and Sushma Vallabhaneni for their assistance in conducting the experiments.
Cite this article
Chouchane, R., Stakhanova, N., Walenstein, A. et al. Detecting machine-morphed malware variants via engine attribution. J Comput Virol Hack Tech 9, 137–157 (2013). https://doi.org/10.1007/s11416-013-0183-6
DOI: https://doi.org/10.1007/s11416-013-0183-6