Skip to main content
Log in

Dueling hidden Markov models for virus analysis

  • Original Paper
  • Published:
Journal of Computer Virology and Hacking Techniques Aims and scope Submit manuscript

Abstract

Recent work has presented hidden Markov models (HMMs) as a compelling option for malware identification. However, some advanced metamorphic malware like MetaPHOR and MWOR have proven to be more challenging to detect with these techniques. In this paper, we develop the dueling HMM Strategy, which leverages our knowledge about different compilers for more precise identification. We also show how this approach may be combined with previous techniques to minimize the performance overhead. Additionally, we examine the HMMs in order to identify the meaning of these hidden states. We examine HMMs for four different compilers, hand-written assembly code, three virus construction kits, and two metamorphic malware families in order to note similarities and differences in the hidden states of the HMMs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. An expanded version of this section discussing hidden Markov models is available at http://www.cs.sjsu.edu/~stamp/RUA/HMM.

  2. Alternately, we could reasonably define “most likely” as the state sequence with the highest probability from among all possible state sequences. Dynamic programming (DP) can be used to efficiently find this particular solution. Note that the DP solution and the HMM solution are not necessarily the same.

  3. In the dynamic programming (DP) sense, we would simply choose the sequence with the highest probability, namely \(UUUG\). Note that this differs from the optimal solution in the HMM sense.

  4. While NGVCK remains difficult to detect, its false positive rate plummets.

References

  1. Annachhatre, C., Austin, T.H., Stamp, M.: Hidden markov models for malware classification. J. Comput. Virol. Hack. Tech. pp. 1–15 (2014). doi: 10.1007/s11416-014-0215-x

  2. Attaluri, S., McGhee, S., Stamp, M.: Profile hidden markov models and metamorphic virus detection. J. Comput. Virol. 5, 151–169 (2009). doi:10.1007/s11416-008-0105-1

    Article  Google Scholar 

  3. Austin, T.H., Filiol, E., Josse, S., Stamp, M.: Exploring hidden markov models for virus analysis: a semantic approach. In: IEEE HICSS, pp. 5039–5048 (2013)

  4. Bruschi, D., Martignoni, L., Monga, M.: Detecting self-mutating malware using control-flow graph matching. In: DIMVA (2006)

  5. Cave, R.L., Neuwirth, L.P.: Hidden markov models for english. In: Ferguson, J.D. (ed) Hidden Markov Models for Speech (1980)

  6. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Association for computational linguistics (1996). doi: 10.3115/981863.981904

  7. Chess, D.M., White, S.R.: An undetectable computer virus. In: Virus bulletin conference (2000)

  8. Cho, S.B., Han, S.J.: Two sophisticated techniques to improve hmm-based intrusion detection systems. In: RAID (2003)

  9. Christodorescu, M., Jha, S.: Testing malware detectors. In: ISSTA (2004)

  10. Christodorescu, M., Jha, S., Seshia, S.A., Song, D.X., Bryant, R.E.: Semantics-aware malware detection. In: Symposium on security and privacy (2005)

  11. Clang: a C language family frontend for LLVM. http://www.clang.llvm.org. Accessed November 2011

  12. Driller, T.M.: Metamorphic permutating high-obfuscating reassembler source. http://vx.netlux.org/29a/29a-6/29a-6.602. Accessed December 2011

  13. Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3, 64–74 (2007). doi:10.1007/s11416-007-0041-5

    Google Scholar 

  14. Filiol, E., Josse, S.: Malware spectral analysis: security evaluation of Bayesian network based detection models. In: EICAR conference (2011)

  15. Francois, J.M.: JAHMM: An implementation of hidden Markov models in Java. http://code.google.com/p/jahmm/. Accessed October 2011

  16. GCC, the GNU compiler collection. http://gcc.gnu.org/. Accessed November 2011

  17. Iliopoulos, D., Adami, C., Szor, P.: Darwin inside the machines: malware evolution and the consequences for computer security. CoRR abs/1111.2503 (2011)

  18. Intersimone, D.: Antique software: Turbo C version 2.01. http://edn.embarcadero.com/article/20841. Accessed November 2011

  19. Krügel, C., Kirda, E., Mutz, D., Robertson, W.K., Vigna, G.: Polymorphic worm detection using structural information of executables. In: RAID (2005)

  20. Leder, F., Steinbock, B., Martini, P.: Classification and detection of metamorphic malware using value set analysis. In: International conference on malicious and unwanted software MALWARE (2009)

  21. Lin, D., Stamp, M.: Hunting for undetectable metamorphic viruses. J. Comput. Virol. 7(3), 201–214 (2011)

    Article  Google Scholar 

  22. Madenur Sridhara, S., Stamp, M.: Metamorphic worm that carries its own morphing engine. J. Comput. Virol. 9(2), 49–58 (2013). doi:10.1007/s11416-012-0174-z

  23. MinGW | the minimalist GNU for Windows. http://www.mingw.org/. Accessed November 2011

  24. Mohammed, M.: Zeroing in on metaphoric computer viruses. Master’s thesis, University of Louisiana at Lafayette (2003)

  25. SnakeByte: next generation virus construktion kit. http://vxheavens.com/vx.php?id=tn02. Accessed December 2011

  26. Song, Y., Locasto, M.E., Stavrou, A., Keromytis, A.D., Stolfo, S.J.: On the infeasibility of modeling polymorphic shellcode—re-thinking the role of learning in intrusion detection systems. Mach. Learn. 81(2), 179–205 (2010)

    Article  MathSciNet  Google Scholar 

  27. Stamp, M.: A revealing introduction to hidden Markov models (2004). http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf. Accessed October 2011

  28. Symantec security response: W32.simile. http://www.symantec.com/security_response/writeup.jsp?docid=2002-030617-5423-99. Accessed December 2011

  29. Szor, P.: The Art of Computer Virus Research and Defense. Addison Wesley, Boston (2005)

    Google Scholar 

  30. Wong, W., Stamp, M.: Hunting for metamorphic engines. J. Comput. Virol. 2(3), 211–229 (2006)

    Article  Google Scholar 

  31. Zhang, Q., Reeves, D.S.: Metaaware: identifying metamorphic malware. In: ACSAC (2007)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thomas H. Austin.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kalbhor, A., Austin, T.H., Filiol, E. et al. Dueling hidden Markov models for virus analysis. J Comput Virol Hack Tech 11, 103–118 (2015). https://doi.org/10.1007/s11416-014-0232-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11416-014-0232-9

Keywords

Navigation