Dueling hidden Markov models for virus analysis

Kalbhor, Ashwin; Austin, Thomas H.; Filiol, Eric; Josse, Sébastien; Stamp, Mark

doi:10.1007/s11416-014-0232-9


Original Paper
Published: 30 November 2014

Volume 11, pages 103–118 (2015)

Journal of Computer Virology and Hacking Techniques

Ashwin Kalbhor¹, Thomas H. Austin¹, Eric Filiol², Sébastien Josse³ & Mark Stamp¹


Abstract

Recent work has presented hidden Markov models (HMMs) as a compelling option for malware identification. However, some advanced metamorphic malware families, such as MetaPHOR and MWOR, have proven more challenging to detect with these techniques. In this paper, we develop the dueling HMM strategy, which leverages our knowledge of different compilers for more precise identification. We also show how this approach may be combined with previous techniques to minimize performance overhead. Additionally, we examine the HMMs in order to identify the meaning of their hidden states. We examine HMMs for four different compilers, hand-written assembly code, three virus construction kits, and two metamorphic malware families in order to note similarities and differences in their hidden states.
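The abstract names the dueling HMM strategy but does not spell out its scoring rule. The Python sketch below shows one plausible form of the scoring step, under several assumptions. Each compiler or malware family is assumed to have its own trained HMM. Opcodes are assumed to be mapped to integer symbols. A sample is assumed to receive the label of whichever model gives it the highest per-opcode log-likelihood. The names `HMM`, `log_likelihood` and `duel` and the toy parameters are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class HMM:
    """Hypothetical container for one trained model (N states, M symbols)."""
    pi: List[float]        # initial state distribution, length N
    A: List[List[float]]   # N x N state transition matrix
    B: List[List[float]]   # N x M emission matrix (rows: states, cols: opcode symbols)


def log_likelihood(model: HMM, obs: Sequence[int]) -> float:
    """Return log P(obs | model) using the scaled forward algorithm."""
    n = len(model.pi)
    log_prob = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [model.pi[i] * model.B[i][o] for i in range(n)]
        else:
            alpha = [
                sum(alpha[j] * model.A[j][i] for j in range(n)) * model.B[i][o]
                for i in range(n)
            ]
        c = sum(alpha)                  # scaling factor for step t
        alpha = [a / c for a in alpha]  # rescale to avoid numeric underflow
        log_prob += math.log(c)         # log P(obs) is the sum of the log scaling factors
    return log_prob


def duel(models: Dict[str, HMM], obs: Sequence[int]) -> Tuple[str, Dict[str, float]]:
    """Score obs against every model; the highest per-symbol score wins (assumed rule)."""
    scores = {name: log_likelihood(m, obs) / len(obs) for name, m in models.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Toy usage: two hypothetical models over a two-symbol opcode alphabet.
models = {
    "compiler": HMM(pi=[0.5, 0.5], A=[[0.9, 0.1], [0.1, 0.9]], B=[[0.9, 0.1], [0.8, 0.2]]),
    "malware": HMM(pi=[0.5, 0.5], A=[[0.9, 0.1], [0.1, 0.9]], B=[[0.1, 0.9], [0.2, 0.8]]),
}
label, scores = duel(models, [0, 0, 0, 1, 0, 0])
print(label)  # compiler
```

The scores are divided by the sequence length so that files of different sizes can be compared. This normalization is a design assumption of the sketch.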


Notes

An expanded version of this section discussing hidden Markov models is available at http://www.cs.sjsu.edu/~stamp/RUA/HMM.
Alternatively, we could reasonably define “most likely” as the state sequence with the highest probability among all possible state sequences. Dynamic programming (DP) can be used to find this particular solution efficiently. Note that the DP solution and the HMM solution are not necessarily the same.
In the dynamic programming (DP) sense, we would simply choose the sequence with the highest probability, namely \(UUUG\). Note that this differs from the optimal solution in the HMM sense.
While NGVCK remains difficult to detect, its false positive rate plummets.
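The notes above on “most likely” state sequences contrast two definitions. The DP definition takes the single state sequence with the highest joint probability, which Viterbi decoding computes. The HMM definition takes, at each position, the individually most likely state, which the forward-backward posteriors give. The Python sketch below computes both for a small hypothetical two-state model. The states are labeled H and C, and there are three observation symbols. The parameters are illustrative only and are not the model behind the \(UUUG\) example. The function names are our own.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def viterbi(pi: List[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> List[int]:
    """DP sense: the single state sequence with the highest joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        # best predecessor of each state i at this step
        best_prev = [max(range(n), key=lambda j: delta[j] * A[j][i]) for i in range(n)]
        delta = [delta[best_prev[i]] * A[best_prev[i]][i] * B[i][o] for i in range(n)]
        backpointers.append(best_prev)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for best_prev in reversed(backpointers):  # walk the pointers back to t = 0
        state = best_prev[state]
        path.append(state)
    return path[::-1]


def posterior_decode(pi: List[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> List[int]:
    """HMM sense: at each step, the individually most likely state.

    Uses unscaled forward-backward passes, so it only suits short sequences.
    """
    n, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                      for i in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    # gamma_t(i) is proportional to alpha_t(i) * beta_t(i); the argmax ignores P(obs)
    return [max(range(n), key=lambda i: alpha[t][i] * beta[t][i]) for t in range(T)]


STATES = "HC"  # hypothetical state labels
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
obs = [0, 1, 0, 2]
dp_path = "".join(STATES[s] for s in viterbi(pi, A, B, obs))
hmm_path = "".join(STATES[s] for s in posterior_decode(pi, A, B, obs))
print(dp_path, hmm_path)  # CCCH CHCH
```

For these parameters the two criteria disagree. The DP answer is CCCH, while maximizing each position separately yields CHCH, which illustrates why the DP solution and the HMM solution need not coincide.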

References

Annachhatre, C., Austin, T.H., Stamp, M.: Hidden markov models for malware classification. J. Comput. Virol. Hack. Tech. pp. 1–15 (2014). doi:10.1007/s11416-014-0215-x
Attaluri, S., McGhee, S., Stamp, M.: Profile hidden markov models and metamorphic virus detection. J. Comput. Virol. 5, 151–169 (2009). doi:10.1007/s11416-008-0105-1
Austin, T.H., Filiol, E., Josse, S., Stamp, M.: Exploring hidden markov models for virus analysis: a semantic approach. In: IEEE HICSS, pp. 5039–5048 (2013)
Bruschi, D., Martignoni, L., Monga, M.: Detecting self-mutating malware using control-flow graph matching. In: DIMVA (2006)
Cave, R.L., Neuwirth, L.P.: Hidden markov models for english. In: Ferguson, J.D. (ed) Hidden Markov Models for Speech (1980)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Association for computational linguistics (1996). doi:10.3115/981863.981904
Chess, D.M., White, S.R.: An undetectable computer virus. In: Virus bulletin conference (2000)
Cho, S.B., Han, S.J.: Two sophisticated techniques to improve hmm-based intrusion detection systems. In: RAID (2003)
Christodorescu, M., Jha, S.: Testing malware detectors. In: ISSTA (2004)
Christodorescu, M., Jha, S., Seshia, S.A., Song, D.X., Bryant, R.E.: Semantics-aware malware detection. In: Symposium on security and privacy (2005)
Clang: a C language family frontend for LLVM. http://www.clang.llvm.org. Accessed November 2011
Driller, T.M.: Metamorphic permutating high-obfuscating reassembler source. http://vx.netlux.org/29a/29a-6/29a-6.602. Accessed December 2011
Filiol, E., Josse, S.: A statistical model for undecidable viral detection. J. Comput. Virol. 3, 64–74 (2007). doi:10.1007/s11416-007-0041-5
Filiol, E., Josse, S.: Malware spectral analysis: security evaluation of Bayesian network based detection models. In: EICAR conference (2011)
Francois, J.M.: JAHMM: An implementation of hidden Markov models in Java. http://code.google.com/p/jahmm/. Accessed October 2011
GCC, the GNU compiler collection. http://gcc.gnu.org/. Accessed November 2011
Iliopoulos, D., Adami, C., Szor, P.: Darwin inside the machines: malware evolution and the consequences for computer security. CoRR abs/1111.2503 (2011)
Intersimone, D.: Antique software: Turbo C version 2.01. http://edn.embarcadero.com/article/20841. Accessed November 2011
Krügel, C., Kirda, E., Mutz, D., Robertson, W.K., Vigna, G.: Polymorphic worm detection using structural information of executables. In: RAID (2005)
Leder, F., Steinbock, B., Martini, P.: Classification and detection of metamorphic malware using value set analysis. In: International conference on malicious and unwanted software MALWARE (2009)
Lin, D., Stamp, M.: Hunting for undetectable metamorphic viruses. J. Comput. Virol. 7(3), 201–214 (2011)
Madenur Sridhara, S., Stamp, M.: Metamorphic worm that carries its own morphing engine. J. Comput. Virol. 9(2), 49–58 (2013). doi:10.1007/s11416-012-0174-z
MinGW | the minimalist GNU for Windows. http://www.mingw.org/. Accessed November 2011
Mohammed, M.: Zeroing in on metaphoric computer viruses. Master’s thesis, University of Louisiana at Lafayette (2003)
SnakeByte: next generation virus construktion kit. http://vxheavens.com/vx.php?id=tn02. Accessed December 2011
Song, Y., Locasto, M.E., Stavrou, A., Keromytis, A.D., Stolfo, S.J.: On the infeasibility of modeling polymorphic shellcode—re-thinking the role of learning in intrusion detection systems. Mach. Learn. 81(2), 179–205 (2010)
Stamp, M.: A revealing introduction to hidden Markov models (2004). http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf. Accessed October 2011
Symantec security response: W32.simile. http://www.symantec.com/security_response/writeup.jsp?docid=2002-030617-5423-99. Accessed December 2011
Szor, P.: The Art of Computer Virus Research and Defense. Addison Wesley, Boston (2005)
Wong, W., Stamp, M.: Hunting for metamorphic engines. J. Comput. Virol. 2(3), 211–229 (2006)
Zhang, Q., Reeves, D.S.: Metaaware: identifying metamorphic malware. In: ACSAC (2007)


Author information

Authors and Affiliations

Department of Computer Science, San José State University, San Jose, CA, USA
Ashwin Kalbhor, Thomas H. Austin & Mark Stamp
ESIEA Laboratoire (C + V)o, Laval, France
Eric Filiol
Direction générale de l’armement (DGA), Rennes, France
Sébastien Josse


Corresponding author

Correspondence to Thomas H. Austin.


About this article

Cite this article

Kalbhor, A., Austin, T.H., Filiol, E. et al. Dueling hidden Markov models for virus analysis. J Comput Virol Hack Tech 11, 103–118 (2015). https://doi.org/10.1007/s11416-014-0232-9


Received: 02 July 2014
Accepted: 11 November 2014
Published: 30 November 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11416-014-0232-9
