New Malicious Code Detection Using Variable Length n-grams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 4332)

Abstract

Most commercial antivirus software fails to detect new and unknown malicious code. Generic virus detection is a viable way to handle this problem, but a generic virus detector needs features that are common to viruses. Recently, Kolter et al. [16] proposed an efficient generic virus detector that uses n-grams as features. The fixed-length n-grams used there suffer from the drawback that they cannot capture meaningful sequences of different lengths. In this paper we propose a new method for extracting variable-length n-grams based on the concept of episodes, and we demonstrate that these outperform fixed-length n-grams in malicious code detection. The proposed algorithm requires only two scans over the whole data set, whereas most classical algorithms require a number of scans proportional to the maximum n-gram length.
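
As a rough illustration of the fixed-length n-gram features mentioned in the abstract, the sketch below counts overlapping byte windows of a single length n. It is not the episode-based extraction proposed in the paper, which the abstract does not describe; the function name `byte_ngrams` and the sample byte string are assumptions made purely for illustration.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping window of n bytes in data (fixed-length n-grams)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical byte string standing in for the contents of an executable.
sample = bytes.fromhex("90909090e8000000005d")

# With a single fixed n, every feature has exactly n bytes, so a sequence
# that is longer or shorter than n never appears as one feature.
counts = byte_ngrams(sample, 4)
for gram, freq in counts.most_common(3):
    print(gram.hex(), freq)
```

Because the window length is fixed in advance, covering several lengths this way means repeating the extraction once per length, which is the cost the abstract contrasts with a two-scan algorithm.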

References

  1. Anagnostakis, K.G., Sidiroglou, S., Akritidis, P., Xinidis, K., Markatos, E., Keromytis, A.D.: Detecting targeted attacks using shadow honeypots. In: Proceedings of the 14th USENIX Security Symposium (2005)

  2. Arnold, W., Tesauro, G.: Automatically generated Win32 heuristic virus detection. In: Proceedings of the 2000 International Virus Bulletin Conference (2000)

  3. Assaleh, T.A., Cercone, N., Keselj, V., Sweidan, R.: Detection of new malicious code using N-grams signatures. In: Proceedings of the Second Annual Conference on Privacy, Security and Trust, pp. 193–196 (2004)

  4. Balzer, R., Goldman, N.: Mediating Connectors. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop, Austin, TX, pp. 73–77 (1999)

  5. Cavnar, W., Trenkle, J.: N-gram based text categorization. In: Proceedings of SDAIR 1994, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161–175 (1994)

  6. Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symp., Washington, DC, August 2003, pp. 169–186 (2003)

  7. Cohen, P., Heeringa, B., Adams, N.M.: An unsupervised algorithm for segmenting categorical timeseries into episodes. In: Hand, D.J., Adams, N.M., Bolton, R.J. (eds.) Pattern Detection and Discovery. LNCS (LNAI), vol. 2447, pp. 49–62. Springer, Heidelberg (2002)

  8. Dash, S.K., Reddy, K.S., Pujari, A.K.: Episode Based Masquerade Detection. In: Jajodia, S., Mazumdar, C. (eds.) ICISS 2005. LNCS, vol. 3803, pp. 251–262. Springer, Heidelberg (2005)

  9. Debar, H., Dacier, M., Nassehi, M., Wespi, A.: Fixed vs. variable-length patterns for detecting suspicious process behavior. Journal of Computer Security 8(2/3) (2000)

  10. Firoiu, L.: Segmenting Time Series with a Hybrid Neural Networks – Hidden Markov Model (2002), http://www.citeseer.ist.psu.edu/firoiu02segmenting.html

  11. Fürnkranz, J.: A study using n-gram features for text categorization. Technical Report OEFAI-TR-9830, Austrian Research Institute for Artificial Intelligence (1998)

  12. Gartner Inc. (2005), http://www.gartner.com/press_releases/asset_129199_11.html

  13. Gionis, A., Mannila, H.: Segmentation Algorithms for Time Series and Sequence Data. In: SIAM International Conference on Data Mining, Newport Beach, CA (2005)

  14. Jiang, G., Chen, H., Ungureanu, C., Yoshihira, K.: Multi-resolution abnormal trace detection using varied-length N-grams and automata. In: Proceedings of the Second International Conference on Autonomic Computing (2005)

  15. Kephart, J.O., Sorkin, G.B., Arnold, W.C., Chess, D.M., Tesauro, G.J., White, S.R.: Biologically inspired defenses against computer viruses. In: Proceedings of IJCAI 1995, Montreal, August 1995, pp. 985–996 (1995)

  16. Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004)

  17. Lo, R.W., Levitt, K.N., Olsson, R.A.: MCF: A malicious code filter. Computers & Security 14(6), 541–566 (1995)

  18. Marceau, C.: Characterizing the behavior of a program using multiple-length N-grams. In: Proceedings of the 2000 Workshop on New security paradigms (2000)

  19. McGraw, G., Morrisett, G.: Attacking Malicious Code: A Report to the Infosec Research Council. IEEE Software (September/October 2000)

  20. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)

  21. Nachenberg, C.: Understanding and managing polymorphic viruses. The Symantec Enterprise Papers, vol. XXX

  22. Reddy, D.K.S., Pujari, A.K.: N-gram Analysis for New Computer Virus Detection. Communicated to the Journal in Computer Virology

  23. Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of IEEE Symposium on Security and Privacy (2001)

  24. Schultz, M.G., Eskin, E., Zadok, E., Bhattacharyya, M., Stolfo, S.J.: MEF: Malicious Email Filter, A UNIX mail filter that detects malicious windows executables. In: Proceedings of USENIX Annual Technical Conference (2001)

  25. Szor, P.: The Art of Computer Virus Research and Defense. Addison Wesley, Reading (2005)

  26. VX Heavens, http://vx.netlux.org

  27. Witten, I., Frank, E.: Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco (2000)

  28. Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of 14th International Conference on Machine Learning, pp. 412–420 (1997)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reddy, D.K.S., Dash, S.K., Pujari, A.K. (2006). New Malicious Code Detection Using Variable Length n-grams. In: Bagchi, A., Atluri, V. (eds) Information Systems Security. ICISS 2006. Lecture Notes in Computer Science, vol 4332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11961635_19

  • DOI: https://doi.org/10.1007/11961635_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68962-1

  • Online ISBN: 978-3-540-68963-8

  • eBook Packages: Computer Science, Computer Science (R0)
