New Malicious Code Detection Using Variable Length n-grams

Reddy, D. Krishna Sandeep; Dash, Subrat Kumar; Pujari, Arun K.

doi:10.1007/11961635_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4332)

Abstract

Most commercial antivirus software fails to detect unknown and new malicious code. Generic virus detection is a viable option for handling this problem, and a generic virus detector needs features that are common to viruses. Recently, Kolter et al. [16] proposed an efficient generic virus detector that uses n-grams as features. The fixed-length n-grams used there have the drawback that they cannot capture meaningful sequences of different lengths. In this paper, we propose a new method for extracting variable-length n-grams based on the concept of episodes and demonstrate that these n-grams outperform fixed-length n-grams in malicious code detection. The proposed algorithm requires only two scans over the whole data set, whereas most classical algorithms require a number of scans proportional to the maximum n-gram length.
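The abstract contrasts fixed-length n-gram features, as used by Kolter et al. [16], with the proposed variable-length ones. As a hedged illustration of the fixed-length baseline only, the sketch below extracts byte n-grams from raw bytes, builds a small vocabulary, and turns each sample into a binary presence vector. It is not the authors' episode-based extraction algorithm, which the abstract does not describe in enough detail to reproduce. The function names (`byte_ngrams`, `document_frequency`, `select_vocabulary`, `to_feature_vector`), the toy byte strings, and the use of document frequency for vocabulary selection are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def byte_ngrams(data: bytes, n: int) -> Set[str]:
    """Distinct fixed-length byte n-grams in `data`, encoded as lowercase hex strings."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}


def document_frequency(samples: Iterable[bytes], n: int) -> Counter:
    """For each n-gram, count how many samples contain it at least once."""
    df: Counter = Counter()
    for data in samples:
        df.update(byte_ngrams(data, n))
    return df


def select_vocabulary(df: Counter, k: int) -> List[str]:
    """Keep the k n-grams with the highest document frequency.

    Ties are broken lexicographically so the result is deterministic.
    Document frequency is an assumed, simple stand-in for a real
    feature-selection criterion.
    """
    return sorted(df, key=lambda gram: (-df[gram], gram))[:k]


def to_feature_vector(data: bytes, vocabulary: List[str], n: int) -> List[int]:
    """Binary feature vector: 1 if a vocabulary n-gram occurs in `data`, else 0."""
    present = byte_ngrams(data, n)
    return [1 if gram in present else 0 for gram in vocabulary]


if __name__ == "__main__":
    # Hypothetical toy byte strings standing in for executable files.
    samples = [b"\x90\x90\xeb\xfe\x31\xc0", b"\x31\xc0\x90\x90", b"\xeb\xfe\x31\xc0"]
    n = 2
    vocabulary = select_vocabulary(document_frequency(samples, n), k=3)
    print(vocabulary)  # ['31c0', '9090', 'ebfe']
    print([to_feature_vector(s, vocabulary, n) for s in samples])
    # [[1, 1, 1], [1, 1, 0], [1, 0, 1]]

    # Drawback of a fixed n: a 6-byte sequence is visible only as
    # overlapping 4-gram fragments, never as a single feature.
    pattern = b"\x31\xc0\x50\x68\x2f\x2f"
    print(sorted(byte_ngrams(pattern, 4)))  # ['31c05068', '50682f2f', 'c050682f']
```

The last example mirrors the drawback named in the abstract: with n = 4, a 6-byte sequence shows up only as three overlapping 4-grams, so no single feature corresponds to the whole sequence. A variable-length scheme aims to let such a sequence be represented as one feature.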

References

[1] Anagnostakis, K.G., Sidiroglou, S., Akritidis, P., Xinidis, K., Markatos, E., Keromytis, A.D.: Detecting targeted attacks using shadow honeypots. In: Proceedings of the 14th USENIX Security Symposium (2005)
[2] Arnold, W., Tesauro, G.: Automatically generated Win32 heuristic virus detection. In: Proceedings of the 2000 International Virus Bulletin Conference (2000)
[3] Assaleh, T.A., Cercone, N., Keselj, V., Sweidan, R.: Detection of new malicious code using N-grams signatures. In: Proceedings of the Second Annual Conference on Privacy, Security and Trust, pp. 193–196 (2004)
[4] Balzer, R., Goldman, N.: Mediating Connectors. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop, Austin, TX, pp. 73–77 (1999)
[5] Cavnar, W., Trenkle, J.: N-gram based text categorization. In: Proceedings of SDAIR 1994, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161–175 (1994)
[6] Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symposium, Washington, DC, August 2003, pp. 169–186 (2003)
[7] Cohen, P., Heeringa, B., Adams, N.M.: An unsupervised algorithm for segmenting categorical timeseries into episodes. In: Hand, D.J., Adams, N.M., Bolton, R.J. (eds.) Pattern Detection and Discovery. LNCS (LNAI), vol. 2447, pp. 49–62. Springer, Heidelberg (2002)
[8] Dash, S.K., Reddy, K.S., Pujari, A.K.: Episode Based Masquerade Detection. In: Jajodia, S., Mazumdar, C. (eds.) ICISS 2005. LNCS, vol. 3803, pp. 251–262. Springer, Heidelberg (2005)
[9] Debar, H., Dacier, M., Nassehi, M., Wespi, A.: Fixed vs. variable-length patterns for detecting suspicious process behavior. Journal of Computer Security 8(2/3) (2000)
[10] Firoiu, L.: Segmenting Time Series with a Hybrid Neural Networks – Hidden Markov Model (2002), http://www.citeseer.ist.psu.edu/firoiu02segmenting.html
[11] Fürnkranz, J.: A study using n-gram features for text categorization. Technical Report OEFAI-TR-9830, Austrian Research Institute for Artificial Intelligence (1998)
[12] Gartner Inc. (2005), http://www.gartner.com/press_releases/asset_129199_11.html
[13] Gionis, A., Mannila, H.: Segmentation Algorithms for Time Series and Sequence Data. In: SIAM International Conference on Data Mining, Newport Beach, CA (2005)
[14] Jiang, G., Chen, H., Ungureanu, C., Yoshihira, K.: Multi-resolution abnormal trace detection using varied-length N-grams and automata. In: Proceedings of the Second International Conference on Autonomic Computing (2005)
[15] Kephart, J.O., Sorkin, G.B., Arnold, W.C., Chess, D.M., Tesauro, G.J., White, S.R.: Biologically inspired defenses against computer viruses. In: Proceedings of IJCAI 1995, Montreal, August 1995, pp. 985–996 (1995)
[16] Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004)
[17] Lo, R.W., Levitt, K.N., Olsson, R.A.: MCF: A malicious code filter. Computers & Security 14(6), 541–566 (1995)
[18] Marceau, C.: Characterizing the behavior of a program using multiple-length N-grams. In: Proceedings of the 2000 Workshop on New Security Paradigms (2000)
[19] McGraw, G., Morrisett, G.: Attacking Malicious Code: A Report to the Infosec Research Council. IEEE Software (September/October 2000)
[20] Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
[21] Nachenberg, C.: Understanding and managing polymorphic viruses. The Symantec Enterprise Papers, vol. XXX
[22] Reddy, D.K.S., Pujari, A.K.: N-gram Analysis for New Computer Virus Detection. Communicated to the Journal in Computer Virology
[23] Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of IEEE Symposium on Security and Privacy (2001)
[24] Schultz, M.G., Eskin, E., Zadok, E., Bhattacharyya, M., Stolfo, S.J.: MEF: Malicious Email Filter, A UNIX mail filter that detects malicious windows executables. In: Proceedings of USENIX Annual Technical Conference (2001)
[25] Szor, P.: The Art of Computer Virus Research and Defense. Addison-Wesley, Reading (2005)
[26] VX Heavens, http://vx.netlux.org
[27] Witten, I., Frank, E.: Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco (2000)
[28] Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of 14th International Conference on Machine Learning, pp. 412–420 (1997)

Author information

Authors and Affiliations

Artificial Intelligence Lab, University of Hyderabad, Hyderabad, 500 046, India
D. Krishna Sandeep Reddy, Subrat Kumar Dash & Arun K. Pujari

Editor information

Editors and Affiliations

Indian Statistical Institute, Computer and Statistical Service Center, 203, B.T. Road, 700108, Kolkata, India
Aditya Bagchi
MSIS Department and CIMIC, Rutgers University, USA
Vijayalakshmi Atluri

About this paper

Cite this paper

Reddy, D.K.S., Dash, S.K., Pujari, A.K. (2006). New Malicious Code Detection Using Variable Length n-grams. In: Bagchi, A., Atluri, V. (eds) Information Systems Security. ICISS 2006. Lecture Notes in Computer Science, vol 4332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11961635_19

DOI: https://doi.org/10.1007/11961635_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68962-1
Online ISBN: 978-3-540-68963-8
eBook Packages: Computer Science, Computer Science (R0)
