Unknown Malcode Detection Using OPCODE Representation

Moskovitch, Robert; Feher, Clint; Tzachar, Nir; Berger, Eugene; Gitelman, Marina; Dolev, Shlomi; Elovici, Yuval

doi:10.1007/978-3-540-89900-6_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5376)

Included in the following conference series:

European Conference on Intelligence and Security Informatics

Abstract

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today’s signature-based antivirus tools are very accurate but cannot detect new malicious code. Recently, classification algorithms have been employed successfully to detect unknown malicious code. However, most studies represent the binary code of executables as byte-sequence n-grams. We propose instead to use operation codes (OpCodes), generated by disassembling the executables, and to use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation on a test collection of more than 30,000 files, in which we evaluated the OpCode n-gram representation and investigated the imbalance problem, referring to real-life scenarios in which malicious files are expected to make up about 10% of all files. Our results indicate that greater than 99% accuracy can be achieved with a training set whose malicious file percentage is lower than 15%, which is higher than in our previous experience with the byte-sequence n-gram representation [1].
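
As a rough illustration of the OpCode n-gram representation described in the abstract, the sketch below turns an already-disassembled opcode sequence into n-grams and computes a normalized term frequency for each one, as is common in text categorization. The function names, the sample opcode sequence, and the choice to normalize by the most frequent n-gram are assumptions made for this example; the abstract does not specify the exact weighting or the disassembler used.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return every contiguous n-gram of an opcode sequence as a tuple."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(opcodes, n):
    """Map each n-gram to its count divided by the count of the most
    frequent n-gram in the same file (an assumed normalization)."""
    counts = Counter(opcode_ngrams(opcodes, n))
    if not counts:
        return {}
    top = max(counts.values())
    return {gram: count / top for gram, count in counts.items()}


# Hypothetical disassembly output for one executable (illustrative only).
sample = ["push", "mov", "call", "push", "mov", "call", "ret"]
tf = term_frequencies(sample, 2)

for gram, weight in sorted(tf.items()):
    print(gram, weight)
```

Each file would then be represented as a vector of such weights over a selected vocabulary of n-grams, which a standard classifier can consume.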

References

1. Moskovitch, R., Stopel, D., Feher, C., Nissim, N., Elovici, Y.: Unknown Malcode Detection via Text Categorization and the Imbalance Problem. In: IEEE Intelligence and Security Informatics, Taiwan (2008)
2. Gryaznov, D.: Scanners of the Year 2000: Heuristics. In: The 5th International Virus Bulletin (1999)
3. Shin, S., Jung, J., Balakrishnan, H.: Malware Prevalence in the KaZaA File-Sharing Network. In: Internet Measurement Conference (IMC), Brazil (October 2006)
4. Schultz, M., Eskin, E., Zadok, E., Stolfo, S.: Data mining methods for detection of new malicious executables. In: Proceedings of the IEEE Symposium on Security and Privacy (2001)
5. Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram Based Detection of New Malicious Code. In: Proceedings of the 28th Annual International Computer Software and Applications Conference, COMPSAC 2004 (2004)
6. Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 470–478. ACM Press, New York (2004)
7. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
8. Kolter, J., Maloof, M.: Learning to Detect and Classify Malicious Executables in the Wild. Journal of Machine Learning Research 7, 2721–2744 (2006)
9. Henchiri, O., Japkowicz, N.: A Feature Selection and Evaluation Scheme for Computer Virus Detection. In: Proceedings of ICDM 2006, Hong Kong, pp. 891–895 (2006)
10. Dolev, S., Tzachar, N.: Malware signature builder and detection for executable code, patent application
11. Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1), 1–6 (2004)
12. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18, 613–620 (1975)
13. Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E.: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)
14. Bishop, C.: Neural Networks for Pattern Recognition. Clarendon Press, Oxford (1995)
15. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers, Inc., San Francisco (1993)
16. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29, 103–130 (1997)
17. Freund, Y., Schapire, R.E.: A brief introduction to boosting. In: International Joint Conference on Artificial Intelligence (1999)
18. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Publishers, Inc., San Francisco (2005)

Author information

Authors and Affiliations

Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Be’er Sheva, 84105, Israel
Robert Moskovitch, Clint Feher, Nir Tzachar, Eugene Berger, Marina Gitelman, Shlomi Dolev & Yuval Elovici

Editor information

Editors and Affiliations

Computer Science Department, Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700, Esbjerg, Denmark
Daniel Ortiz-Arroyo
Department of Computer Science and Engineering, Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700, Esbjerg, Denmark
Henrik Legind Larsen
Department of MIS, University of Arizona, 85721, Tucson, AZ, USA
Daniel Dajun Zeng
Computer Science Department, Aalborg University, DK-6700, Esbjerg, Denmark
David Hicks
European Commission - Joint Research Centre (JRC) IPSC - SeS Unit T.P., Ispra, Italy
Gerhard Wagner

About this paper

Cite this paper

Moskovitch, R. et al. (2008). Unknown Malcode Detection Using OPCODE Representation. In: Ortiz-Arroyo, D., Larsen, H.L., Zeng, D.D., Hicks, D., Wagner, G. (eds) Intelligence and Security Informatics. EuroIsI 2008. Lecture Notes in Computer Science, vol 5376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89900-6_21

DOI: https://doi.org/10.1007/978-3-540-89900-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89899-3
Online ISBN: 978-3-540-89900-6
eBook Packages: Computer Science, Computer Science (R0)
