Malicious Code Detection Using Active Learning

Moskovitch, Robert; Nissim, Nir; Elovici, Yuval

doi:10.1007/978-3-642-01718-6_6

Robert Moskovitch²⁰,
Nir Nissim²⁰ &
Yuval Elovici²⁰

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 5456))

Included in the following conference series:

International Workshop on Privacy, Security, and Trust in KDD

641 Accesses
23 Citations

Abstract

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic and other malicious purposes. Currently, dozens of new malicious codes are created every day and this number is expected to increase in the coming years. Today’s signature-based anti-viruses and heuristic-based methods are accurate, but cannot detect new malicious code. Recently, classification algorithms were used successfully for the detection of malicious code. We present a complete methodology for the detection of unknown malicious code, inspired by text categorization concepts. However, this approach can be exploited further to achieve a more accurate and efficient acquisition method of unknown malicious files. We use an Active-Learning framework that enables the selection of the unknown files for fast acquisition. We performed an extensive evaluation of a test collection consisting of more than 30,000 files. We present a rigorous evaluation setup, consisting of real-life scenarios, in which the malicious file content is expected to be low, at about 10% of the files in the stream. We define specific evaluation measures based on the known precision and recall measures, which show the accuracy of the acquisition process and the improvement in the classifier resulting from the efficient acquisition process.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Shin, S., Jung, J., Balakrishnan, H.: Malware Prevalence in the KaZaA File-Sharing Network. In: Internet Measurement Conference (IMC), Brazil (October 2006)
Google Scholar
Gryaznov, D.: Scanners of the Year 2000: Heuristics. In: Proceedings of the 5th International Virus Bulletin (1999)
Google Scholar
Schultz, M., Eskin, E., Zadok, E., Stolfo, S.: Data mining methods for detection of new malicious executables. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 178–184 (2001)
Google Scholar
Abou-Assaleh, T., Cercone, N., Keselj, V., Sweidan, R.: N-gram Based Detection of New Malicious Code. In: Proceedings of the 28th Annual International Computer Software and Applications Conference, COMPSAC 2004 (2004)
Google Scholar
Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 470–478. ACM, New York (2004)
Google Scholar
Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
MATH Google Scholar
Kolter, J., Maloof, M.: Learning to Detect and Classify Malicious Executables in the Wild. Journal of Machine Learning Research 7, 2721–2744 (2006)
MathSciNet MATH Google Scholar
Henchiri, O., Japkowicz, N.: A Feature Selection and Evaluation Scheme for Computer Virus Detection. In: Proceedings of ICDM 2006, pp. 891–895 (2006)
Google Scholar
Moskovitch, R., Stopel, D., Feher, C., Nissim, N., Elovici, Y.: Unknown Malcode Detection via Text Categorization and the Imbalance Problem. In: IEEE Intelligence and Security Informatics (ISI 2008), Taiwan (2008)
Google Scholar
Angluin, D.: Queries and concept learning. Machine Learning 2, 319–342 (1988)
MathSciNet Google Scholar
Lewis, D., Gale, W.: A sequential algorithm for training text classifiers. In: Proceedingsof the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12. Springer, Heidelberg (1994)
Google Scholar
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. Journal of Machine Learning Research 2, 45–66 (2000-2001)
MATH Google Scholar
Roy, N., McCallum, A.: Toward Optimal Active Learning through Sampling Estimation of Error Reduction. In: ICML (2001)
Google Scholar
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18, 613–620 (1975)
Article MATH Google Scholar
Golub, T., Slonim, D., Tamaya, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E.: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)
Article Google Scholar
Joachims, T.: Making large-scale support vector machine learning practical. In: Scholkopf, B., Burges, C., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge (1998)
Google Scholar
Chang, C.C., Lin, C.-J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121–167 (1988)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Deutsche Telekom Laboratories at Ben Gurion University, Ben Gurion University, Beer Sheva, 84105, Israel
Robert Moskovitch, Nir Nissim & Yuval Elovici

Authors

Robert Moskovitch
View author publications
You can also search for this author in PubMed Google Scholar
Nir Nissim
View author publications
You can also search for this author in PubMed Google Scholar
Yuval Elovici
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Yahoo! Research Barcelona, Avinguda Drive 177, 08018, Barcelona, Spain
Francesco Bonchi
Department of Computer Science and Communication, University of Insubria, 22100, Varese, Italy
Elena Ferrari
311 Computer Science Building, 500W. 15th St., MO 65409, Rolla, USA
Wei Jiang
Department of Biomedical Informatics, Vanderbilt University, TN 37203, nashville, USA
Bradley Malin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Moskovitch, R., Nissim, N., Elovici, Y. (2009). Malicious Code Detection Using Active Learning. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds) Privacy, Security, and Trust in KDD. PInKDD 2008. Lecture Notes in Computer Science, vol 5456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01718-6_6

Download citation

DOI: https://doi.org/10.1007/978-3-642-01718-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01717-9
Online ISBN: 978-3-642-01718-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics