Semi-supervised Learning for Unknown Malware Detection

Santos, Igor; Nieves, Javier; Bringas, Pablo G.

doi:10.1007/978-3-642-19934-9_53

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 91)

Abstract

Malware is any kind of computer software that is potentially harmful to both computers and networks. The amount of malware increases every year and poses a serious global security threat. Signature-based detection is the most widely used method in commercial antivirus products; however, it consistently fails to detect new malware. Supervised machine-learning models have been used to address this issue, but supervised learning is far from perfect because it requires a significant amount of malicious code and benign software to be identified and labelled beforehand. In this paper, we propose a new method of malware protection that adopts a semi-supervised learning approach to detect unknown malware. This method is designed to build a machine-learning classifier using a set of labelled (malware and legitimate software) and unlabelled instances. We performed an empirical validation demonstrating that the labelling effort is lower than when supervised learning is used, while maintaining high accuracy rates.
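
The idea of training on a mix of labelled and unlabelled instances can be illustrated with a small sketch. The abstract does not name the algorithm or the features used, so everything below is an assumption made for illustration: a graph-based label-spreading scheme in the style of learning with local and global consistency (Zhou et al., 2004), applied to hypothetical 2-D feature vectors, with the function names `rbf_affinity` and `spread_labels` and the parameters `alpha`, `sigma`, and `iterations` chosen here rather than taken from the paper.

```python
import math


def rbf_affinity(points, sigma=1.0):
    """Pairwise RBF affinities with a zero diagonal (no self-loops)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def spread_labels(points, labels, n_classes=2, alpha=0.9, sigma=1.0, iterations=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y and return the arg-max class.

    labels[i] is a class index for labelled instances and None otherwise.
    """
    n = len(points)
    w = rbf_affinity(points, sigma)
    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2)
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Y holds one-hot rows for labelled instances and zero rows for unlabelled ones
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iterations):
        f = [
            [
                alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Hypothetical toy data: two well-separated clusters of feature vectors.
# Class 0 stands for legitimate software, class 1 for malware.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (3.0, 3.0), (3.2, 3.1), (2.9, 3.3), (3.1, 2.8),
]
# Only one instance per class carries a label; the rest are unlabelled.
labels = [0, None, None, None, 1, None, None, None]

predictions = spread_labels(points, labels)
print(predictions)
```

In this toy setting, each unlabelled point takes the class of the labelled point in its own cluster, which is the sense in which unlabelled instances can reduce the labelling effort. Real executables would first need a feature representation, which this sketch leaves out.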

Author information

Authors and Affiliations

Laboratory for Smartness, Semantics and Security (S3Lab), DeustoTech - Computing, University of Deusto, Avenida de las Universidades 24, 48007, Bilbao, Spain
Igor Santos, Javier Nieves & Pablo G. Bringas

Editor information

Editors and Affiliations

Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence (SNIRE), P.O. Box 2259, 98071-2259, Auburn, WA, USA
Ajith Abraham
Department of Computing Science and Control, Faculty of Science, University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Juan M. Corchado
Department of Computing Science, Faculty of Science, University of Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Sara Rodríguez González & Juan F. De Paz Santana

About this paper

Cite this paper

Santos, I., Nieves, J., Bringas, P.G. (2011). Semi-supervised Learning for Unknown Malware Detection. In: Abraham, A., Corchado, J.M., González, S.R., De Paz Santana, J.F. (eds) International Symposium on Distributed Computing and Artificial Intelligence. Advances in Intelligent and Soft Computing, vol 91. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19934-9_53

DOI: https://doi.org/10.1007/978-3-642-19934-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19933-2
Online ISBN: 978-3-642-19934-9
eBook Packages: Engineering, Engineering (R0)
