Determining malicious executable distinguishing attributes and low-complexity detection

  • Original Paper
  • Journal in Computer Virology

Abstract

Detection of rapidly evolving malware requires classification techniques that can effectively and efficiently detect zero-day attacks. Such detection relies on a robust model of benign behavior, with deviations from that model used to flag malicious behavior. In this paper, we propose a low-complexity host-based technique that uses deviations in static file attributes to detect malicious executables. We first develop simple statistical models of static file attributes derived from the empirical data of thousands of benign executables. Deviations between the attribute models of benign and malware executables are then quantified using information-theoretic (Kullback-Leibler-based) divergence measures. This quantification reveals distinguishing attributes that diverge considerably between benign and malware executables and can therefore be used for detection. We use the benign models of divergent attributes in cross-correlation and log-likelihood frameworks to classify malicious executables. Our results, using over 4,000 malicious file samples, indicate that the proposed detector provides reasonably high detection accuracy while having significantly lower complexity than existing detectors.
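To make the divergence and log-likelihood ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hypothetical static attribute (the number of sections in an executable), invented toy samples, Laplace smoothing, and a plain sum of the two directed Kullback-Leibler divergences as the symmetric measure; the paper's actual attributes, models, and divergence variant may differ.

```python
import math
from collections import Counter

def histogram(values, bins, smoothing=1.0):
    """Laplace-smoothed probability estimate of an attribute over fixed bins."""
    counts = Counter(values)
    total = len(values) + smoothing * len(bins)
    return {b: (counts.get(b, 0) + smoothing) / total for b in bins}

def kl_divergence(p, q):
    """Directed Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in p if p[b] > 0)

def symmetric_kl(p, q):
    """Assumed symmetrization: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def log_likelihood(values, model):
    """Log-likelihood (bits) of observed attribute values under a model.
    Every value must fall inside the bins used to build the model."""
    return sum(math.log2(model[v]) for v in values)

# Toy data (invented for illustration): section counts per executable.
bins = range(1, 10)
benign_sections = [3, 4, 4, 5, 4, 3, 5, 4]
malware_sections = [1, 2, 8, 1, 2, 9, 1, 2]

benign_model = histogram(benign_sections, bins)
malware_model = histogram(malware_sections, bins)

# A large divergence suggests the attribute separates the two classes.
divergence = symmetric_kl(benign_model, malware_model)

# Score unseen files against the benign model only: lower means more anomalous.
typical_score = log_likelihood([4], benign_model)
unusual_score = log_likelihood([1], benign_model)

print(f"symmetric KL divergence: {divergence:.3f} bits")
print(f"benign-model log-likelihood, 4 sections: {typical_score:.3f}")
print(f"benign-model log-likelihood, 1 section:  {unusual_score:.3f}")
```

In a detector built along these lines, attributes with a large divergence would be retained, and a file whose log-likelihood under the benign model falls below a chosen threshold would be flagged. The threshold itself would have to be tuned on held-out data.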

Author information

Correspondence to Hassan Khan.

About this article

Cite this article

Khan, H., Mirza, F. & Khayam, S.A. Determining malicious executable distinguishing attributes and low-complexity detection. J Comput Virol 7, 95–105 (2011). https://doi.org/10.1007/s11416-010-0140-6

  • DOI: https://doi.org/10.1007/s11416-010-0140-6
