Abstract
Detection of rapidly evolving malware requires classification techniques that can effectively and efficiently detect zero-day attacks. Such detection relies on a robust model of benign behavior; deviations from that model are used to detect malicious behavior. In this paper, we propose a low-complexity host-based technique that uses deviations in static file attributes to detect malicious executables. We first develop simple statistical models of static file attributes derived from the empirical data of thousands of benign executables. Deviations between the attribute models of benign and malware executables are then quantified using information-theoretic (Kullback-Leibler-based) divergence measures. This quantification reveals distinguishing attributes that diverge considerably between benign and malware executables and can therefore be used for detection. We use the benign models of divergent attributes in cross-correlation and log-likelihood frameworks to classify malicious executables. Our results, using over 4,000 malicious file samples, indicate that the proposed detector provides reasonably high detection accuracy while having significantly lower complexity than existing detectors.
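As a rough illustration of the divergence step described in the abstract, the following minimal sketch builds smoothed empirical distributions of a single static attribute for benign and malware samples and compares them with a symmetrized Kullback-Leibler divergence. It is not the authors' implementation: the attribute (a per-file section count), the toy values, the Laplace smoothing constant, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def empirical_pmf(values, support, alpha=1.0):
    """Laplace-smoothed empirical PMF over a fixed, shared support.

    Smoothing (alpha > 0) keeps every probability positive so that the
    divergence below is always finite. This is an assumption of the sketch.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return {s: (counts.get(s, 0) + alpha) / total for s in support}

def kl_divergence(p, q):
    """D(p || q) in bits; p and q must share a support and q must be positive."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)

def symmetric_kl(p, q):
    """Symmetrized divergence: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def log_likelihood(values, pmf):
    """Log-likelihood (in bits) of observed attribute values under a model."""
    return sum(math.log2(pmf[v]) for v in values)

# Hypothetical toy data: one attribute value (e.g. a section count) per file.
support = range(1, 7)
benign_values = [3, 4, 4, 5, 4, 3, 5, 4]
malware_values = [1, 2, 2, 1, 3, 2, 1, 2]

benign_model = empirical_pmf(benign_values, support)
malware_model = empirical_pmf(malware_values, support)

# A larger divergence suggests the attribute separates the two classes better.
divergence = symmetric_kl(benign_model, malware_model)

# Score unseen values under the benign model only; low scores are suspicious.
typical_score = log_likelihood([4], benign_model)
unusual_score = log_likelihood([1], benign_model)
```

In this sketch, an attribute would be kept for detection only if `divergence` is large relative to other attributes, and a file would be flagged when its log-likelihood under the benign model falls below a threshold chosen on held-out data. Both the ranking rule and the threshold are illustrative choices, not values taken from the paper.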
Cite this article
Khan, H., Mirza, F. & Khayam, S.A. Determining malicious executable distinguishing attributes and low-complexity detection. J Comput Virol 7, 95–105 (2011). https://doi.org/10.1007/s11416-010-0140-6