Abstract
Packing is the most common obfuscation method used by malware writers to hinder malware detection and analysis. There has been a dramatic increase in the number of new packers and variants of existing ones, and packers increasingly employ sophisticated anti-unpacker tricks and obfuscation methods. This makes it difficult, costly and time-consuming for anti-virus (AV) researchers to apply traditional static packer identification and classification methods, which rely mainly on the packer's byte signature.
In this paper, we present a simple yet fast and effective packer classification framework that applies pattern recognition techniques to automatically extracted randomness profiles of packers. The system runs without manual input from AV researchers. We test various statistical classification algorithms, including k-Nearest Neighbor, Best-first Decision Tree, Sequential Minimal Optimization and Naive Bayes, on a large data set consisting of clean packed files and 17,336 real malware samples. Experimental results demonstrate that our packer classification system achieves extremely high effectiveness (> 99%). The experiments also confirm that the randomness profile used in the system is a very strong feature for packer classification, and that the system can be applied with high accuracy to real malware samples.
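To make the pipeline concrete (randomness profile in, packer label out), here is a minimal sketch under stated assumptions. The abstract does not say how the randomness profile is computed, so this sketch substitutes Shannon entropy over fixed-size, non-overlapping byte windows as an assumed stand-in for the authors' randomness measure. The helper names `window_entropy`, `randomness_profile` and `knn_classify`, the window size of 256 bytes, the 32-point resampling and the labels `packer_A`/`packer_B` are all hypothetical. Classification uses plain k-Nearest Neighbor with Euclidean distance, one of the algorithm families the abstract lists.

```python
import math
from collections import Counter


def window_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    # sum of p * log2(1/p); written this way so a constant window gives 0.0, not -0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def randomness_profile(data: bytes, window: int = 256, points: int = 32) -> list[float]:
    """Hypothetical profile: entropy of consecutive non-overlapping windows,
    resampled to a fixed length so files of different sizes are comparable.
    This is an assumed stand-in, not the paper's randomness test."""
    raw = [window_entropy(data[i:i + window]) for i in range(0, len(data), window)]
    if not raw:
        return [0.0] * points
    # Nearest-index resampling: point k maps to window floor(k * len(raw) / points).
    return [raw[(k * len(raw)) // points] for k in range(points)]


def knn_classify(profile: list[float], training: list[tuple[list[float], str]], k: int = 1) -> str:
    """Majority vote among the k training profiles closest in Euclidean distance."""
    dists = sorted((math.dist(profile, p), label) for p, label in training)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic, made-up byte strings; they are not real packer output.
    training = [
        (randomness_profile(b"\x00" * 1024 + bytes(range(256)) * 12), "packer_A"),
        (randomness_profile(bytes(range(256)) * 16), "packer_B"),
    ]
    query = randomness_profile(b"\x00" * 1024 + bytes(range(256)) * 14)
    print(knn_classify(query, training, k=1))  # prints "packer_A"
```

Resampling every profile to the same length is what lets a distance-based learner such as k-NN compare files of different sizes. The other classifiers the abstract mentions (decision trees, SMO-trained SVMs, Naive Bayes) could consume the same fixed-length vectors.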
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, L., Versteeg, S., Boztaş, S., Yann, T. (2010). Pattern Recognition Techniques for the Classification of Malware Packers. In: Steinfeld, R., Hawkes, P. (eds) Information Security and Privacy. ACISP 2010. Lecture Notes in Computer Science, vol 6168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14081-5_23
DOI: https://doi.org/10.1007/978-3-642-14081-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14080-8
Online ISBN: 978-3-642-14081-5
eBook Packages: Computer Science, Computer Science (R0)