Automated Classification and Analysis of Internet Malware

Bailey, Michael; Oberheide, Jon; Andersen, Jon; Mao, Z. Morley; Jahanian, Farnam; Nazario, Jose

doi:10.1007/978-3-540-74320-0_10

Michael Bailey¹, Jon Oberheide¹, Jon Andersen¹, Z. Morley Mao¹, Farnam Jahanian¹,² & Jose Nazario²

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4637)

Abstract

Numerous attacks, such as worms, phishing, and botnets, threaten the availability of the Internet, the integrity of its hosts, and the privacy of its users. A core element of defense against these attacks is anti-virus (AV) software, a service that detects, removes, and characterizes these threats. The ability of these products to successfully characterize these threats has far-reaching effects, from facilitating sharing across organizations to detecting the emergence of new threats and assessing risk during quarantine and cleanup. In this paper, we examine the ability of existing host-based anti-virus products to provide semantically meaningful information about the malicious software and tools (or malware) used by attackers. Using a large, recent collection of malware that spans a variety of attack vectors (e.g., spyware, worms, spam), we show that different AV products characterize malware in ways that are inconsistent across AV products, incomplete across malware, and not concise in their semantics. To address these limitations, we propose a new classification technique that describes malware behavior in terms of system state changes (e.g., files written, processes created) rather than as sequences or patterns of system calls. To cope with the sheer volume of malware and the diversity of its behavior, we provide a method for automatically categorizing these malware profiles into groups that reflect similar classes of behavior, and we demonstrate how behavior-based clustering provides a more direct and effective way of classifying and analyzing Internet malware.
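The abstract describes two steps. First, it builds behavioral profiles from system state changes. Second, it groups those profiles automatically. It does not say which distance measure or clustering algorithm is used. The Python sketch below is minimal and rests on several assumptions:

- Each profile is a set of hypothetical `kind:object` strings.
- Jaccard distance stands in as the similarity measure because it is simple. It is not necessarily the authors' metric.
- Grouping uses single-linkage clustering, cut at a fixed `threshold`.

The sample names, event strings, and threshold value are all illustrative.

```python
from itertools import combinations


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|: 0.0 for identical profiles, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def cluster_profiles(profiles: dict, threshold: float) -> list:
    """Single-linkage clustering cut at `threshold` (an assumed algorithm).

    Two samples share a cluster if their profiles are within `threshold`
    of each other, directly or through a chain of such neighbors.
    """
    parent = {name: name for name in profiles}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair of samples whose behaviors are close enough.
    for a, b in combinations(profiles, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    # Collect samples by their root; sort for a deterministic result.
    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical profiles: each string is one observed state change.
samples = {
    "bot_1": frozenset({"file:system32/msupd.exe", "process:msupd.exe",
                        "registry:Run/msupd", "network:tcp/6667"}),
    "bot_2": frozenset({"file:system32/msupd.exe", "process:msupd.exe",
                        "registry:Run/msupd", "network:tcp/6668"}),
    "adware": frozenset({"file:Program Files/toolbar/tb.dll",
                         "registry:BHO/toolbar", "network:tcp/80"}),
}

# bot_1 and bot_2 share 3 of 5 distinct changes (distance 0.4), so they
# group together at threshold 0.5; adware shares nothing and stays alone.
groups = cluster_profiles(samples, threshold=0.5)
```

A single-linkage cut at a fixed threshold is equivalent to taking the connected components of the graph whose edges join profiles within that threshold, which is why a union-find is enough here. Lowering `threshold` below 0.4 would split `bot_1` and `bot_2` into separate groups.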

Author information

Authors and Affiliations

Electrical Engineering and Computer Science Department, University of Michigan
Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao & Farnam Jahanian
Arbor Networks
Farnam Jahanian & Jose Nazario

Editor information

Christopher Kruegel, Richard Lippmann, Andrew Clark

About this paper

Cite this paper

Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J. (2007). Automated Classification and Analysis of Internet Malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds) Recent Advances in Intrusion Detection. RAID 2007. Lecture Notes in Computer Science, vol 4637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74320-0_10

DOI: https://doi.org/10.1007/978-3-540-74320-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74319-4
Online ISBN: 978-3-540-74320-0
eBook Packages: Computer Science; Computer Science (R0)