Identifying Shared Software Components to Support Malware Forensics

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2014)

Abstract

Recent reports from the anti-malware industry indicate that similarity between malware code resulting from code reuse can aid in developing a profile of the attackers. We describe a method for identifying shared components in a large corpus of malware, where a component is a collection of code, such as a set of procedures, that implements a unit of functionality. We develop a general architecture for identifying shared components in a corpus using a two-stage clustering technique. While our method is parametrized on any features extracted from a binary, our implementation uses features abstracting the semantics of blocks of instructions. Our system has been found to identify shared components with extremely high accuracy in a rigorous, controlled experiment conducted independently by MITLL. Our technique provides an automated method to find functional relationships between malware code that may be used to establish evolutionary relationships and aid in forensics.
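
The two-stage clustering idea can be illustrated with a minimal sketch. The abstract does not say what each stage clusters on, so everything here is an assumption made for illustration: stage 1 groups procedures whose feature sets are similar, and stage 2 groups those procedure clusters by the set of binaries they occur in. The Jaccard measure, the single-link thresholding, both threshold values, and the names `threshold_clusters` and `shared_components` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def threshold_clusters(items, threshold):
    """Single-link clustering over {key: feature_set}: join any two keys
    whose Jaccard similarity meets the threshold (small union-find)."""
    parent = {key: key for key in items}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in combinations(items, 2):
        if jaccard(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in items:
        groups.setdefault(find(key), set()).add(key)
    # Sort for a deterministic cluster order.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def shared_components(procedures, feature_threshold=0.5, cooccur_threshold=1.0):
    """procedures maps (binary, procedure_name) -> set of features.
    Both thresholds are illustrative assumptions, not values from the paper."""
    # Stage 1 (assumed): cluster procedures with similar feature sets.
    proc_clusters = threshold_clusters(procedures, feature_threshold)

    # Describe each procedure cluster by the binaries it occurs in.
    occurrence = {
        i: {binary for binary, _ in cluster}
        for i, cluster in enumerate(proc_clusters)
    }

    # Stage 2 (assumed): cluster the procedure clusters by co-occurrence.
    components = []
    for group in threshold_clusters(occurrence, cooccur_threshold):
        members = frozenset().union(*(proc_clusters[i] for i in group))
        binaries = {binary for binary, _ in members}
        if len(binaries) > 1:  # keep only code found in more than one binary
            components.append(members)
    return components


# Toy corpus: feature names stand in for whatever is extracted from a binary.
procedures = {
    ("a.exe", "decrypt"): {"f1", "f2", "f3"},
    ("a.exe", "send"): {"f4", "f5"},
    ("a.exe", "main_a"): {"f9"},
    ("b.exe", "decrypt2"): {"f1", "f2", "f3"},
    ("b.exe", "send2"): {"f4", "f5", "f6"},
    ("b.exe", "main_b"): {"f10"},
    ("c.exe", "keylog"): {"f7", "f8"},
}

components = shared_components(procedures)
for component in components:
    print(sorted(component))
```

On this toy corpus the sketch reports a single component containing the decrypt and send procedures from `a.exe` and `b.exe`, since those procedure clusters appear in exactly the same binaries. Because the feature sets are opaque to the clustering code, any feature extractor could be substituted, which mirrors the abstract's point that the method is parametrized on the features.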



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ruttenberg, B. et al. (2014). Identifying Shared Software Components to Support Malware Forensics. In: Dietrich, S. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2014. Lecture Notes in Computer Science, vol 8550. Springer, Cham. https://doi.org/10.1007/978-3-319-08509-8_2

  • DOI: https://doi.org/10.1007/978-3-319-08509-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08508-1

  • Online ISBN: 978-3-319-08509-8

  • eBook Packages: Computer Science (R0)
