Automated Construction of Malware Families

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11611)

Abstract

Discovering malware families from the behavioral characteristics of a set of malware traces is an important step in malware detection. Malware in the wild often occurs as variants of one another. In this work, a data-dependent formalism is described for constructing malware families from trace data. The malware families are represented as an edge-labeled graph in which each node represents a malware trace and each edge describes a relationship between two traces. Each edge label holds a numerical value representing the similarity between the two traces. Network-theoretic concepts such as hubs are evaluated on the edge-labeled graph. The formalism is illustrated by experiments performed on multiple data sets of malware traces.
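
The graph construction described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the abstract does not state which similarity measure is used, so Jaccard similarity over consecutive call pairs stands in for it; the 0.3 edge threshold, the toy traces, and the function names are hypothetical; and weighted degree is used as a deliberately simple stand-in for a hub score.

```python
from itertools import combinations


def bigrams(trace):
    """Return the set of consecutive call pairs in a trace."""
    return set(zip(trace, trace[1:]))


def similarity(a, b):
    """Jaccard similarity of bigram sets (assumed measure, not the paper's)."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def build_graph(traces, threshold=0.3):
    """Edge-labeled graph as {(u, v): similarity}; threshold is hypothetical."""
    edges = {}
    for u, v in combinations(sorted(traces), 2):
        s = similarity(traces[u], traces[v])
        if s >= threshold:
            edges[(u, v)] = s
    return edges


def hub_scores(traces, edges):
    """Weighted degree of each node, a simple stand-in for a hub score."""
    scores = {name: 0.0 for name in traces}
    for (u, v), s in edges.items():
        scores[u] += s
        scores[v] += s
    return scores


# Toy traces (invented for illustration): each is a sequence of call names.
traces = {
    "t1": ["open", "read", "write", "close"],
    "t2": ["open", "read", "write", "send", "close"],
    "t3": ["open", "read", "send", "close"],
    "t4": ["socket", "connect", "recv"],
}

edges = build_graph(traces)
scores = hub_scores(traces, edges)
hub = max(scores, key=scores.get)
print(edges)
print(hub)
```

In this toy input, the trace that shares call pairs with the most other traces receives the highest score, and a trace with no edge stays isolated, which is how a candidate family and its outliers would show up in such a graph.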

Author information

Corresponding author

Correspondence to Krishnendu Ghosh.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ghosh, K., Mills, J. (2019). Automated Construction of Malware Families. In: Wang, G., Feng, J., Bhuiyan, M., Lu, R. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2019. Lecture Notes in Computer Science, vol 11611. Springer, Cham. https://doi.org/10.1007/978-3-030-24907-6_35

  • DOI: https://doi.org/10.1007/978-3-030-24907-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24906-9

  • Online ISBN: 978-3-030-24907-6

  • eBook Packages: Computer Science, Computer Science (R0)
