Abstract
Discovering malware families from the behavioral characteristics of a set of malware traces is an important step in malware detection. Malware in the wild often occurs as variants of one another. This work describes a data-dependent formalism for constructing malware families from trace data. The families are represented as an edge-labeled graph in which each node represents a malware trace and each edge describes a relationship between two traces. Each edge label holds a numerical value representing the similarity between the traces it connects. Network-theoretic concepts such as hubs are evaluated on the edge-labeled graph. The formalism is illustrated by experiments performed on multiple data sets of malware traces.
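The graph representation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: the abstract does not name the similarity measure, so Jaccard similarity over adjacent call pairs stands in for it, and a hub is taken to be the node with the largest sum of incident edge labels. The trace names, the call names, the `threshold` parameter, and all function names are hypothetical.

```python
from itertools import combinations


def bigrams(trace):
    """Return the set of adjacent call pairs in a trace."""
    return set(zip(trace, trace[1:]))


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in for the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(traces, threshold=0.5):
    """Map each pair of trace names to its similarity label.

    Only pairs whose similarity is at least `threshold` become edges.
    """
    features = {name: bigrams(calls) for name, calls in traces.items()}
    edges = {}
    for u, v in combinations(sorted(features), 2):
        weight = jaccard(features[u], features[v])
        if weight >= threshold:
            edges[(u, v)] = weight
    return edges


def weighted_degree(traces, edges):
    """Sum the edge labels incident to each node."""
    degree = {name: 0.0 for name in traces}
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return degree


# Hypothetical toy traces: three related variants and one unrelated sample.
traces = {
    "a": ["open", "read", "write", "close"],
    "b": ["open", "read", "write", "send", "close"],
    "c": ["open", "read", "send", "close"],
    "d": ["socket", "connect", "recv"],
}
edges = build_similarity_graph(traces, threshold=0.3)
degree = weighted_degree(traces, edges)
hub = max(degree, key=degree.get)
```

In this toy input, trace `b` shares call pairs with both `a` and `c`, so it ends up with the largest weighted degree, while `d` stays isolated. A different similarity measure or hub definition would change the resulting graph.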
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ghosh, K., Mills, J. (2019). Automated Construction of Malware Families. In: Wang, G., Feng, J., Bhuiyan, M., Lu, R. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2019. Lecture Notes in Computer Science(), vol 11611. Springer, Cham. https://doi.org/10.1007/978-3-030-24907-6_35
DOI: https://doi.org/10.1007/978-3-030-24907-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24906-9
Online ISBN: 978-3-030-24907-6
eBook Packages: Computer Science, Computer Science (R0)