Automated Construction of Malware Families

Ghosh, Krishnendu; Mills, Jeffery

doi:10.1007/978-3-030-24907-6_35

Conference paper
First Online: 11 July 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11611)

Abstract

Discovering malware families from the behavioral characteristics of a set of malware traces is an important step in malware detection. Malware in the wild often occurs as variants of one another. In this work, a data-dependent formalism is described for constructing malware families from trace data. The malware families are represented as an edge-labeled graph in which each node represents a malware trace and each edge describes the relationship between two traces. Each edge label contains a numerical value representing the similarity between the traces it connects. Network-theoretic concepts such as hubs are evaluated on the edge-labeled graph. The formalism is demonstrated through experiments on multiple data sets of malware traces.
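
The graph construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or hub definition the authors use, so everything specific here is an assumption made for illustration: traces are modeled as lists of system-call names, similarity is taken to be one minus the Jensen-Shannon divergence between call-frequency distributions, and a node's hub score is its similarity-weighted degree. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def distribution(trace):
    """Relative frequency of each call in a trace (assumed representation)."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {call: n / total for call, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, which lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def build_graph(traces, threshold=0.5):
    """Return {(node_a, node_b): similarity} for pairs at or above threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    dists = {name: distribution(t) for name, t in traces.items()}
    edges = {}
    for a, b in combinations(sorted(dists), 2):
        sim = 1.0 - js_divergence(dists[a], dists[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges


def weighted_degree(edges):
    """Sum of incident edge labels per node; a simple stand-in for a hub score."""
    degree = Counter()
    for (a, b), sim in edges.items():
        degree[a] += sim
        degree[b] += sim
    return degree


if __name__ == "__main__":
    # Toy traces invented for this example; they are not from the paper's data.
    traces = {
        "t1": ["open", "read", "write", "close"],
        "t2": ["open", "read", "read", "close"],
        "t3": ["socket", "connect", "send"],
    }
    edges = build_graph(traces)
    for pair, sim in edges.items():
        print(pair, round(sim, 3))
    print(weighted_degree(edges).most_common())
```

In this toy input, the two file-oriented traces share most of their calls and end up connected, while the network-oriented trace shares none and stays isolated. Connected groups in such a graph are one possible reading of a "family", and high-degree nodes one possible reading of a "hub".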

Author information

Authors and Affiliations

Department of Computer Science, College of Charleston, Charleston, SC, 29424, USA
Krishnendu Ghosh
Department of Computer Science, Northern Kentucky University, Highland Heights, KY, 41099, USA
Jeffery Mills

Corresponding author

Correspondence to Krishnendu Ghosh.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
Huazhong University of Science and Technology, Wuhan, China
Jun Feng
Fordham University, New York City, NY, USA
Md Zakirul Alam Bhuiyan
University of New Brunswick, Fredericton, NB, Canada
Rongxing Lu

About this paper

Cite this paper

Ghosh, K., Mills, J. (2019). Automated Construction of Malware Families. In: Wang, G., Feng, J., Bhuiyan, M., Lu, R. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2019. Lecture Notes in Computer Science, vol 11611. Springer, Cham. https://doi.org/10.1007/978-3-030-24907-6_35

DOI: https://doi.org/10.1007/978-3-030-24907-6_35
Published: 11 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24906-9
Online ISBN: 978-3-030-24907-6
eBook Packages: Computer Science, Computer Science (R0)
