ABSTRACT
Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy.
- Y. Bengio, O. Delalleau, and N. L. Roux. Label Propagation and Quadratic Criterion, pages 193--216. MIT Press, 2006.Google Scholar
- J. Caballero, C. Grier, C. Kreibich, and V. Paxson. Measuring pay-per-install: The commoditization of malware distribution. In USENIX Conference on Security, 2011. Google ScholarDigital Library
- M. Egele, T. Scholte, E. Kirda, and C. Kruegel. A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv., 44(2), 2012. Google ScholarDigital Library
- L. Invernizzi and P. M. Comparetti. Evilseed: A guided approach to finding malicious web pages. In IEEE Security and Privacy, 2012. Google ScholarDigital Library
- L. Invernizzi, S. Miskovic, R. Torres, C. Kruegel, S. Saha, G. Vigna, S. Lee, and M. Mellia. NAZCA: Detecting malware distribution in large-scale networks. In NDSS, 2014.Google ScholarCross Ref
- J. Jang, D. Brumley, and S. Venkataraman. Bitshred: Feature hashing malware for scalable triage and semantic analysis. In ACM CCS, pages 309--320, 2011. Google ScholarDigital Library
- A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In USENIX Security, 2013. Google ScholarDigital Library
- D. Kirat, G. Vigna, and C. Kruegel. Barecloud: bare-metal analysis-based evasive malware detection. In USENIX Security, 2014. Google ScholarDigital Library
- B. J. Kwon, J. Mondal, J. Jang, L. Bilge, and T. Dumitras. The dropper effect: Insights into malware distribution with downloader graph analytics. In ACM CCS, pages 1118--1129, 2015. Google ScholarDigital Library
- Z. Li, S. Alrwais, Y. Xie, F. Yu, and X. Wang. Finding the linchpins of the dark web: A study on topologically dedicated hosts on malicious web infrastructures. In IEEE Security and Privacy, 2013. Google ScholarDigital Library
- J. Ma, L. K. Saul, S. Savage, and G. M. Voelker. Identifying suspicious urls: An application of large-scale online learning. In ICML, 2009. Google ScholarDigital Library
- T. Nelms, R. Perdisci, M. Antonakakis, and M. Ahamad. Webwitness: Investigating, categorizing, and mitigating malware download paths. In USENIX Security, 2015. Google ScholarDigital Library
- J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. NIPS, 10(3):61--74, 1999.Google Scholar
- C. Rossow, C. Dietrich, and H. Bos. Large-scale analysis of malware downloaders. In DIMVA. 2013. Google ScholarDigital Library
- D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. Bitblaze: A new approach to computer security via binary analysis. In Information systems security. 2008. Google ScholarDigital Library
- A. Tamersoy, K. Roundy, and D. H. Chau. Guilt by association: Large scale malware detection by mining file-relation graphs. In SIGKDD, 2014. Google ScholarDigital Library
- Y. Yamaguchi, C. Faloutsos, and H. Kitagawa. Socnl: Bayesian label propagation with confidence. In PAKDD, 2015.Google ScholarCross Ref
- J. Zhang, C. Seifert, J. W. Stokes, and W. Lee. Arrow: Generating signatures to detect drive-by downloads. In WWW, 2011. Google ScholarDigital Library
- X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In ICML, pages 912--919, 2003.Google ScholarDigital Library
Index Terms
- Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph
Recommendations
A Survey on Malware Detection Using Data Mining Techniques
In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from different companies, ...
Opcode sequences as representation of executables for data-mining-based unknown malware detection
Malware can be defined as any type of malicious code that has the potential to harm a computer or network. The volume of malware is growing faster every year and poses a serious global security threat. Consequently, malware detection has become a ...
A framework for metamorphic malware analysis and real-time detection
Metamorphism is a technique that mutates the binary code using different obfuscations. It is difficult to write a new metamorphic malware and in general malware writers reuse old malware. To hide detection the malware writers change the obfuscations (...
Comments