DOI: 10.1145/2983323.2983700 (CIKM conference proceedings)
research-article

Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph

Published: 24 October 2016

ABSTRACT

Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph that fuses file dropping relationships with the topology of the file distribution network. The integration offers a unique ability to structure the end-to-end distribution relationship. However, it produces large heterogeneous graphs that must be analysed. In our study, an average daily graph has more than 4 million edges and 2.7 million nodes of differing types, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of the different node types and topological information of the heterogeneous network. Our approach needs neither to examine source code nor to inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure whose time complexity is linear in the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach detects malware efficiently and with high accuracy.
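
To make the idea of semi-supervised label propagation over a mixed graph of IPs, URLs, and files concrete, the following is a minimal sketch. It is not the Bayesian model proposed in the paper, whose details the abstract does not give. It is a generic clamped-seed averaging scheme, and the function name `propagate_labels`, the `prior_weight` parameter, the 0.5 default prior, and the example node identifiers are all assumptions made for illustration. Each sweep touches every node and every edge a constant number of times, which is the sense in which such procedures scale linearly with graph size.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, priors=None, prior_weight=1.0, iterations=20):
    """Generic label propagation sketch (hypothetical, not the paper's model).

    edges  : iterable of (node, node) pairs; nodes may be IPs, URLs, or files
    seeds  : dict mapping known nodes to 1.0 (malicious) or 0.0 (benign)
    priors : optional dict of per-node prior scores, e.g. from
             content-agnostic features; unknown nodes default to 0.5
    Each sweep costs O(|V| + |E|).
    """
    priors = priors or {}

    # Build an undirected adjacency list.
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # Initial scores: the prior (or 0.5), overridden by known labels.
    scores = {node: priors.get(node, 0.5) for node in neighbours}
    scores.update(seeds)

    for _ in range(iterations):
        updated = {}
        for node in scores:
            if node in seeds:
                # Known labels stay clamped.
                updated[node] = seeds[node]
                continue
            nbrs = neighbours.get(node, ())
            # Weighted average of the node's own prior and its neighbours' scores.
            total = prior_weight * priors.get(node, 0.5)
            total += sum(scores[n] for n in nbrs)
            updated[node] = total / (prior_weight + len(nbrs))
        scores = updated
    return scores


# Toy graph with made-up identifiers: two download chains of IP -> URL -> file.
edges = [
    ("ip:203.0.113.7", "url:bad.example/a"),
    ("url:bad.example/a", "file:X"),
    ("ip:198.51.100.2", "url:good.example/b"),
    ("url:good.example/b", "file:Y"),
]
seeds = {"ip:203.0.113.7": 1.0, "ip:198.51.100.2": 0.0}
scores = propagate_labels(edges, seeds)
```

In this toy graph, `file:X` sits downstream of a known-malicious IP and ends with a score above 0.5, while `file:Y` sits downstream of a known-benign IP and ends below 0.5, even though neither file's content was inspected.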

