Spotlight: Malware Lead Generation at Scale

Authors:

Fabian Kaczmarczyck,

Bernhard Grill,

Luca Invernizzi,

Jennifer Pullman,

Cecilia M. Procopiuc,

Elie Bursztein

ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference

Pages 17 - 27

https://doi.org/10.1145/3427228.3427273

Published: 08 December 2020

Abstract

Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model. The identification and prioritization of the most disconcerting malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team’s impact on users’ online safety, we present Spotlight, a large-scale malware lead-generation framework. Spotlight first sifts through a large malware data set to remove known malware families, based on first- and third-party threat intelligence. It then clusters the remaining malware into potentially undiscovered families and prioritizes them for further investigation using a score based on their potential business impact.

We evaluate Spotlight on 67M malware samples, to show that it can produce top-priority clusters with over 99% purity (i.e., homogeneity), which is higher than simpler approaches and prior work. To showcase Spotlight’s effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using Spotlight’s output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.
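
The abstract reports cluster purity but does not define it. As an illustration only, the sketch below uses the common textbook definition: the fraction of samples whose ground-truth label matches the majority label of their cluster. The function name `cluster_purity` and the toy data are hypothetical, and the paper's exact metric may differ.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, family_labels):
    """Return the textbook purity of a clustering (an assumed definition).

    Purity is the number of samples that carry the majority label of
    their cluster, divided by the total number of samples.
    """
    if len(cluster_ids) != len(family_labels):
        raise ValueError("cluster_ids and family_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, family_labels):
        label_counts[cluster_id][label] += 1

    # Sum the size of the majority label over all clusters.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    # Toy data: six samples in three clusters, with made-up family names.
    clusters = [0, 0, 0, 1, 1, 2]
    families = ["fam_a", "fam_a", "fam_b", "fam_c", "fam_c", "fam_d"]
    print(cluster_purity(clusters, families))
```

In the toy data, cluster 0 holds two samples of one family and one of another, so it contributes two majority samples out of three. A purity near 1.0 means each cluster is dominated by a single family.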


Published In

ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference

December 2020

962 pages

ISBN: 9781450388580

DOI: 10.1145/3427228

Copyright © 2020 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ACSAC '20

ACSAC '20: Annual Computer Security Applications Conference

December 7 - 11, 2020

Austin, USA

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%
