DOI: 10.1145/3427228.3427273
research-article

Spotlight: Malware Lead Generation at Scale

Published: 08 December 2020

Abstract

Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model. The identification and prioritization of the most concerning malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team’s impact on users’ online safety, we present Spotlight, a large-scale malware lead-generation framework. Spotlight first sifts through a large malware data set to remove known malware families, based on first- and third-party threat intelligence. It then clusters the remaining malware into potentially undiscovered families and prioritizes them for further investigation using a score based on their potential business impact.
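The three stages described above (remove known families, cluster what remains, rank clusters by impact) can be illustrated with a toy sketch. It is not Spotlight's implementation: the abstract does not specify the features, the clustering algorithm, or the scoring function. The sketch therefore assumes each sample is a set of hypothetical string features, swaps in Jaccard similarity with single-linkage clustering cut at a fixed threshold, and scores a cluster by summing hypothetical per-sample impact weights. All names (`remove_known`, `single_linkage`, `prioritize`, the sample IDs and features) are illustrative only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Features = FrozenSet[str]  # hypothetical per-sample feature set


def jaccard(a: Features, b: Features) -> float:
    """Jaccard similarity of two feature sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_known(samples: Dict[str, Features], known: Set[str]) -> Dict[str, Features]:
    """Stage 1: drop samples already attributed to a known family."""
    return {sid: f for sid, f in samples.items() if sid not in known}


def single_linkage(samples: Dict[str, Features], min_sim: float) -> List[List[str]]:
    """Stage 2 (stand-in): single-linkage clusters at a similarity cut, computed as
    connected components with union-find. Compares all pairs, so toy scale only."""
    parent = {sid: sid for sid in samples}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= min_sim:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for sid in samples:
        groups.setdefault(find(sid), []).append(sid)
    return [sorted(members) for members in groups.values()]


def prioritize(clusters: List[List[str]], impact: Dict[str, float]) -> List[List[str]]:
    """Stage 3 (hypothetical score): rank clusters by summed per-sample impact weight."""
    return sorted(clusters, key=lambda c: sum(impact.get(s, 0.0) for s in c), reverse=True)


# Hypothetical data: s5 is already covered by threat intelligence.
samples = {
    "s1": frozenset({"ad_click", "hidden_webview", "c2:a.example"}),
    "s2": frozenset({"ad_click", "hidden_webview", "c2:b.example"}),
    "s3": frozenset({"sms_send", "premium_number"}),
    "s4": frozenset({"sms_send", "premium_number", "c2:c.example"}),
    "s5": frozenset({"keylogger"}),
}
known = {"s5"}
impact = {"s1": 3.0, "s2": 3.0, "s3": 1.0, "s4": 1.0}

ranked = prioritize(single_linkage(remove_known(samples, known), min_sim=0.5), impact)
print(ranked)  # [['s1', 's2'], ['s3', 's4']]
```

The all-pairs loop is quadratic, so at a scale of tens of millions of samples a real system would need a more scalable clustering strategy; the sketch only shows how the filtering, clustering, and ranking stages compose.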
We evaluate Spotlight on 67M malware samples and show that it can produce top-priority clusters with over 99% purity (i.e., homogeneity), higher than that achieved by simpler approaches and prior work. To showcase Spotlight’s effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using Spotlight’s output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.
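The abstract equates purity with homogeneity but gives no formula. As a hedged illustration, the sketch below computes the common majority-label definition of cluster purity and, for comparison, the entropy-based homogeneity score from Rosenberg and Hirschberg's V-measure (2007). Which of these the evaluation actually uses is an assumption here, and the family labels and cluster contents are hypothetical.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def cluster_purity(labels: Sequence[str]) -> float:
    """Share of a cluster's members that carry its most common ground-truth label."""
    if not labels:
        raise ValueError("cluster is empty")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def overall_purity(clusters: List[Sequence[str]]) -> float:
    """Size-weighted purity: majority-label counts summed over clusters, divided by N."""
    total = sum(len(c) for c in clusters)
    return sum(Counter(c).most_common(1)[0][1] for c in clusters if c) / total


def homogeneity(clusters: List[Sequence[str]]) -> float:
    """V-measure homogeneity h = 1 - H(C|K) / H(C), with C = true labels, K = clusters."""
    n = sum(len(c) for c in clusters)
    class_counts = Counter(label for c in clusters for label in c)
    h_c = -sum((m / n) * log(m / n) for m in class_counts.values())
    if h_c == 0.0:
        return 1.0  # only one true class overall: any clustering is homogeneous
    h_c_given_k = 0.0
    for c in clusters:
        for m in Counter(c).values():
            h_c_given_k -= (m / n) * log(m / len(c))
    return 1.0 - h_c_given_k / h_c


# Hypothetical example: one cluster with a single stray sample, one pure cluster.
clusters = [["adfraud"] * 99 + ["banker"], ["banker"] * 50]
print(cluster_purity(clusters[0]))         # 0.99
print(round(overall_purity(clusters), 4))  # 0.9933
print(round(homogeneity(clusters), 4))
```

In this toy example, the two measures diverge: a single stray sample costs only 1% purity in its cluster, while the entropy-based homogeneity score drops to roughly 0.94, so a figure reported as "purity" and one reported as "homogeneity" are not interchangeable in general.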




Published In

ACM Other conferences
ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference
December 2020
962 pages
ISBN:9781450388580
DOI:10.1145/3427228
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Malware classification
  2. Malware clustering
  3. Malware hunting
  4. Malware prioritization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACSAC '20

Acceptance Rates

Overall Acceptance Rate 104 of 497 submissions, 21%


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 2
Reflects downloads up to 05 Mar 2025.

Cited By

  • (2023) Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access. Proceedings of the 18th International Conference on Availability, Reliability and Security, pp. 1-12. DOI: 10.1145/3600160.3605047. Online publication date: 29-Aug-2023.
  • (2023) PackGenome: Automatically Generating Robust YARA Rules for Accurate Malware Packer Detection. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 3078-3092. DOI: 10.1145/3576915.3616625. Online publication date: 15-Nov-2023.
  • (2023) A Deep Dive into the VirusTotal File Feed. Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 155-176. DOI: 10.1007/978-3-031-35504-2_8. Online publication date: 12-Jul-2023.
  • (2022) Meta-modelling for Ecosystems Security. Simulation Tools and Techniques, pp. 259-283. DOI: 10.1007/978-3-030-97124-3_22. Online publication date: 31-Mar-2022.
  • (2021) Scaling Multi-Objective Optimization for Clustering Malware. 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8. DOI: 10.1109/SSCI50451.2021.9659925. Online publication date: 5-Dec-2021.
