ABSTRACT
The security community today is increasingly using machine learning (ML) for malware detection for its ability to scale to a large number of files and capture patterns that are difficult to describe explicitly. However, deploying an ML-based malware detector is challenging in practice due to concept drift. As the behaviors of malware and goodware constantly evolve, the shift in their data distribution often leads to serious errors in the deployed detectors. In addition, such dynamic evolvement further adds to the pressure of labeling new malware variants for model updating, which is already an expensive process. In this talk, I will introduce our recent exploration of the challenges introduced by malware concept drift and the potential solutions. I will first discuss the problem of detecting drifting samples to proactively inform ML detectors when not to make decisions. We explore the idea of self-supervision for drift detection and design the corresponding explanation methods to make sense of the detected concept drift. Second, to facilitate malware labeling and model updating, I will share our recent results from combining cheap unsupervised methods with the existing limited/biased labels to generate higher-quality labels. Finally, I will discuss the emerging threat of poisoning and backdoor attacks that exploit the dynamic updating process of malware detectors, and potential directions to robustify this process.
Index Terms
- Open Challenges of Malware Detection under Concept Drift
Recommendations
Application of Anomaly Detection Models to Malware Detection in the Presence of Concept Drift
Hybrid Artificial Intelligent SystemsAbstractMachine learning is one of the main approaches to malware detection in the literature, since machine learning models are more adaptive than signature based solutions. One of the main challenges in the application of machine learning to malware ...
Malware detection using adaptive data compression
AISec '08: Proceedings of the 1st ACM workshop on Workshop on AISecA popular approach in current commercial anti-malware software detects malicious programs by searching in the code of programs for scan strings that are byte sequences indicative of malicious code. The scan strings, also known as the signatures of ...
Android malware concept drift using system calls: Detection, characterization and challenges
Highlights- Demonstrates the existence of concept drift issues in Android malware detection.
AbstractThe majority of Android malware detection solutions have focused on the achievement of high performance in old and short snapshots of historical data, which makes them prone to lack the generalization and adaptation capabilities needed ...
Comments