Authors:
Miles Q. Li¹; Benjamin C. M. Fung²; Philippe Charland³ and Steven H. H. Ding⁴
Affiliations:
¹ School of Computer Science, McGill University, Montreal, Canada;
² School of Information Studies, McGill University, Montreal, Canada;
³ Mission Critical Cyber Security Section, Defence R&D Canada, Quebec, Canada;
⁴ School of Computing, Queen’s University, Kingston, Canada
Keyword(s):
Cybersecurity, Malware Classification, Reverse Engineering, Clustering.
Abstract:
Malicious executables are composed of functions that can be represented in assembly code. In the assembly code mining literature, many software reverse engineering tools have been developed to disassemble executables, search for function clones, and find vulnerabilities, among other tasks. Developing a machine learning-based malware classification model that simultaneously achieves excellent classification performance and provides insightful interpretation of its classification results remains an active research topic. In this paper, we propose a novel machine learning model dedicated to the problem of malware classification. Our proposed model generates clusters of assembly code functions based on function representation learning and provides excellent interpretability for the classification results. It does not require a large or balanced training dataset, which matches real-life scenarios. Experiments show that our proposed approach outperforms previous state-of-the-art malware classification models and provides meaningful interpretation of the classification results.