Applying Natural Language Processing for detecting malicious patterns in Android applications
Introduction and motivation
Due to the ubiquity of mobile phones, mobile malware has increased dramatically in recent years. Android, being the most popular mobile operating system (OS), is host to most of these malicious apps. The number of mobile malware programs increased by 24 million from 2018 to 2019 (McAfee Mobile Threat Report, 2020), and in 2019 companies spent on average 2.4 million USD defending against malware (The ultimate list of cybe, 2019). We need to develop methods to defend against and minimize these attacks
Related work
In this section, we briefly highlight recent research that has applied NLP techniques to detect malicious patterns in Android and Windows apps.
Overview of the system
After converting an APK into a MAIL (Malware Analysis Intermediate Language) program, the system proposed in this paper generates the control flow graph (CFG) of each function in the MAIL program. Each CFG contains one or more execution paths of the MAIL function. We extract these paths from each CFG and call them MAIL CFG Paths, i.e., all the CFG paths in a MAIL program. Extracting these paths can be compared with extracting sentences from a natural-language text. We then build a similarity index with these MAIL CFG
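As an illustration of the path-extraction step, the following minimal sketch enumerates the execution paths of a single function's CFG and renders each path as a "sentence" of tokens. It is not the authors' implementation: the adjacency-list CFG representation, the function names `extract_paths` and `path_to_sentence`, the block identifiers, and the statement labels are all assumptions made for this example. Loops are handled by visiting each block at most once per path, which is one possible convention and may differ from the one used in the paper.

```python
from typing import Dict, List

# Hypothetical CFG representation: block id -> list of successor block ids.
CFG = Dict[str, List[str]]


def extract_paths(cfg: CFG, entry: str) -> List[List[str]]:
    """Enumerate execution paths starting at `entry`.

    A path ends at a block that has no successors left to follow.
    Successors already on the current path are skipped, so each block
    appears at most once per path and loops cannot cause infinite paths.
    """
    paths: List[List[str]] = []
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        successors = [s for s in cfg.get(node, []) if s not in path]
        if not successors:
            paths.append(path)
            continue
        # Push in reverse so successors are explored in their listed order.
        for succ in reversed(successors):
            stack.append((succ, path + [succ]))
    return paths


def path_to_sentence(path: List[str], labels: Dict[str, str]) -> str:
    """Render a path as a space-separated 'sentence' of block labels."""
    return " ".join(labels[block] for block in path)


# Illustrative CFG with one branch: B0 -> (B1 | B2) -> B3.
example_cfg: CFG = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
# Placeholder labels standing in for the statements of each block.
example_labels = {"B0": "ASSIGN", "B1": "CALL", "B2": "ASSIGN", "B3": "JUMP"}

example_paths = extract_paths(example_cfg, "B0")
example_sentences = [path_to_sentence(p, example_labels) for p in example_paths]
```

Each resulting sentence could then be passed to a text-similarity model in the same way a sentence of natural language would be; which model to use is left open here.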
Empirical evaluation
We carried out an empirical study to evaluate and validate the performance of our proposed model. All experiments were run on a desktop PC running Windows 8.1, equipped with an Intel Core(TM) i7-4510U @ 2 GHz and 8 GB of RAM. In this section, we present the dataset, evaluation metrics, threshold computation, validation experiments, obtained results, and analysis (comparison with other works and limitations).
Conclusion
Modern NLP techniques have greatly improved and are used in practice for tasks such as machine translation, summarization of large texts, and question answering. In this paper, we have exploited this fact and applied NLP techniques to build a similarity index model of MAIL CFG Paths, which is used to find malicious patterns in Android apps. We have demonstrated through experiments that our model outperforms many other such models. Our proposed model, when tested with
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.