research-article

Explainable Machine Learning For Malware Detection Using Ensemble Bagging Algorithms

Authors:
Rajesh Kumar
School of Computer Science Engineering, Vellore Institute of Technology, India

Geetha Subbiah
School of Computer Science Engineering, Vellore Institute of Technology, India

IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, August 2022, Pages 453–460
https://doi.org/10.1145/3549206.3549284

Published: 24 October 2022

ABSTRACT

Vulnerabilities in software products can be exploited to attack the security systems of any organization, anywhere. Malware is downloaded when an unsuspecting user clicks a hyperlink, and it is then used as the tool for exploiting those vulnerabilities. Machine learning makes it possible to detect large volumes of malware effectively. However, machine-learning-based systems misclassify some samples, producing false positives and false negatives. The novelty of this paper is the use of explainable machine learning to improve the efficiency and robustness of the Extra Trees ensemble bagging algorithm for malware detection. The paper uses waterfall plots based on Shapley values to identify trends in the features of misclassified samples. The trends in the five topmost features for misclassification are used to derive inductive rules. These rules are applied to correct misclassifications and enhance the performance of bagging algorithms. They can also be applied to detect unknown future malware, known as zero-day malware, and so prevent attacks on security systems. The Extra Trees bagging algorithm achieves an accuracy of 98.1% on future unknown malware. When the misclassified samples are also counted as detected by the inductive rules, the accuracy is 100%. A heatmap based on the Shapley values of the features confirms the topmost features for all misclassified samples in the dataset and strengthens the inductive rules.
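To make the Shapley-value attribution mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It computes exact Shapley values by brute force for a toy scoring function, then lists the contributions in the order a waterfall plot would stack them. The function `toy_score`, the feature names, the sample, and the baseline are all hypothetical and chosen only for illustration. The paper applies the idea to an Extra Trees model, where tree-specific algorithms would normally replace this exponential-time enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" from a coalition takes its baseline value.
    The cost grows as 2**n, so this is only usable for a handful of features.
    """
    n = len(x)

    def value(present):
        # Evaluate the model with only the features in `present` set to x.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def waterfall_rows(names, phi, base_value):
    """Return (name, contribution, running_total) rows, largest |phi| first."""
    order = sorted(range(len(phi)), key=lambda i: -abs(phi[i]))
    rows, total = [], base_value
    for i in order:
        total += phi[i]
        rows.append((names[i], phi[i], total))
    return rows


# Hypothetical toy model: one additive feature and one interaction term.
def toy_score(z):
    return 0.1 + 0.5 * z[0] + 0.3 * z[1] * z[2]


NAMES = ["feature_a", "feature_b", "feature_c"]  # hypothetical feature names
sample = [1.0, 1.0, 1.0]    # hypothetical sample to explain
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point

phi = shapley_values(toy_score, sample, baseline)
rows = waterfall_rows(NAMES, phi, toy_score(baseline))

for name, contribution, running in rows:
    print(f"{name:10s} {contribution:+.3f} -> {running:.3f}")
```

The contributions sum to the difference between the score of the sample and the score of the baseline, which is the property a waterfall plot visualizes. Inspecting which features dominate this sum for misclassified samples is the kind of trend the abstract describes as the basis for inductive rules.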


Published in
IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing
August 2022
710 pages
ISBN: 9781450396752
DOI: 10.1145/3549206
General Chairs: Sartaj Sahni (University of Florida, USA), Vikas Saxena (JIIT Noida, India)
Program Chair: Sundaraja Sitharama Iyengar (Florida International University, USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bagging Algorithm
Cyber Security
Machine Learning
Malware Detection
Zero-day Malware
Qualifiers
- research-article
- Research
- Refereed limited