Explainable Machine Learning For Malware Detection Using Ensemble Bagging Algorithms

Published: 24 October 2022

ABSTRACT

Vulnerabilities in software products can be exploited to attack the security systems of any organization. Malware is typically downloaded when an unsuspecting user clicks a hyperlink, and it then serves as the tool for exploiting those vulnerabilities. Machine learning makes it possible to detect large volumes of malware effectively. However, machine-learning-based systems misclassify some samples, producing false positives and false negatives. The novelty of this paper is the use of explainable machine learning to improve the efficiency and robustness of the ensemble bagging algorithm Extra Trees for malware detection. The paper uses waterfall plots based on Shapley values to detect trends in the features of misclassified samples. The trends in the five topmost features for misclassification are used to derive inductive rules. These rules are applied to overcome misclassification and enhance the performance of bagging algorithms. They can also be applied to detect unknown future malware, known as zero-day malware, and so prevent attacks on security systems. The accuracy of the Extra Trees bagging algorithm on future unknown malware is 98.1%. When the misclassified samples are also caught by the inductive rules, the accuracy is 100%. A heatmap based on the Shapley values of the features confirms the topmost features for all misclassified samples in the dataset and strengthens the inductive rules.
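The Shapley-value attribution that underlies a waterfall plot can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which tooling was used, and the scoring function, feature names, sample, and baseline below are all hypothetical stand-ins for a trained Extra Trees model and real PE features. The sketch computes exact Shapley values for one sample by enumerating feature coalitions, then ranks features by absolute contribution, which is the ordering a waterfall plot displays.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(sample)
    features = range(n)

    def value(coalition):
        mixed = [sample[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical feature names and toy "malware score" (assumptions, not from the paper).
FEATURE_NAMES = ["section_entropy", "num_imports", "has_signature"]


def toy_score(x):
    entropy, imports, signed = x
    return 0.1 * entropy + 0.02 * imports - 0.3 * signed + 0.05 * entropy * (1 - signed)


sample = [7.0, 5.0, 0.0]     # hypothetical sample under inspection
baseline = [4.0, 20.0, 1.0]  # hypothetical reference ("average") input

phi = shapley_values(toy_score, sample, baseline)

# Rank features by absolute contribution, as a waterfall plot would order them.
ranked = sorted(zip(FEATURE_NAMES, phi), key=lambda p: abs(p[1]), reverse=True)

if __name__ == "__main__":
    for name, contribution in ranked:
        print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is why a waterfall plot can stack them from the baseline up to the prediction. In practice a tree-specific explainer would replace the brute-force enumeration, and the top-ranked features of misclassified samples would be the inputs to rule derivation.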


  • Published in

    ACM Other conferences
    IC3-2022: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing
    August 2022
    710 pages
    ISBN:9781450396752
    DOI:10.1145/3549206

    Copyright © 2022 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited