ABSTRACT
Vulnerabilities in software products can be exploited to attack the security systems of any organization. Malware is typically downloaded when an unsuspecting user clicks a hyperlink, and it then serves as the tool for exploiting those vulnerabilities. Machine learning makes it possible to detect large volumes of malware effectively. However, machine-learning-based systems misclassify some samples, producing false positives and false negatives. The novelty of this paper is the use of explainable machine learning to improve the efficiency and robustness of the Extra Trees bagging ensemble algorithm for malware detection. The paper uses waterfall plots based on Shapley values to identify trends in the features of misclassified samples. The trends in the five topmost features are used to derive inductive rules, which are applied to correct misclassifications and improve the performance of bagging algorithms. The inductive rules can also be applied to detect future unknown malware, known as zero-day malware, and thereby prevent attacks on security systems. The Extra Trees bagging algorithm achieves an accuracy of 98.1% on future unknown malware. When the misclassified samples are also counted as detected by the inductive rules, the accuracy is 100%. A heatmap based on the Shapley values of the features confirms the topmost features for all misclassified samples in the dataset and strengthens the inductive rules.
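The Shapley-value attribution that underlies a waterfall plot can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_model`, the feature names `f0` to `f3`, the all-zero baseline, and the sample values are hypothetical stand-ins for a trained Extra Trees classifier and real PE features. The sketch computes exact Shapley values by enumerating feature subsets, then lists the contributions in the order a waterfall plot would stack them, from the baseline score up to the sample's score.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the paper's actual features are not reproduced here.
FEATURES = ["f0", "f1", "f2", "f3"]


def toy_model(x):
    """Hypothetical stand-in for a trained classifier: returns a malware score."""
    return 0.1 + 0.4 * x[0] + 0.3 * x[1] * x[2] + 0.1 * x[3]


def coalition_value(model, x, baseline, subset):
    """Score when features in `subset` use the sample's values and the rest use the baseline."""
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(mixed)


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                with_i = coalition_value(model, x, baseline, s | {i})
                without_i = coalition_value(model, x, baseline, s)
                phi[i] += weight * (with_i - without_i)
    return phi


def top_features(phi, k):
    """Indices of the k features with the largest absolute contribution."""
    return sorted(range(len(phi)), key=lambda i: -abs(phi[i]))[:k]


def waterfall_rows(phi, base_value):
    """Rows of (feature, contribution, running score), largest contribution first."""
    rows, running = [], base_value
    for i in top_features(phi, len(phi)):
        running += phi[i]
        rows.append((FEATURES[i], phi[i], running))
    return rows


# Hypothetical sample (imagine it was misclassified) and an all-zero baseline.
sample = [1, 1, 1, 0]
baseline = [0, 0, 0, 0]
phi = shapley_values(toy_model, sample, baseline)

print(f"base score: {toy_model(baseline):.2f}")
for name, contribution, running in waterfall_rows(phi, toy_model(baseline)):
    print(f"{name}: {contribution:+.2f} -> {running:.2f}")
```

Exact enumeration is exponential in the number of features, so it only suits toy cases like this one. For tree ensembles, a dedicated tree explainer would normally be used to obtain the same kind of per-feature contributions.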