ABSTRACT
Training an effective machine learning model for malware detection requires a large dataset. Yet such a dataset is hard to collect without violating data privacy or exposing the data to potential violations. Our work proposes a federated learning framework that permits multiple parties to collaborate on learning behavioral graphs for malware detection. Our proposed graph classification framework allows each participating party to freely choose its preferred classifier model without disclosing that choice to the other parties. This mitigates the risk of data poisoning attacks. In our experiments, our classification model using partially federated learning achieved an F1-score of 0.97, close to the performance of models trained on centralized data. Moreover, the impact of the label flipping attack against our model is less than 0.02.
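To make the two quantities in the abstract concrete, the following is a minimal sketch of a label flipping attack on binary labels and of the F1-score used to measure its impact. It is not the authors' experimental code: the function names `flip_labels` and `f1_score`, the 0/1 label encoding (1 = malware), and the flip fraction are all assumptions chosen for illustration.

```python
import random


def flip_labels(labels, fraction, seed=0):
    """Return a copy of binary labels with the given fraction flipped (0 <-> 1).

    Hypothetical helper: models a poisoning party that corrupts part of
    its local training labels before training.
    """
    rng = random.Random(seed)
    n_flip = int(len(labels) * fraction)
    flipped = list(labels)
    # Pick distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def f1_score(y_true, y_pred, positive=1):
    """Compute the F1-score (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    clean = [1, 1, 1, 1, 0, 0, 0, 0]
    poisoned = flip_labels(clean, fraction=0.25)
    print("labels changed:", sum(a != b for a, b in zip(clean, poisoned)))
    print("F1 of poisoned labels vs. clean:", f1_score(clean, poisoned))
```

Under this reading, the impact of an attack would be the difference between the F1-score of a model trained on clean labels and that of the same model trained on poisoned labels, both evaluated on the same clean test set.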
Index Terms
- Mitigate Data Poisoning Attack by Partially Federated Learning