DOI: 10.1145/3600160.3605032
research-article

Mitigate Data Poisoning Attack by Partially Federated Learning

Published: 29 August 2023

ABSTRACT

An efficient machine learning model for malware detection requires a large training dataset. Yet such a dataset is hard to collect without violating various aspects of data privacy or leaving them open to violation. Our work proposes a federated learning framework that permits multiple parties to collaborate on learning behavioral graphs for malware detection. Our proposed graph classification framework allows each participating party to freely choose its preferred classifier model without disclosing that choice to the other parties. This reduces the chance of data poisoning attacks. In our experiments, our classification model using partially federated learning achieved an F1-score of 0.97, close to the performance of models trained on centralized data. Moreover, the impact of the label flipping attack against our model is less than 0.02.
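
To make the two quantities named in the abstract concrete, the following is a minimal sketch of a label flipping attack on binary labels and of the F1-score computation. It is an illustration under stated assumptions, not the authors' implementation: the function names `flip_labels` and `f1_score`, the flipped fraction, and the toy labels are all hypothetical.

```python
import random


def flip_labels(labels, fraction, seed=0):
    """Return a copy of binary labels (0/1) with a given fraction flipped.

    Hypothetical helper: it models a label flipping attack by inverting
    the labels of a randomly chosen subset of training samples.
    """
    rng = random.Random(seed)
    n_flip = int(len(labels) * fraction)
    poisoned = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        poisoned[i] = 1 - poisoned[i]
    return poisoned


def f1_score(y_true, y_pred, positive=1):
    """Compute the F1-score, the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed for illustration): 1 = malware, 0 = benign.
clean = [0, 1] * 10
poisoned = flip_labels(clean, fraction=0.25, seed=1)
changed = sum(1 for a, b in zip(clean, poisoned) if a != b)
print("flipped labels:", changed)

# One missed malware sample out of two gives precision 1.0 and recall 0.5.
print("F1:", f1_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In an experiment of this kind, one would presumably train one model on the clean labels and one on the poisoned labels, then compare their F1-scores on the same held-out test set. The difference between the two scores is one way to quantify the impact of the attack.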

Published in

ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security
August 2023
1440 pages
ISBN: 9798400707728
DOI: 10.1145/3600160

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 228 of 451 submissions, 51%