research-article

Mitigate Data Poisoning Attack by Partially Federated Learning

Authors:
Khanh Huu The Dam, Université catholique de Louvain, Belgium (ORCID 0000-0001-7203-9658)
Axel Legay, Université catholique de Louvain, Belgium (ORCID 0000-0003-2287-8925)

ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security, August 2023, Article No. 75, pages 1–9. https://doi.org/10.1145/3600160.3605032

Published: 29 August 2023

ABSTRACT

An efficient machine learning model for malware detection requires a large training dataset. Yet collecting such a dataset is difficult without violating data privacy, or at least exposing it to potential violations. Our work proposes a federated learning framework that permits multiple parties to collaborate on learning behavioral graphs for malware detection. Our proposed graph classification framework allows each participating party to freely choose its preferred classifier model without disclosing that choice to the other participants. This mitigates the risk of data poisoning attacks. In our experiments, our classification model using partially federated learning achieved an F1-score of 0.97, close to the performance of models trained on centralized data. Moreover, the impact of the label-flipping attack on our model is less than 0.02.
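The abstract reports an F1-score and a label-flipping impact of less than 0.02. A minimal sketch of how such an impact could be measured follows. It assumes, without confirmation from the text, that "impact" means the drop in F1-score between a model trained on clean labels and one trained after a fraction of the training labels has been flipped. The nearest-centroid classifier, the 2-D toy features, and the helper names (`flip_labels`, `train_nearest_centroid`, `predict`, `toy_data`) are hypothetical stand-ins. They are not the authors' graph-based models or dataset, and the sketch ignores the federated setting entirely.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score, treating `positive` as the malware class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def flip_labels(labels, fraction, seed=0):
    """Label-flipping attack: invert a random `fraction` of binary labels."""
    rng = random.Random(seed)
    poisoned = list(labels)
    n_flip = round(fraction * len(poisoned))
    for i in rng.sample(range(len(poisoned)), n_flip):
        poisoned[i] = 1 - poisoned[i]
    return poisoned


def train_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each sample to the class whose centroid is closest."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X]


def toy_data(n_per_class, seed):
    """Hypothetical 2-D features: benign near (0, 0), malware near (3, 3)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = toy_data(200, seed=1)
    X_test, y_test = toy_data(100, seed=2)

    clean = train_nearest_centroid(X_train, y_train)
    poisoned = train_nearest_centroid(X_train, flip_labels(y_train, 0.2, seed=3))

    f1_clean = f1_score(y_test, predict(clean, X_test))
    f1_poisoned = f1_score(y_test, predict(poisoned, X_test))
    # Assumption: "impact" is read here as the F1 drop caused by poisoning.
    print(f"F1 clean={f1_clean:.3f} poisoned={f1_poisoned:.3f} "
          f"impact={f1_clean - f1_poisoned:.3f}")
```

Because the classifier and data are toy stand-ins, the printed impact says nothing about the 0.02 figure reported above. The sketch only illustrates the measurement procedure: poison the training labels, retrain, and compare F1-scores on an untouched test set.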


Published in
ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security
August 2023
1440 pages
ISBN: 9798400707728
DOI: 10.1145/3600160

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
Data Privacy
Data poisoning attack
Federated Learning
Malware detection
Qualifiers
- research-article
- Research
- Refereed limited

Acceptance Rates
Overall Acceptance Rate: 228 of 451 submissions, 51%