Abstract
Vertical Federated Learning (VFL) enables multiple parties to collaboratively train a machine learning model over vertically distributed datasets without leaking private data. However, current VFL solutions share a limitation: the trained model cannot run inference on non-overlapping samples. This limitation seriously reduces the availability of VFL models because, in practice, overlapping samples may account for only a small portion of each party's data, so a large share of inference tasks will fail. In this article, we propose a novel VFL framework that enables federated inference on non-overlapping data. Our framework treats the distributed features as privileged information, which is available during training but unavailable at inference time. We distill the knowledge of these privileged features and transfer it to each party's local model, which processes only local features. Furthermore, we adopt Oblivious Transfer (OT) to preserve data ID privacy during training and inference. Empirically, we evaluate the framework on real-world datasets collected from Criteo and Taobao. We also provide a security analysis of the proposed framework.
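To make the distillation idea concrete, the following is a minimal sketch of a generic privileged-feature distillation loss for binary labels. It is not the objective used in this article, which the abstract does not specify. It assumes a teacher (the federated model, which sees both local and privileged features) that produces logits, and a student (the local model, which sees local features only) trained against both the true labels and the teacher's softened predictions. The names `distillation_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between targets in [0, 1] and probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def distillation_loss(labels, student_logits, teacher_logits,
                      alpha=0.5, temperature=2.0):
    """Hypothetical combined loss (an assumption, not the article's method).

    hard term: student predictions vs. ground-truth labels
    soft term: student predictions vs. temperature-softened teacher predictions
    alpha:     weight of the soft (distillation) term
    """
    hard = bce(labels, [sigmoid(s) for s in student_logits])
    soft_targets = [sigmoid(t / temperature) for t in teacher_logits]
    soft_preds = [sigmoid(s / temperature) for s in student_logits]
    soft = bce(soft_targets, soft_preds)
    return (1.0 - alpha) * hard + alpha * soft


# Usage: the teacher logits would come from the federated model during training;
# at inference time only the student (local features) is evaluated.
labels = [1.0, 0.0, 1.0]
teacher_logits = [2.0, -1.5, 0.5]
student_logits = [1.2, -0.8, 0.1]
loss = distillation_loss(labels, student_logits, teacher_logits)
```

In a setup like this, the student needs no remote features at inference time, which is what would allow predictions on non-overlapping samples; the soft term is the only place where privileged information enters.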
Index Terms
- Improving Availability of Vertical Federated Learning: Relaxing Inference on Non-overlapping Data