Abstract
Federated Learning (FL) is a distributed learning paradigm that aims to preserve the privacy of training data. Recent studies have shown FL models to be vulnerable to reconstruction attacks that compromise data privacy by inverting gradients computed on confidential data. To defend against these attacks, it is common to employ methods that guarantee data confidentiality using the principles of Differential Privacy (DP). However, in many cases, especially for machine learning models trained on unstructured data such as text, evaluating privacy also requires considering the finite embedding space of the client's private data. In this study, we show how privacy in a distributed FL setup is sensitive to the underlying finite embeddings of the confidential data. We show that privacy can be quantified for a client batch that uses either noise or a mixture of finite embeddings by introducing a normalised rank distance (\(d_{rank}\)). This measure has the advantage of taking into account the size of a finite vocabulary embedding and of aligning the privacy budget with a partitioned space. We further explore the impact of noise and client batch size on the privacy budget and compare it to the standard \(\epsilon \) derived from Local-DP.
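To illustrate the idea behind a normalised rank distance over a finite embedding space, the following is a minimal sketch. The embedding matrix, Laplace noise scale, and nearest-neighbour ranking below are our own illustrative assumptions, not the paper's exact construction: the rank of the true token among the neighbours of a perturbed embedding, normalised by vocabulary size, quantifies how recoverable the private token is.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 1000, 16
# Finite vocabulary embedding: one row per token (illustrative random matrix).
embeddings = rng.normal(size=(vocab_size, dim))

def d_rank(private_idx, noisy_vec, embeddings):
    """Rank of the true token among nearest neighbours of the noisy vector,
    normalised by vocabulary size (0 = fully recoverable, near 1 = hidden)."""
    dists = np.linalg.norm(embeddings - noisy_vec, axis=1)
    rank = np.argsort(dists).tolist().index(private_idx)
    return rank / (len(embeddings) - 1)

private_idx = 42
noise_scale = 0.5  # assumed Laplace noise scale
noisy = embeddings[private_idx] + rng.laplace(scale=noise_scale, size=dim)
print(d_rank(private_idx, noisy, embeddings))
```

Larger noise scales (or mixing several embeddings in a client batch) push the rank of the true token higher, which this measure maps into \([0, 1]\) regardless of vocabulary size.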
Appendix
Starting with the LHS of (3) and substituting the probability density function of the Laplace distribution, we get
The first inequality follows from the triangle inequality, and in the last step \(\epsilon \) is substituted for its definition.
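The bound derived above can be checked numerically. The sketch below (our own illustration; the sensitivity, \(\epsilon\), and adjacent outputs are assumed values) verifies that the Laplace mechanism with scale \(b = \Delta f / \epsilon\) keeps the density ratio within \(e^{\epsilon}\):

```python
import numpy as np

def laplace_pdf(z, mu, b):
    # Probability density of Laplace(mu, b).
    return np.exp(-np.abs(z - mu) / b) / (2 * b)

sensitivity, epsilon = 1.0, 0.5   # assumed values for illustration
b = sensitivity / epsilon          # Laplace scale calibrated to sensitivity

# Two adjacent query outputs differing by exactly the sensitivity.
f_x, f_x_prime = 0.0, 1.0

zs = np.linspace(-10, 10, 1001)
ratio = laplace_pdf(zs, f_x, b) / laplace_pdf(zs, f_x_prime, b)
print(ratio.max())  # should not exceed exp(epsilon)
```

The maximum of the ratio is attained where the triangle inequality is tight (here, for \(z \le f_x\)), matching the \(e^{\epsilon}\) bound of the derivation.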
Disclaimer
This paper was prepared for informational purposes by the Global Technology Applied Research center of JPMorgan Chase & Co. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates makes any explicit or implied representation or warranty and none of them accept any liability in connection with this paper, including, without limitation, with respect to the completeness, accuracy, or reliability of the information contained herein and the potential legal, compliance, tax, or accounting effects thereof. This document is not intended as investment research or investment advice, or as a recommendation, offer, or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Papadopoulos, G., Satsangi, Y., Eloul, S., Pistoia, M. (2024). Ranking Distance Metric for Privacy Budget in Distributed Learning of Finite Embedding Data. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14611. Springer, Cham. https://doi.org/10.1007/978-3-031-56066-8_21
Print ISBN: 978-3-031-56065-1
Online ISBN: 978-3-031-56066-8