
Ranking Distance Metric for Privacy Budget in Distributed Learning of Finite Embedding Data

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14611)


Abstract

Federated Learning (FL) is a distributed learning paradigm that aims to preserve the privacy of data. Recent studies have shown FL models to be vulnerable to reconstruction attacks that compromise data privacy by inverting gradients computed on confidential data. To defend against these attacks, it is common to employ methods that guarantee data confidentiality using the principles of Differential Privacy (DP). However, in many cases, especially for machine learning models trained on unstructured data such as text, evaluating privacy also requires considering the finite embedding space of the clients' private data. In this study, we show how privacy in a distributed FL setup is sensitive to the underlying finite embeddings of the confidential data. We show that privacy can be quantified for a client batch that uses either noise or a mixture of finite embeddings by introducing a normalised rank distance (\(d_{rank}\)). This measure has the advantage of taking into account the size of a finite vocabulary embedding and of aligning the privacy budget to a partitioned space. We further explore the impact of noise and client batch size on the privacy budget and compare it to the standard \(\epsilon \) derived from Local-DP.
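Since the paper's exact definition of \(d_{rank}\) is not reproduced on this page, the sketch below only illustrates one plausible reading of the abstract: the normalised rank distance is taken to be the rank of the original token among all vocabulary embeddings ordered by distance to the perturbed embedding, divided by the vocabulary size. The function name, the Euclidean distance, and all parameter values are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def normalized_rank_distance(original_id, perturbed_vec, vocab_embeddings):
    """Illustrative d_rank: rank of the original token among all vocabulary
    embeddings sorted by Euclidean distance to the perturbed vector,
    normalised by the vocabulary size (0.0 = still nearest, ~1.0 = farthest)."""
    dists = np.linalg.norm(vocab_embeddings - perturbed_vec, axis=1)
    rank = np.argsort(dists).tolist().index(original_id)
    return rank / (len(vocab_embeddings) - 1)

# Toy usage: a random 1,000-token vocabulary of 16-dimensional embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 16))
token_id = 42
noisy = E[token_id] + rng.laplace(scale=0.5, size=16)   # Laplace-perturbed embedding
print(normalized_rank_distance(token_id, noisy, E))
```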



Author information

Correspondence to Georgios Papadopoulos.

Appendix

Starting with the LHS of (3) and substituting the probability density function of the Laplace distribution, we get

$$\begin{aligned} \frac{\Pr (\textbf{M}(B, f, b)=y)}{\Pr (\textbf{M}(B', f, b)=y)} &= \prod _i \frac{\exp \big (-\frac{|y_i - \frac{\sum _{t}f({\textbf {x}}_t)_i}{|B|}|}{br}\big )}{\exp \big (-\frac{|y_i - \frac{\sum _{t'}f({\textbf {x}}_{t'})_i}{|B|}|}{br}\big )} \\ &= \prod _i \frac{\exp \big (-\frac{\frac{||B|y_i - \sum _{t}f({\textbf {x}}_t)_i|}{|B|}}{br}\big )}{\exp \big (-\frac{\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i|}{|B|}}{br}\big )} \\ &= \prod _i \frac{\exp \big (-\frac{||B|y_i - \sum _{t}f({\textbf {x}}_t)_i|}{|B|br}\big )}{\exp \big (-\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i|}{|B|br}\big )} \\ &= \prod _i \exp \big (\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i| - ||B|y_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\big ) \\ &\le \prod _i \exp \big (\frac{|\sum _{t'}f({\textbf {x}}_{t'})_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\big ) \\ &= \exp \Big (\sum _i \frac{|\sum _{t'}f({\textbf {x}}_{t'})_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\Big ) \\ &= \exp (\epsilon ) \end{aligned}$$

The inequality follows from the triangle inequality, and in the last step \(\epsilon \) is substituted by its definition.
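As a quick numerical sanity check of the bound above, the following sketch draws one output of the Laplace mechanism applied to a batch-averaged embedding and verifies that the log-likelihood ratio between two neighbouring batches never exceeds \(\epsilon \). The identity feature map \(f\) and the values of \(b\), \(r\) and \(|B|\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, batch = 8, 4              # embedding dimension and client batch size |B|
b, r = 1.0, 0.5              # Laplace scale is b*r, as in the derivation above
B1 = rng.normal(size=(batch, d))   # one client batch of embeddings f(x_t)
B2 = rng.normal(size=(batch, d))   # a neighbouring batch f(x_t')

def log_laplace_density(y, mean, scale):
    # Log of the product of per-coordinate Laplace densities (constants cancel in the ratio).
    return -np.sum(np.abs(y - mean)) / scale

# Epsilon as defined in the last two steps of the derivation.
eps = np.sum(np.abs(B2.sum(axis=0) - B1.sum(axis=0))) / (b * r * batch)

# One draw of the mechanism: batch mean plus i.i.d. Laplace noise of scale b*r.
y = B1.mean(axis=0) + rng.laplace(scale=b * r, size=d)
log_ratio = (log_laplace_density(y, B1.mean(axis=0), b * r)
             - log_laplace_density(y, B2.mean(axis=0), b * r))
assert log_ratio <= eps + 1e-12    # the likelihood ratio never exceeds exp(epsilon)
print(log_ratio, eps)
```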

Fig. 3. Continuing from Fig. 1, the figure shows how the original text changes as the Laplace noise added to the embeddings increases, and it maps each text to its \(\epsilon \) and \(d_{rank}\). The example uses a vocabulary of 1,000 tokens.

Fig. 4. Continuing from Fig. 2, the figure shows how the original text changes as we increase the client batch size (k) mixed into the embeddings, and it maps each text to its \(\epsilon \) and \(d_{rank}\). The example uses a vocabulary of 1,000 tokens.
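The two sweeps behind Figs. 3 and 4 can be reproduced schematically as below. This is only a hedged sketch: it replaces the paper's text embeddings with random vectors and reuses the illustrative \(d_{rank}\) from the earlier sketch, so only the 1,000-token vocabulary size is taken from the captions and everything else is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(1000, 16))            # finite vocabulary of 1,000 token embeddings

def d_rank(token_id, vec):
    # Illustrative normalised rank distance (see the sketch after the abstract).
    dists = np.linalg.norm(E - vec, axis=1)
    return np.argsort(dists).tolist().index(token_id) / (len(E) - 1)

token_id = 7

# Fig. 3-style sweep: increasing Laplace noise added to a single embedding.
for scale in [0.1, 0.5, 1.0, 2.0]:
    noisy = E[token_id] + rng.laplace(scale=scale, size=E.shape[1])
    print(f"noise scale {scale}: d_rank = {d_rank(token_id, noisy):.3f}")

# Fig. 4-style sweep: mixing the target embedding with k-1 other tokens.
for k in [1, 2, 4, 8]:
    others = E[rng.choice(len(E), size=k - 1)]
    mixed = np.vstack([E[[token_id]], others]).mean(axis=0)
    print(f"batch size {k}: d_rank = {d_rank(token_id, mixed):.3f}")
```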

Disclaimer

This paper was prepared for informational purposes by the Global Technology Applied Research center of JPMorgan Chase & Co. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates makes any explicit or implied representation or warranty and none of them accept any liability in connection with this paper, including, without limitation, with respect to the completeness, accuracy, or reliability of the information contained herein and the potential legal, compliance, tax, or accounting effects thereof. This document is not intended as investment research or investment advice, or as a recommendation, offer, or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Papadopoulos, G., Satsangi, Y., Eloul, S., Pistoia, M. (2024). Ranking Distance Metric for Privacy Budget in Distributed Learning of Finite Embedding Data. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14611. Springer, Cham. https://doi.org/10.1007/978-3-031-56066-8_21

  • DOI: https://doi.org/10.1007/978-3-031-56066-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56065-1

  • Online ISBN: 978-3-031-56066-8
