
Ranking Distance Metric for Privacy Budget in Distributed Learning of Finite Embedding Data

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14611)


Abstract

Federated Learning (FL) is a distributed learning paradigm that aims to preserve the privacy of data. Recent studies have shown FL models to be vulnerable to reconstruction attacks that compromise data privacy by inverting gradients computed on confidential data. To defend against these attacks, it is common to employ methods that guarantee data confidentiality using the principles of Differential Privacy (DP). However, in many cases, especially for machine learning models trained on unstructured data such as text, evaluating privacy also requires considering the finite embedding space of the clients' private data. In this study, we show how privacy in a distributed FL setup is sensitive to the underlying finite embeddings of the confidential data. We show that privacy can be quantified for a client batch that uses either noise or a mixture of finite embeddings by introducing a normalised rank distance (\(d_{rank}\)). This measure has the advantage of taking into account the size of a finite vocabulary embedding and of aligning the privacy budget to a partitioned space. We further explore the impact of noise and client batch size on the privacy budget and compare it to the standard \(\epsilon \) derived from Local-DP.
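Since the paper's exact definition of \(d_{rank}\) is not reproduced on this page, the sketch below only illustrates one plausible reading of the abstract: the normalised rank distance is taken to be the rank of the original token among all vocabulary embeddings ordered by distance to the perturbed embedding, divided by the vocabulary size. The function name, the Euclidean distance, and all parameter values are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def normalized_rank_distance(original_id, perturbed_vec, vocab_embeddings):
    """Illustrative d_rank: rank of the original token among all vocabulary
    embeddings sorted by Euclidean distance to the perturbed vector,
    normalised by the vocabulary size (0.0 = still nearest, ~1.0 = farthest)."""
    dists = np.linalg.norm(vocab_embeddings - perturbed_vec, axis=1)
    rank = np.argsort(dists).tolist().index(original_id)
    return rank / (len(vocab_embeddings) - 1)

# Toy usage: a random 1,000-token vocabulary of 16-dimensional embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 16))
token_id = 42
noisy = E[token_id] + rng.laplace(scale=0.5, size=16)   # Laplace-perturbed embedding
print(normalized_rank_distance(token_id, noisy, E))
```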



Author information

Correspondence to Georgios Papadopoulos.

Appendix

Starting with the LHS of (3) and substituting the probability density function of the Laplace distribution, we get

$$\begin{aligned} \frac{\Pr (\textbf{M}(B, f, b)=y)}{\Pr (\textbf{M}(B', f, b)=y)} &= \prod _i \frac{\exp \big (-\frac{|y_i - \frac{\sum _{t}f({\textbf {x}}_t)_i}{|B|}|}{br}\big )}{\exp \big (-\frac{|y_i - \frac{\sum _{t'}f({\textbf {x}}_{t'})_i}{|B|}|}{br}\big )} \\ &= \prod _i \frac{\exp \big (-\frac{\frac{||B|y_i - \sum _{t}f({\textbf {x}}_t)_i|}{|B|}}{br}\big )}{\exp \big (-\frac{\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i|}{|B|}}{br}\big )} \\ &= \prod _i \frac{\exp \big (-\frac{||B|y_i - \sum _{t}f({\textbf {x}}_t)_i|}{|B|br}\big )}{\exp \big (-\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i|}{|B|br}\big )} \\ &= \prod _i \exp \big (\frac{||B|y_i - \sum _{t'}f({\textbf {x}}_{t'})_i| - ||B|y_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\big ) \\ &\le \prod _i \exp \big (\frac{|\sum _{t'}f({\textbf {x}}_{t'})_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\big ) \\ &= \exp \Big (\sum _i \frac{|\sum _{t'}f({\textbf {x}}_{t'})_i - \sum _{t}f({\textbf {x}}_{t})_i|}{|B|br}\Big ) \\ &= \exp (\epsilon ) \end{aligned}$$

The inequality follows from the triangle inequality, and in the last step \(\epsilon \) is substituted by its definition.
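As a quick numerical sanity check of the bound above, the following sketch draws one output of the Laplace mechanism applied to a batch-averaged embedding and verifies that the log-likelihood ratio between two neighbouring batches never exceeds \(\epsilon \). The identity feature map \(f\) and the values of \(b\), \(r\) and \(|B|\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, batch = 8, 4              # embedding dimension and client batch size |B|
b, r = 1.0, 0.5              # Laplace scale is b*r, as in the derivation above
B1 = rng.normal(size=(batch, d))   # one client batch of embeddings f(x_t)
B2 = rng.normal(size=(batch, d))   # a neighbouring batch f(x_t')

def log_laplace_density(y, mean, scale):
    # Log of the product of per-coordinate Laplace densities (constants cancel in the ratio).
    return -np.sum(np.abs(y - mean)) / scale

# Epsilon as defined in the last two steps of the derivation.
eps = np.sum(np.abs(B2.sum(axis=0) - B1.sum(axis=0))) / (b * r * batch)

# One draw of the mechanism: batch mean plus i.i.d. Laplace noise of scale b*r.
y = B1.mean(axis=0) + rng.laplace(scale=b * r, size=d)
log_ratio = (log_laplace_density(y, B1.mean(axis=0), b * r)
             - log_laplace_density(y, B2.mean(axis=0), b * r))
assert log_ratio <= eps + 1e-12    # the likelihood ratio never exceeds exp(epsilon)
print(log_ratio, eps)
```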

Fig. 3. Continuing from Fig. 1, the figure shows how the original text changes as the Laplace noise added to the embeddings increases, and it maps each text to its \(\epsilon \) and \(d_{rank}\). The example uses a vocabulary of 1,000 tokens.

Fig. 4. Continuing from Fig. 2, the figure shows how the original text changes as we increase the client batch size (k) mixed into the embeddings, and it maps each text to its \(\epsilon \) and \(d_{rank}\). The example uses a vocabulary of 1,000 tokens.
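The two sweeps behind Figs. 3 and 4 can be reproduced schematically as below. This is only a hedged sketch: it replaces the paper's text embeddings with random vectors and reuses the illustrative \(d_{rank}\) from the earlier sketch, so only the 1,000-token vocabulary size is taken from the captions and everything else is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(1000, 16))            # finite vocabulary of 1,000 token embeddings

def d_rank(token_id, vec):
    # Illustrative normalised rank distance (see the sketch after the abstract).
    dists = np.linalg.norm(E - vec, axis=1)
    return np.argsort(dists).tolist().index(token_id) / (len(E) - 1)

token_id = 7

# Fig. 3-style sweep: increasing Laplace noise added to a single embedding.
for scale in [0.1, 0.5, 1.0, 2.0]:
    noisy = E[token_id] + rng.laplace(scale=scale, size=E.shape[1])
    print(f"noise scale {scale}: d_rank = {d_rank(token_id, noisy):.3f}")

# Fig. 4-style sweep: mixing the target embedding with k-1 other tokens.
for k in [1, 2, 4, 8]:
    others = E[rng.choice(len(E), size=k - 1)]
    mixed = np.vstack([E[[token_id]], others]).mean(axis=0)
    print(f"batch size {k}: d_rank = {d_rank(token_id, mixed):.3f}")
```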

Disclaimer

This paper was prepared for informational purposes by the Global Technology Applied Research center of JPMorgan Chase & Co. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates makes any explicit or implied representation or warranty and none of them accept any liability in connection with this paper, including, without limitation, with respect to the completeness, accuracy, or reliability of the information contained herein and the potential legal, compliance, tax, or accounting effects thereof. This document is not intended as investment research or investment advice, or as a recommendation, offer, or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Papadopoulos, G., Satsangi, Y., Eloul, S., Pistoia, M. (2024). Ranking Distance Metric for Privacy Budget in Distributed Learning of Finite Embedding Data. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14611. Springer, Cham. https://doi.org/10.1007/978-3-031-56066-8_21

  • DOI: https://doi.org/10.1007/978-3-031-56066-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56065-1

  • Online ISBN: 978-3-031-56066-8
