
On Membership Inference Attacks to Generative Language Models Across Language Domains

  • Conference paper
Information Security Applications (WISA 2022)

Abstract

The confidentiality threat against training data has become a significant security problem in neural language models. Recent studies have shown that memorized training data can be extracted by injecting well-chosen prompts into generative language models. While these attacks have achieved remarkable success against English-based Transformer architectures, it is unclear whether they remain effective in other language domains. This paper studies the effectiveness of such attacks against Korean models and the potential for attack improvements that may benefit future defense studies.
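To make the extraction procedure concrete, the following is a minimal sketch of the prompt-and-rank attack this line of work builds on: sample continuations from the target model, then rank them by perplexity, treating low-perplexity generations as likely memorized. The model name, decoding settings, and Korean prompt are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of a prompt-and-rank extraction attack on a causal LM.
# MODEL_NAME, the decoding settings, and the prompt are assumptions for
# illustration; any HuggingFace causal LM can be substituted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "kakaobrain/kogpt"  # assumed target model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def generate_samples(prompt: str, n: int = 10) -> list[str]:
    """Sample n continuations of a short prompt from the target model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_k=40,                 # assumed decoding parameters
        max_new_tokens=128,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

@torch.no_grad()
def perplexity(text: str) -> float:
    """Model perplexity of a text; low values suggest memorization."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Rank generations so the most likely memorized samples come first.
samples = generate_samples("뉴스 기사: ")  # assumed Korean news prompt
ranked = sorted(samples, key=perplexity)
```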

The contribution of this study is two-fold. First, we perform a membership inference attack against a state-of-the-art Korean GPT model. We found likely training data with 20% to 90% precision among the top 100 samples, confirming that the attack technique proposed for the original GPT remains valid across language domains. Second, in this process, we observed that redundancy among the selected sentences can hardly be detected with the existing attack method. Since information appearing in only a few documents is more likely to be meaningful, increasing the uniqueness of the selected sentences is desirable to improve the effectiveness of the attack. We therefore propose a deduplication strategy that replaces the traditional word-level similarity metric with one computed at the BPE token level. As a result, we show that 6% to 22% of the selected samples were underestimated by the word-level metric.
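As a hedged sketch of the deduplication idea, the snippet below compares candidate samples over BPE token IDs rather than whitespace-separated words, keeping only samples that are sufficiently unique. The difflib-based ratio and the 0.8 threshold are illustrative assumptions rather than the paper's exact metric.

```python
# Sketch of token-level deduplication: measure similarity over BPE token
# IDs instead of words. The threshold and difflib ratio are assumptions.
from difflib import SequenceMatcher
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kakaobrain/kogpt")  # assumed

def word_similarity(a: str, b: str) -> float:
    """Traditional word-level similarity over whitespace tokens."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def bpe_similarity(a: str, b: str) -> float:
    """Similarity over BPE token IDs, finer-grained for Korean text."""
    return SequenceMatcher(None, tokenizer.encode(a), tokenizer.encode(b)).ratio()

def deduplicate(samples: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a sample only if no already-kept sample is too similar."""
    kept: list[str] = []
    for s in samples:
        if all(bpe_similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

Because Korean agglutinates particles onto words, two sentences differing only in spacing or a particle can look dissimilar word by word while sharing most of their BPE tokens; a token-level ratio catches such near-duplicates.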


Notes

  1. Two, three, and five identical sentences appear 6, 1, and 4 times, respectively.


Acknowledgments

We thank the reviewers for their insightful feedback. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1A2C1088802).

Author information


Corresponding author

Correspondence to Taekyoung Kwon.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Oh, M.G., Park, L.H., Kim, J., Park, J., Kwon, T. (2023). On Membership Inference Attacks to Generative Language Models Across Language Domains. In: You, I., Youn, TY. (eds) Information Security Applications. WISA 2022. Lecture Notes in Computer Science, vol 13720. Springer, Cham. https://doi.org/10.1007/978-3-031-25659-2_11


  • DOI: https://doi.org/10.1007/978-3-031-25659-2_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25658-5

  • Online ISBN: 978-3-031-25659-2

  • eBook Packages: Computer Science, Computer Science (R0)
