Abstract
A growing number of cases indicates that large language models (LLMs) bring transformative advances while also raising privacy concerns. Despite promising recent surveys in the literature, a comprehensive analysis dedicated specifically to text privacy in LLMs is still lacking. By comprehensively collecting LLM privacy research, we summarize five privacy issues and their corresponding solutions arising during both model training and invocation, and extend the analysis to three research focuses in LLM applications. Moreover, we propose five further research directions and offer prospects for LLM-native security mechanisms. Notably, we find that most LLM privacy research remains in the technical exploration phase, and we hope this work can assist the development of LLM privacy protection.
Acknowledgements
This work was supported by the National Key R&D Program of China (2021YFC3300600).
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Hongyi LI received the BE degree from Tianjin University, China in 2022. She is currently working toward the PhD degree with the School of Computer Science, Fudan University, China. Her current research interests include financial technology security and data privacy.
Jiawei YE is currently a faculty member with the School of Computer Science at Fudan University, China, and a senior engineer. His primary research interests include network and information security and sensitive information protection.
Jie WU is currently a professor in the School of Computer Science, Fudan University, China. His main research interests include network multimedia and information security.
Cite this article
Li, H., Ye, J. & Wu, J. Privacy dilemmas and opportunities in large language models: a brief review. Front. Comput. Sci. 19, 1910356 (2025). https://doi.org/10.1007/s11704-024-40583-8