Privacy Preserving Text Representation Learning Using BERT

Alnasser, Walaa; Beigi, Ghazaleh; Liu, Huan

doi:10.1007/978-3-030-80387-2_9

Conference paper
First Online: 04 July 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12720)

Abstract

User-generated text from online activities, such as tweets and reviews, is used in many machine learning models. However, user-generated text can leak individuals' private attributes. In this paper, we study privacy issues in user-generated text and propose a privacy-preserving text representation learning framework, \({DP}_{BERT}\). The framework uses BERT to extract sentence embeddings and learns a textual representation that (1) is differentially private to protect against identity leakage (e.g., whether a target instance is in the data or not), (2) protects against leakage of private-attribute information (e.g., age, gender, location), and (3) maintains high utility of the given text.
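As a rough illustration of property (1), the sketch below applies the standard Laplace mechanism to a sentence embedding. The abstract does not state which mechanism \({DP}_{BERT}\) uses, so this is an assumption and not the authors' method. The function names (`clip_l1`, `laplace_noise`, `privatize`), the clipping bound, and the toy four-dimensional vector standing in for a BERT sentence embedding are all hypothetical.

```python
import random
from typing import List, Optional, Sequence


def clip_l1(vec: Sequence[float], bound: float) -> List[float]:
    """Rescale vec so that its L1 norm is at most `bound`."""
    norm = sum(abs(x) for x in vec)
    if norm <= bound or norm == 0.0:
        return list(vec)
    scale = bound / norm
    return [x * scale for x in vec]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize(
    embedding: Sequence[float],
    epsilon: float,
    clip_bound: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a noisy copy of `embedding` via the Laplace mechanism.

    Assumption (not from the paper): each embedding is clipped to an L1 ball
    of radius `clip_bound`, so any two clipped embeddings differ by at most
    2 * clip_bound in L1 norm. That value is used as the sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    clipped = clip_l1(embedding, clip_bound)
    scale = 2.0 * clip_bound / epsilon  # Laplace scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]


if __name__ == "__main__":
    # Toy vector standing in for a BERT sentence embedding (hypothetical values).
    toy_embedding = [0.4, -0.1, 0.3, 0.2]
    print(privatize(toy_embedding, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` yields a larger noise scale, which gives stronger privacy at the cost of utility; this is the tension that property (3) of the abstract refers to.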

Acknowledgement

This work, in part, is supported by the Saudi Arabian Cultural Mission (SACM) in the United States.

Author information

Authors and Affiliations

Arizona State University, Tempe, USA
Walaa Alnasser & Huan Liu
Google, Sunnyvale, USA
Ghazaleh Beigi

Corresponding author

Correspondence to Walaa Alnasser.

Editor information

Editors and Affiliations

United States Military Academy, West Point, NY, USA
Robert Thomson
University of Arkansas at Little Rock, Little Rock, AR, USA
Muhammad Nihal Hussain
Bucknell University, Lewisburg, PA, USA
Christopher Dancy
United States Military Academy, West Point, NY, USA
Aryn Pyke

About this paper

Cite this paper

Alnasser, W., Beigi, G., Liu, H. (2021). Privacy Preserving Text Representation Learning Using BERT. In: Thomson, R., Hussain, M.N., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2021. Lecture Notes in Computer Science, vol 12720. Springer, Cham. https://doi.org/10.1007/978-3-030-80387-2_9

DOI: https://doi.org/10.1007/978-3-030-80387-2_9
Published: 04 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80386-5
Online ISBN: 978-3-030-80387-2
eBook Packages: Computer Science, Computer Science (R0)
