Privacy Preserving Text Representation Learning Using BERT

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12720)

Abstract

User-generated text from online activities, such as tweets and reviews, is widely available and has been used in many machine learning models. However, such text can leak individuals’ private attributes. In this paper, we study the privacy issues in user-generated text and propose a privacy-preserving text representation learning framework, \({DP}_{BERT}\). The framework uses BERT to extract sentence embeddings and learns a textual representation that (1) is differentially private to protect against identity leakage (e.g., whether a target instance is in the data or not), (2) protects against leakage of private-attribute information (e.g., age, gender, location), and (3) maintains high utility of the given text.
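
The abstract does not say which differential privacy mechanism the framework uses, so the following is only a minimal sketch of one standard option, the Laplace mechanism applied to a clipped embedding vector. It is not the authors' method. The function names (`clip_l1`, `laplace_noise`, `privatize`), the clipping bound, and the short placeholder vector standing in for a BERT sentence embedding are all assumptions made for illustration.

```python
import random


def clip_l1(vec, bound):
    """Scale vec down so that its L1 norm is at most `bound`."""
    norm = sum(abs(x) for x in vec)
    if norm <= bound:
        return list(vec)
    scale = bound / norm
    return [x * scale for x in vec]


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(embedding, epsilon, clip_bound=1.0, rng=None):
    """Return a noisy copy of `embedding` (hypothetical helper).

    After clipping, any two embeddings differ by at most
    2 * clip_bound in L1 norm, so per-coordinate Laplace noise with
    scale 2 * clip_bound / epsilon gives epsilon-differential privacy
    for releasing this single vector.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = clip_l1(embedding, clip_bound)
    scale = 2.0 * clip_bound / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]


if __name__ == "__main__":
    # Placeholder values; a real BERT sentence embedding has hundreds of dimensions.
    sentence_embedding = [0.12, -0.40, 0.33, 0.05]
    print(privatize(sentence_embedding, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` means a larger noise scale, which gives stronger privacy but lower utility. This is the trade-off behind goals (1) and (3) above. Protecting private attributes, goal (2), is a separate objective that this sketch does not address.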

Acknowledgement

This work is supported in part by the Saudi Arabian Cultural Mission (SACM) in the United States.

Author information

Corresponding author

Correspondence to Walaa Alnasser.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Alnasser, W., Beigi, G., Liu, H. (2021). Privacy Preserving Text Representation Learning Using BERT. In: Thomson, R., Hussain, M.N., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2021. Lecture Notes in Computer Science, vol 12720. Springer, Cham. https://doi.org/10.1007/978-3-030-80387-2_9

  • DOI: https://doi.org/10.1007/978-3-030-80387-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80386-5

  • Online ISBN: 978-3-030-80387-2

  • eBook Packages: Computer Science, Computer Science (R0)