
Anonymization-as-a-Service: The Service Center Transcripts Industrial Case

  • Conference paper
Service-Oriented Computing (ICSOC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14420)



Abstract

Modern Big Data Analytics services must comply with non-functional requirements such as privacy in order to align with recently introduced legislation such as the General Data Protection Regulation (GDPR). In particular, the Telco industry has been using Big Data Analytics solutions for service continuity; their basic steps revolve around automatically transcribing call center conversations into text and extracting valuable insights from it to enhance customer service. Such data inevitably contains Personally Identifiable Information (PII), which hampers privacy-sensitive service operations if not handled properly. To meet these requirements, we created Deperson, an efficient rule-based data anonymization service that enables companies to anonymize customer data effectively while preserving its utility for further analysis. As a proof-of-concept, Deperson has been integrated into an existing Big Data Analytics solution in the Customer Contact Analytics department of a major Dutch Telco provider to ensure GDPR compliance. Using dictionary look-ups and pattern-matching rules, Deperson removes PII with an accuracy of 0.82 while retaining the information necessary for analysis. Our proof-of-concept shows that Deperson plays a significant role in enabling the extraction and further processing of valuable insights from customer data without risking non-compliance with the GDPR.
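The abstract describes Deperson as combining dictionary look-ups with pattern-matching rules. The following minimal Python sketch illustrates that two-pass idea under stated assumptions: the regular expressions, the placeholder labels such as `<PHONE>`, and the `anonymize` function are hypothetical illustrations, not Deperson's actual rule set or API (its source is published at https://github.com/kpnDataScienceLab/deperson).

```python
import re
from typing import Dict, Iterable

# Hypothetical pattern rules for a few PII types in Dutch call-center text.
# These are illustrative assumptions, not Deperson's actual rule set.
PATTERNS: Dict[str, re.Pattern] = {
    # E-mail addresses such as jan.jansen@example.nl
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Dutch phone numbers: a leading 0 or +31 followed by nine digits,
    # optionally separated by single spaces or dashes
    "PHONE": re.compile(r"(?<!\w)(?:\+31[ -]?|0)[1-9](?:[ -]?\d){8}(?!\d)"),
    # Dutch postcodes: four digits (no leading zero) plus two capital letters
    "POSTCODE": re.compile(r"\b[1-9]\d{3} ?[A-Z]{2}\b"),
}


def anonymize(text: str, names: Iterable[str]) -> str:
    """Replace pattern-matched PII and dictionary-listed names with placeholders."""
    # Pattern-matching pass first, so e-mail addresses that contain names
    # are replaced as a whole rather than token by token.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)

    # Dictionary look-up pass: case-insensitive match on single word tokens.
    lookup = {name.lower() for name in names}

    def replace_name(match: re.Match) -> str:
        word = match.group(0)
        return "<NAME>" if word.lower() in lookup else word

    return re.sub(r"\b\w+\b", replace_name, text)


transcript = ("Hallo, met Jan Jansen, mijn nummer is 06 12345678 en mijn "
              "e-mail is jan.jansen@example.nl. Postcode 5611 AZ.")
print(anonymize(transcript, names={"Jan", "Jansen"}))
# Hallo, met <NAME> <NAME>, mijn nummer is <PHONE> en mijn e-mail is <EMAIL>. Postcode <POSTCODE>.
```

Running the pattern pass before the name look-up keeps e-mail addresses intact as single units. A production system would likely also need to handle multi-word names, Dutch name particles such as "de" or "van", and spelling variation introduced by speech transcription, none of which this sketch attempts.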


Notes

  1. Available online at: https://github.com/kpnDataScienceLab/deperson.

  2. https://github.com/OpenTaal/opentaal-hunspell.

  3. https://data.overheid.nl/.


Author information

Corresponding author

Correspondence to Nemania Borovits.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Borovits, N., Bardelloni, G., Tamburri, D.A., Van Den Heuvel, WJ. (2023). Anonymization-as-a-Service: The Service Center Transcripts Industrial Case. In: Monti, F., Rinderle-Ma, S., Ruiz Cortés, A., Zheng, Z., Mecella, M. (eds) Service-Oriented Computing. ICSOC 2023. Lecture Notes in Computer Science, vol 14420. Springer, Cham. https://doi.org/10.1007/978-3-031-48424-7_19


  • DOI: https://doi.org/10.1007/978-3-031-48424-7_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48423-0

  • Online ISBN: 978-3-031-48424-7

  • eBook Packages: Computer Science, Computer Science (R0)
