DOI: 10.1145/3462757.3466081
research-article

A combined rule-based and machine learning approach for automated GDPR compliance checking

Published: 27 July 2021

ABSTRACT

The General Data Protection Regulation (GDPR) requires data controllers to implement end-to-end compliance. Controllers must therefore ensure that the terms agreed with the data subject and their own obligations under the GDPR are respected in the data flows from data subject to controllers, processors and sub-processors (i.e. the data supply chain). This paper seeks to help bridge both ends of compliance checking through a two-pronged study. First, we conceptualize a framework to implement a document-centric approach to compliance checking in the data supply chain. Second, we develop specific methods to automate compliance checking of privacy policies. We test a two-module system: the first module relies on NLP to extract data practices from privacy policies, and the second module encodes GDPR rules to check for the presence of mandatory information. The results show that the text-to-text approach outperforms local classifiers and enables the extraction of both coarse-grained and fine-grained information with only one model. We carry out a full evaluation of our system on a dataset of 30 privacy policies annotated by legal experts. We conclude that this approach could be generalized to other documents in the data supply chain as a means to improve end-to-end compliance.
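To make the second module more concrete, the following is a minimal sketch of a rule-based check for mandatory information, assuming the first module has already produced a list of labelled text spans. It is not the authors' implementation: the label names, the `ExtractedPractice` type, and the selection of GDPR Article 13 items are illustrative assumptions, since the abstract does not specify the rule encoding or the annotation scheme.

```python
from dataclasses import dataclass

# Hypothetical label set mapped to the GDPR provision that requires the item.
# The paper's actual annotation scheme and rule encoding are not given in
# the abstract; these names are assumptions chosen for illustration.
MANDATORY_ITEMS = {
    "controller_identity": "Art. 13(1)(a)",
    "purposes": "Art. 13(1)(c)",
    "recipients": "Art. 13(1)(e)",
    "retention_period": "Art. 13(2)(a)",
    "data_subject_rights": "Art. 13(2)(b)",
    "right_to_complain": "Art. 13(2)(d)",
}


@dataclass(frozen=True)
class ExtractedPractice:
    """One data practice as the extraction module might emit it (assumed shape)."""
    label: str  # coarse-grained category assigned by the extraction module
    text: str   # policy segment that supports the label


def check_mandatory_information(practices):
    """Return, for each mandatory item, whether any extracted practice covers it."""
    found = {p.label for p in practices}
    return {label: label in found for label in MANDATORY_ITEMS}


def missing_items(practices):
    """List (label, provision) pairs for mandatory items absent from the policy."""
    report = check_mandatory_information(practices)
    return [
        (label, MANDATORY_ITEMS[label])
        for label, present in report.items()
        if not present
    ]
```

Calling `missing_items` on the extraction output for one policy returns the items a reviewer would need to look for by hand. Keeping the rules in a plain mapping separates the legal encoding from the extraction model, so either side can be revised independently.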


Published in

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
June 2021
319 pages
ISBN: 9781450385268
DOI: 10.1145/3462757

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 69 of 169 submissions, 41%
