research-article

A combined rule-based and machine learning approach for automated GDPR compliance checking

Authors:
Rajaa El Hamdani (HEC Paris, France)
Majd Mustapha (EURA NOVA, Belgium)
David Restrepo Amariles (HEC Paris, France)
Aurore Troussel (Steptoe & Johnson LLP, Belgium)
Sébastien Meeùs (HEC Paris, France)
Katsiaryna Krasnashchok (EURA NOVA, Belgium)

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, June 2021, Pages 40–49
https://doi.org/10.1145/3462757.3466081

Published: 27 July 2021

ABSTRACT

The General Data Protection Regulation (GDPR) requires data controllers to implement end-to-end compliance. Controllers must therefore ensure that the terms agreed with the data subject and their own obligations under the GDPR are respected in the data flows from data subjects to controllers, processors and sub-processors (i.e. the data supply chain). This paper seeks to help bridge both ends of compliance checking through a two-pronged study. First, we conceptualize a framework to implement a document-centric approach to compliance checking in the data supply chain. Second, we develop specific methods to automate compliance checking of privacy policies. We test a two-module system, in which the first module relies on NLP to extract data practices from privacy policies and the second module encodes GDPR rules to check for the presence of mandatory information. The results show that the text-to-text approach outperforms local classifiers and enables the extraction of both coarse-grained and fine-grained information with a single model. We perform a full evaluation of our system on a dataset of 30 privacy policies annotated by legal experts. We conclude that this approach could be generalized to other documents in the data supply chain as a means to improve end-to-end compliance.
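To make the division of labour between the two modules concrete, the following is a minimal sketch of what the rule-based second module could look like. It is an illustration only, not the implementation evaluated in the paper. It assumes that the first (NLP) module returns one label per policy segment, and the label names, the `ExtractedPractice` type and the helper functions are all hypothetical. The list of mandatory items is loosely modelled on the information duties of Article 13 GDPR and is not exhaustive.

```python
from dataclasses import dataclass

# Hypothetical label set: the abstract does not give the taxonomy used by the
# authors. The descriptions loosely follow Article 13 GDPR (not exhaustive).
MANDATORY_ITEMS = {
    "controller_identity": "Identity and contact details of the controller",
    "processing_purpose": "Purposes of the processing",
    "legal_basis": "Legal basis for the processing",
    "retention_period": "Storage period or the criteria used to determine it",
    "data_subject_rights": "Existence of data subject rights",
    "right_to_complain": "Right to lodge a complaint with a supervisory authority",
}


@dataclass(frozen=True)
class ExtractedPractice:
    """One output of the (assumed) extraction module: a segment and its label."""
    segment: str
    label: str


def check_mandatory_information(practices):
    """Map each mandatory item to True if at least one segment carries its label."""
    found = {practice.label for practice in practices}
    return {item: item in found for item in MANDATORY_ITEMS}


def missing_items(practices):
    """Return human-readable descriptions of the mandatory items not found."""
    report = check_mandatory_information(practices)
    return [MANDATORY_ITEMS[item] for item, present in report.items() if not present]


if __name__ == "__main__":
    # Hand-written stand-in for the output of the extraction module.
    practices = [
        ExtractedPractice("We are Example Ltd, 1 Main Street.", "controller_identity"),
        ExtractedPractice("We use your email address to send invoices.", "processing_purpose"),
    ]
    for description in missing_items(practices):
        print("Missing:", description)
```

In this sketch a rule is only a presence check on a label. Keeping the rules separate from the extraction step means that the legal requirements can be revised without retraining the model, which is one plausible reason for combining the two approaches.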


Published in
ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
June 2021
319 pages
ISBN: 9781450385268
DOI: 10.1145/3462757
Conference Chair: Juliano Maranhão, University of São Paulo, Brazil
Program Chair: Adam Zachary Wyner, Swansea University, United Kingdom
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 69 of 169 submissions, 41%