ABSTRACT
The General Data Protection Regulation (GDPR) requires data controllers to implement end-to-end compliance. Controllers must therefore ensure that the terms agreed with the data subject, as well as their own obligations under the GDPR, are respected throughout the data flows from the data subject to controllers, processors, and sub-processors (i.e., the data supply chain). This paper seeks to help bridge both ends of compliance checking through a two-pronged study. First, we conceptualize a framework for implementing a document-centric approach to compliance checking in the data supply chain. Second, we develop specific methods to automate compliance checking of privacy policies. We test a two-module system in which the first module relies on natural language processing (NLP) to extract data practices from privacy policies, and the second module encodes GDPR rules to check for the presence of mandatory information. The results show that the text-to-text approach outperforms local classifiers and enables the extraction of both coarse-grained and fine-grained information with a single model. We carry out a full evaluation of our system on a dataset of 30 privacy policies annotated by legal experts. We conclude that this approach could be generalized to other documents in the data supply chain as a means to improve end-to-end compliance.
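To make the interplay between the two modules more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the text-to-text model emits, for each policy segment, a string such as `Retention: retention period; Rights: right of access`, where each `coarse: fine` pair carries both granularities in one output. This serialization is hypothetical. The rule module then treats each mandatory disclosure as a required (coarse, fine) label pair and reports which ones are missing. The label names in `MANDATORY_ITEMS` loosely follow the information requirements of GDPR Article 13, but they are illustrative only and do not reproduce the paper's rule set. The names `parse_prediction`, `check_completeness`, and `Label` are likewise hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rule set: each mandatory item is a (coarse, fine) label pair that
# must be found at least once in a policy. Loosely inspired by GDPR Art. 13;
# NOT the authors' actual encoding.
MANDATORY_ITEMS = {
    ("Controller", "identity"),
    ("Controller", "contact details"),
    ("Purpose", "purpose of processing"),
    ("Legal basis", "legal basis"),
    ("Retention", "retention period"),
    ("Rights", "right to lodge a complaint"),
}


@dataclass(frozen=True)
class Label:
    coarse: str  # coarse-grained category, e.g. "Rights"
    fine: str    # fine-grained attribute, e.g. "right of access"


def parse_prediction(target: str) -> list[Label]:
    """Parse one generated string into labels.

    Assumes pairs are separated by ';' and each pair is 'coarse: fine'.
    Empty or malformed fragments are skipped.
    """
    labels = []
    for chunk in target.split(";"):
        chunk = chunk.strip()
        if not chunk or ":" not in chunk:
            continue
        coarse, fine = chunk.split(":", 1)
        labels.append(Label(coarse.strip(), fine.strip()))
    return labels


def check_completeness(predictions: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Rule module: compare labels found across all segments with the mandatory set."""
    found = {(label.coarse, label.fine) for p in predictions for label in parse_prediction(p)}
    return {
        "present": sorted(MANDATORY_ITEMS & found),
        "missing": sorted(MANDATORY_ITEMS - found),
    }


# Example: outputs for three segments of one (made-up) privacy policy.
segment_outputs = [
    "Controller: identity; Controller: contact details",
    "Purpose: purpose of processing; Legal basis: legal basis",
    "Rights: right of access",
]
print(check_completeness(segment_outputs)["missing"])
# prints [('Retention', 'retention period'), ('Rights', 'right to lodge a complaint')]
```

In this sketch the rule module only tests set membership over the model's outputs. The GDPR rules therefore stay independent of whichever extractor produces the labels. Real requirements often carry conditions (e.g., information due only "where applicable"), and these would need richer rules than simple presence checks.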
Index Terms
- A combined rule-based and machine learning approach for automated GDPR compliance checking