research-article

Prediction of monetary penalties for data protection cases in multiple languages

Authors:

Tingting Zhu

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

Pages 185 - 189

https://doi.org/10.1145/3462757.3466097

Published: 27 July 2021

Abstract

As personal data becomes further entrenched in everyday social interaction, its regulation continues to grow in importance as an area of law. Data protection authorities, however, have limited resources with which to address an increasing number of investigations. Appropriate data-driven models, coupled with the automation of decision making, have the potential to help in such circumstances. In this paper, we evaluate machine learning models from the literature for natural language processing (such as Support Vector Machine (SVM), Random Forest, and Multinomial Naive Bayes (MNB) classifiers) to predict, from a description of case facts, whether a monetary penalty was levied. We tested these models on a novel data set collected from the data protection authority of Macao in three languages (Chinese, English, and Portuguese). Our experimental results show that the machine learning models are predictive enough to support automating the evaluation of data protection cases. In particular, SVM performs consistently across the three languages, achieving an AUROC of 0.725, 0.762, and 0.748 for Chinese, English, and Portuguese, respectively. We further evaluated the interpretability of the results independently for each language and found that the salient texts identified are shared across the three languages.
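To make the task in the abstract concrete, the following is a minimal sketch of one of the named model families, a Multinomial Naive Bayes classifier over bag-of-words counts, scored with AUROC. It is not the authors' pipeline: the whitespace tokenizer, the class and function names, and the toy case descriptions are all hypothetical, and whitespace splitting would not work for Chinese text, which needs a proper word segmenter.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for language-specific tokenization)."""
    return text.lower().split()


class MultinomialNB:
    """Bag-of-words Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_joint(self, doc, c):
        """log P(c) + sum of log P(token | c); out-of-vocabulary tokens are skipped."""
        score = self.log_prior[c]
        denom = self.totals[c] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:
                score += math.log((self.counts[c][tok] + 1) / denom)
        return score

    def score(self, doc):
        """Log-odds of class 1 (penalty levied) versus class 0 (no penalty)."""
        return self.log_joint(doc, 1) - self.log_joint(doc, 0)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical case-fact descriptions; label 1 means a monetary penalty was levied.
train_docs = [
    "company disclosed customer data without consent",
    "clinic uploaded client photos without consent",
    "complaint withdrawn no violation found",
    "data processed lawfully with consent no violation",
]
train_labels = [1, 1, 0, 0]
test_docs = [
    "shop disclosed photos without consent",
    "no violation found after inquiry",
]
test_labels = [1, 0]

model = MultinomialNB().fit(train_docs, train_labels)
test_scores = [model.score(d) for d in test_docs]
# The toy data is trivially separable, so this prints 1.0.
print(auroc(test_labels, test_scores))
```

AUROC depends only on how the scores rank the cases, so the same `auroc` function could compare any classifier that outputs a real-valued score, which is one reason it suits a comparison across model families and languages.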

Index Terms: Computing methodologies → Artificial intelligence

Published In

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

June 2021

319 pages

ISBN:9781450385268

DOI:10.1145/3462757

Conference Chair: Juliano Maranhão (University of São Paulo, Brazil)

Program Chair: Adam Zachary Wyner (Swansea University, United Kingdom)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICAIL '21

Sponsor:

SIGAI

ICAIL '21: Eighteenth International Conference for Artificial Intelligence and Law

June 21 - 25, 2021

São Paulo, Brazil

Acceptance Rates

Overall Acceptance Rate 69 of 169 submissions, 41%
