DOI: 10.1145/3462757.3466097
research-article

Prediction of monetary penalties for data protection cases in multiple languages

Published: 27 July 2021

Abstract

As the use of personal data becomes further entrenched in societal interaction, the regulation of such data continues to grow as an important area of law. Data protection authorities, however, have limited resources to address an increasing number of investigations. Appropriate data-driven models, coupled with the automation of decision making, have the potential to help in such circumstances. In this paper, we evaluate machine learning models from the literature for natural language processing (such as Support Vector Machine (SVM), Random Forest, and Multinomial Naive Bayes (MNB) classifiers) to predict, from a description of case facts, whether a monetary penalty was levied. We tested these models on a novel data set collected from the data protection authority of Macao in three languages (Chinese, English, and Portuguese). Our experimental results show that the machine learning models provide the predictability needed to automate the evaluation of data protection cases. In particular, SVM performs consistently across the three languages, achieving an AUROC of 0.725, 0.762, and 0.748 for Chinese, English, and Portuguese, respectively. We further evaluated the interpretability of the results independently for each language and found that the salient texts identified are shared across the three languages.
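To make the task in the abstract concrete, the following is a minimal sketch of a Multinomial Naive Bayes text classifier scored with AUROC, written in plain Python. It is not the authors' implementation: the abstract does not state the features, preprocessing, or hyperparameters used, so the whitespace tokenizer, the Laplace smoothing value, the names `tokenize`, `MultinomialNB`, `penalty_score`, and `auroc`, and the toy case descriptions are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens. Real multilingual text
    # (Chinese in particular) would need a proper segmenter.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts, Laplace smoothed."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in tokenize(d))
            total = sum(counts.values()) + alpha * len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + alpha) / total)
                for tok in self.vocab
            }
        return self

    def penalty_score(self, doc):
        # Log-odds of label 1 (penalty levied) versus label 0 (no penalty).
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    s += self.log_likelihood[c][tok]
            scores[c] = s
        return scores[1] - scores[0]


def auroc(y_true, y_score):
    # Probability that a random positive outranks a random negative;
    # ties count as half a win.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy case descriptions, for illustration only (not from the data set).
train_docs = [
    "company disclosed customer data without consent",
    "clinic uploaded client photos by mistake without consent",
    "complaint withdrawn no evidence of disclosure",
    "inquiry answered no violation found",
]
train_labels = [1, 1, 0, 0]  # 1 = monetary penalty levied, 0 = no penalty

test_docs = [
    "shop disclosed photos without consent",
    "no evidence of violation found",
]
test_labels = [1, 0]

model = MultinomialNB().fit(train_docs, train_labels)
test_scores = [model.penalty_score(d) for d in test_docs]
print("AUROC on toy data:", auroc(test_labels, test_scores))
```

AUROC depends only on how the scores rank the cases, not on a decision threshold, which makes it a natural metric when penalty and no-penalty cases are unevenly represented. One such model would be trained per language to mirror the three-language comparison in the abstract.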


    Published In

    ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
    June 2021
    319 pages
    ISBN: 9781450385268
    DOI: 10.1145/3462757
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 69 of 169 submissions, 41%
