Abstract
The rapid advancement of artificial intelligence in recent years has led to an increase in its use in legal contexts. At the same time, a growing body of research has expressed concerns that AI trained on large datasets may learn and model undesirable social biases. In this paper, we investigate the extent to which such social biases are inherent in a real-world legal corpus. We train a word2vec word embedding model on case law data and find evidence that NLP methods make undesirable distinctions between legally equivalent entities that vary only by race. Since legal AI applications that model such distinctions risk perpetuating these inequalities when used, we argue that the development of such applications must incorporate a means to detect and mitigate such biases. To this end, we propose an annotation schema that identifies and categorizes deviations from legal equivalence, so that debiasing may be more systematically incorporated into legal AI development. Future directions for research are discussed.
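As an illustration of what a distinction between otherwise equivalent terms can look like in an embedding space, the sketch below compares how strongly two terms associate with a shared set of attribute words, using mean cosine similarity. The abstract does not state which measure the authors used, so the measure, the function names (`cosine`, `association_gap`), and the two-dimensional toy vectors are all assumptions made for this example; real vectors would come from a word2vec model trained on the case law corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(vectors, term_a, term_b, attribute_terms):
    """Mean similarity of term_a to the attribute terms minus that of term_b.

    A gap near zero means the two terms relate to the attributes in
    about the same way; a large gap means the embedding separates them.
    """
    mean_a = sum(cosine(vectors[term_a], vectors[t]) for t in attribute_terms)
    mean_b = sum(cosine(vectors[term_b], vectors[t]) for t in attribute_terms)
    return (mean_a - mean_b) / len(attribute_terms)


# Hypothetical toy vectors, invented for illustration only.
# They are not taken from the paper or from any trained model.
toy_vectors = {
    "group_a": [1.0, 0.0],
    "group_b": [0.0, 1.0],
    "attribute_x": [1.0, 0.0],
    "attribute_y": [1.0, 1.0],
}

gap = association_gap(
    toy_vectors, "group_a", "group_b", ["attribute_x", "attribute_y"]
)
print(round(gap, 3))
```

With trained embeddings, the same function could be applied to pairs of terms that should be legally equivalent; a gap that is consistently far from zero would be one signal worth flagging for annotation.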
Acknowledgments
This research is supported by a grant from the Indiana University Racial Justice Research Fund.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gumusel, E., Malic, V.Q., Donaldson, D.R., Ashley, K., Liu, X. (2022). An Annotation Schema for the Detection of Social Bias in Legal Text Corpora. In: Smits, M. (eds) Information for a Better World: Shaping the Global Future. iConference 2022. Lecture Notes in Computer Science, vol 13192. Springer, Cham. https://doi.org/10.1007/978-3-030-96957-8_17
DOI: https://doi.org/10.1007/978-3-030-96957-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96956-1
Online ISBN: 978-3-030-96957-8
eBook Packages: Computer Science, Computer Science (R0)