An Annotation Schema for the Detection of Social Bias in Legal Text Corpora

  • Conference paper
Information for a Better World: Shaping the Global Future (iConference 2022)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13192)

Abstract

The rapid advancement of artificial intelligence in recent years has led to an increase in its use in legal contexts. At the same time, a growing body of research has raised concerns that AI trained on large datasets may learn and model undesirable social biases. In this paper, we investigate the extent to which such social biases are inherent in a real-world legal corpus. We train a word2vec word embedding model on case law data and find evidence that NLP methods make undesirable distinctions between legally equivalent entities that vary only by race. Since legal AI applications that model such distinctions risk perpetuating these inequalities when used, we argue that the development of such applications must incorporate a means to detect and mitigate such biases. To this end, we propose an annotation schema that identifies and categorizes deviations from legal equivalence, so that debiasing may be more systematically incorporated into legal AI development. Future directions for research are discussed.
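As a rough illustration of the kind of check the abstract describes, the sketch below compares how strongly two terms that should be legally equivalent are associated with a third term in an embedding space, using cosine similarity. It is not the authors' method: the function names, the term labels, and the two-dimensional toy vectors are all hypothetical and chosen only so the arithmetic is easy to follow. In practice the vectors would come from a trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(embeddings, term_a, term_b, attribute):
    """Difference in similarity to `attribute` between two terms.

    A gap near zero means the embedding treats the two terms alike
    with respect to the attribute; a large gap suggests a distinction.
    """
    sim_a = cosine(embeddings[term_a], embeddings[attribute])
    sim_b = cosine(embeddings[term_b], embeddings[attribute])
    return sim_a - sim_b


# Hypothetical toy vectors, NOT taken from any real corpus or model.
toy = {
    "group_a_party": [1.0, 0.0],
    "group_b_party": [0.0, 1.0],
    "neutral_attribute": [1.0, 1.0],  # equally close to both terms
    "skewed_attribute": [1.0, 0.0],   # aligned with group_a_party only
}

gap_neutral = association_gap(
    toy, "group_a_party", "group_b_party", "neutral_attribute"
)
gap_skewed = association_gap(
    toy, "group_a_party", "group_b_party", "skewed_attribute"
)
print(f"neutral attribute gap: {gap_neutral:.3f}")
print(f"skewed attribute gap:  {gap_skewed:.3f}")
```

With these toy vectors the neutral attribute yields a gap of zero and the skewed attribute a gap of one. A real analysis would need many term pairs and a significance test rather than a single difference.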

Acknowledgments

This research is supported by a grant from the Indiana University Racial Justice Research Fund.

Author information

Corresponding author

Correspondence to Ece Gumusel.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gumusel, E., Malic, V.Q., Donaldson, D.R., Ashley, K., Liu, X. (2022). An Annotation Schema for the Detection of Social Bias in Legal Text Corpora. In: Smits, M. (eds) Information for a Better World: Shaping the Global Future. iConference 2022. Lecture Notes in Computer Science, vol 13192. Springer, Cham. https://doi.org/10.1007/978-3-030-96957-8_17

  • DOI: https://doi.org/10.1007/978-3-030-96957-8_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96956-1

  • Online ISBN: 978-3-030-96957-8

  • eBook Packages: Computer Science, Computer Science (R0)
