research-article

Contrastive Learning for Legal Judgment Prediction

Published: 21 April 2023

Abstract

Legal judgment prediction (LJP) is a fundamental task of legal artificial intelligence. It aims to automatically predict the judgment results of legal cases. Three typical subtasks are relevant law article prediction, charge prediction, and term-of-penalty prediction. Owing to its wide range of potential applications, LJP has attracted a great deal of interest, prompting the development of numerous approaches. These methods mainly focus on building a more accurate representation of a case’s fact description in order to improve the performance of judgment prediction. They overlook, however, the practical judicial scenario in which human judges often compare similar law articles or possible charges before making a final decision. To this end, we propose a supervised contrastive learning framework for the LJP task. Specifically, we train the model to distinguish (1) various law articles within the same chapter of a law and (2) similar charges of the same law article or related law articles. In this way, the model can capture the fine-grained differences between similar articles/charges, which are important for making a judgment. In addition, we optimize our model by identifying cases with the same article/charge labels, allowing it to more effectively model the relationship between a case’s fact description and its associated labels. By jointly learning the LJP task with the aforementioned contrastive learning tasks, our model achieves better performance than the state-of-the-art models on two real-world datasets.
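The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a generic supervised contrastive loss, in which cases sharing a label are pulled together and the other cases in the batch are pushed apart, combined with a classification loss by a weighted sum. The function names, the temperature value, and the weighting scheme are all illustrative assumptions. The sketch also does not model how the paper restricts comparisons to articles in the same chapter or to similar charges.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    embeddings: list of case representations (lists of floats).
    labels: one label per case, e.g. a law article or charge identifier.
    temperature: assumed hyperparameter, not taken from the paper.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives are the other cases in the batch with the same label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label case is skipped
        # Scaled cosine similarity between the anchor and every other case.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor cases.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(prediction_loss, contrastive_losses, weights):
    """Weighted sum of the prediction loss and auxiliary contrastive losses."""
    return prediction_loss + sum(
        w * l for w, l in zip(weights, contrastive_losses)
    )
```

In this sketch, a batch whose same-label cases already lie close together yields a loss near zero, while a batch whose labels cut across the clusters yields a much larger loss. That difference is the signal a model would use to separate easily confused articles or charges.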

    • Published in

      ACM Transactions on Information Systems, Volume 41, Issue 4
      October 2023
      958 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/3587261

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 April 2023
      • Online AM: 18 January 2023
      • Accepted: 5 January 2023
      • Revised: 4 November 2022
      • Received: 2 August 2022