research-article

Contrastive Learning for Legal Judgment Prediction

Authors:
Han Zhang, School of Information, Renmin University of China, Beijing, China (ORCID: 0000-0002-6254-7138)
Zhicheng Dou, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China (ORCID: 0000-0002-9781-948X)
Yutao Zhu, University of Montreal, Montreal, Canada (ORCID: 0000-0002-9432-3251)
Ji-Rong Wen, Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education, China, and Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China (ORCID: 0000-0002-9777-9676)

ACM Transactions on Information Systems, Volume 41, Issue 4, Article No. 113, pp. 1–25. https://doi.org/10.1145/3580489

Published: 21 April 2023

Abstract

Legal judgment prediction (LJP) is a fundamental task of legal artificial intelligence. It aims to automatically predict the judgment results of legal cases. Three typical subtasks are relevant law article prediction, charge prediction, and term-of-penalty prediction. Due to its wide range of potential applications, LJP has attracted a great deal of interest, prompting the development of numerous approaches. These methods mainly focus on building a more accurate representation of a case’s fact description in order to improve the performance of judgment prediction. They overlook, however, the practical judicial scenario in which human judges often compare similar law articles or possible charges before making a final decision. To address this gap, we propose a supervised contrastive learning framework for the LJP task. Specifically, we train the model to distinguish (1) various law articles within the same chapter of a law and (2) similar charges of the same law article or related law articles. In this way, the model can capture the fine-grained differences between similar articles/charges, which are important for making a judgment. In addition, we optimize our model by identifying cases with the same article/charge labels, allowing it to more effectively model the relationship between the case’s fact description and its associated labels. By jointly learning the LJP task with the aforementioned contrastive learning tasks, our model achieves better performance than the state-of-the-art models on two real-world datasets.
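To make the idea of pulling together cases that share an article/charge label more concrete, the following is a minimal sketch of a generic supervised contrastive loss, in the style of Khosla et al. (2020). It is not the authors' implementation: the function name `supervised_contrastive_loss`, the `temperature` value, the toy two-dimensional "fact embeddings", and the charge labels are all assumptions made for illustration. In practice the embeddings would come from a text encoder, and the loss would be combined with the usual classification losses.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Hypothetical helper for illustration; assumes non-zero embedding vectors.
    """
    # L2-normalise so that dot products are cosine similarities.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    n = len(unit)
    losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in others
        }
        # Log of the softmax denominator, computed stably via the max trick.
        top = max(sims.values())
        log_denominator = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Negative mean log-probability of selecting a positive among all other cases.
        losses.append(
            -sum(sims[j] - log_denominator for j in positives) / len(positives)
        )
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: cases 0 and 1 lie close together, as do cases 2 and 3.
facts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
grouped = supervised_contrastive_loss(facts, ["theft", "theft", "robbery", "robbery"])
mixed = supervised_contrastive_loss(facts, ["theft", "robbery", "theft", "robbery"])
print(grouped < mixed)
```

When the labels agree with the geometry of the embeddings (the `grouped` call), the loss is lower than when same-label cases lie far apart (the `mixed` call). That gap is the signal such an objective uses to separate easily confused articles or charges.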


Index Terms

1. Applied computing
  1. Law, social and behavioral sciences
    1. Law

Published in
ACM Transactions on Information Systems, Volume 41, Issue 4
October 2023
958 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3587261
Editor: Min Zhang, Tsinghua University, China
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 April 2023
- Online AM: 18 January 2023
- Accepted: 5 January 2023
- Revised: 4 November 2022
- Received: 2 August 2022
Author Tags
Deep learning
legal judgment prediction
supervised contrastive learning
legal artificial intelligence
law
Qualifiers
- research-article
