Abstract
Legal judgment prediction (LJP) is a fundamental task of legal artificial intelligence. It aims to automatically predict the judgment results of legal cases. Three typical subtasks are relevant law article prediction, charge prediction, and term-of-penalty prediction. Due to its wide range of potential applications, LJP has attracted a great deal of interest, prompting the development of numerous approaches. These methods mainly focus on building a more accurate representation of a case’s fact description in order to improve the performance of judgment prediction. They overlook, however, the practical judicial scenario in which human judges often compare similar law articles or possible charges before making a final decision. To this end, we propose a supervised contrastive learning framework for the LJP task. Specifically, we train the model to distinguish (1) the various law articles within the same chapter of a law and (2) similar charges under the same law article or related law articles. In this way, the model captures the fine-grained differences between similar articles/charges, which are important for making a judgment. In addition, we optimize our model by identifying cases with the same article/charge labels, allowing it to more effectively model the relationship between a case’s fact description and its associated labels. By jointly learning the LJP task with the aforementioned contrastive learning tasks, our model achieves better performance than the state-of-the-art models on two real-world datasets.
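To make the idea of pulling together cases that share an article/charge label more concrete, the following is a minimal sketch of a generic supervised contrastive loss over a batch of case embeddings. It is not the objective used in this work: the function name `supervised_contrastive_loss`, the `temperature` value, the toy embeddings, and the article labels are all assumptions chosen for illustration.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative sketch only).

    embeddings: (n, d) array of case representations (hypothetical input).
    labels:     length-n sequence of article or charge ids (hypothetical).
    Returns the mean loss over anchors that have at least one positive.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    n = z.shape[0]

    # Positives are other samples in the batch that share the anchor's label.
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive, so there is nothing to contrast

    # Cosine similarity scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each anchor from its own softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())


# Toy batch: two cases near one direction, two near another.
case_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
article_labels = [0, 0, 1, 1]  # hypothetical law-article ids
loss = supervised_contrastive_loss(case_vectors, article_labels)
```

In this sketch, the loss is small when cases with the same label lie close together and far from cases with other labels, which is the behavior a contrastive objective of this kind rewards. In a joint setup it would be added to the usual classification losses for the three LJP subtasks.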