Commonalities-, specificities-, and dependencies-enhanced multi-task learning network for judicial decision prediction
Introduction
Judicial Decision Prediction (JDP) refers to determining appropriate judicial decisions (e.g., relevant law articles, charges, and term of penalty) for a criminal case based on its fact description. JDP is a promising application in legal assistant systems: it helps non-legal professionals obtain approximate judgments and serves professionals (e.g., lawyers and judges) as a handy reference. JDP is also challenging compared to fundamental NLP applications. On the one hand, it contains multiple subtasks, i.e., law article prediction, charge prediction, and term of penalty prediction, denoted as Tasks 1, 2, and 3, respectively. On the other hand, it involves a higher level of reasoning and understanding.
In recent years, many methods [1], [2], [3], [4] have been proposed for a single subtask (e.g., charge prediction), ignoring the fact that JDP consists of multiple subtasks and possesses the following three properties.
As aforementioned, JDP contains three subtasks, namely, law article prediction, charge prediction, and term of penalty prediction. The three subtasks tend to exhibit similarities when making predictions from a given fact description. For example, as shown in Fig. 1, all subtasks are apt to focus on the keyword "steal" in the fact description; in other words, this keyword can facilitate the predictions of all three subtasks. Through the keyword "steal", Task 1 can easily predict the related law articles, Task 2 can directly determine that the charge is theft, and Task 3 can predict the approximate term of penalty. We define such similarities among subtasks as commonalities.
In addition to the commonalities among subtasks, each subtask possesses some unique characteristics. For example, different subtasks place different emphases on different contents of the fact description when making predictions. In detail, as shown in Fig. 1, Task 3 is particularly sensitive to certain numbers in the fact description, namely "6" and "10,550", which represent the number of thefts committed and the amount of money stolen, respectively. The larger these numbers, the longer the predicted term of penalty. In contrast, Tasks 1 and 2 are less sensitive to numbers: the number of thefts committed and the amount of money stolen have little or no effect on law article and charge prediction. We call the properties specific to a subtask specificities.
In contrast to the conspicuousness of commonalities and specificities, the relationships among subtasks may be more obscure. When delivering a judgment, a judge follows a judicial logic: the judge first deduces the relevant laws violated by a defendant based on the fact description, then determines the charges according to the laws, and finally determines the term of penalty by considering the laws and charges comprehensively [5]. Besides, subtasks contribute differently to each other. As shown in Fig. 1, the content of the law article stipulates the term of imprisonment (i.e., "fixed-term imprisonment of no more than three years"), which can directly facilitate the prediction of the term of penalty ("fixed-term imprisonment of 14"). Such implicit relationships among subtasks are defined as dependencies.
In this work, we propose a novel Commonalities-, Specificities-, and Dependencies-Enhanced Multi-Task Learning Network, abbreviated as CSDNet, to model multiple subtasks jointly by leveraging the three properties above. Moreover, when handling the dependencies, CSDNet is consistent with the intuition that each subtask contributes to the others to varying degrees, and it learns these degrees automatically through a well-designed learning module. Identifying this "degree of contribution" enables one subtask to accurately learn auxiliary information from the others, thereby greatly reducing noise interference. Even so, noise interference cannot be wholly eradicated. To minimize it further, we adopt an approach similar to the learning module and devise a denoising module that learns the "degree of contamination" caused by one subtask to the others; through it, each subtask can properly eliminate the noise introduced by interacting with other subtasks. After the learning and denoising modules, the original information of a subtask may be crowded out by that of other subtasks and lose its dominance. To avoid this, we design a reinforcement module that learns the "degree of enhancement" of a subtask, ensuring that each subtask still occupies a dominant position after incorporating information from the others.
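The three "degree" mechanisms described above can be pictured as learned gates between task representations. The following is a minimal, hypothetical sketch of that idea (it is not the paper's actual architecture; all parameter names and the elementwise gating form are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TaskInteraction:
    """Illustrative sketch only: task i receives information from task j
    scaled by a learned 'degree of contribution', removes an estimated
    contaminated part via a 'degree of contamination', and finally
    re-weights its own features with a 'degree of enhancement'."""

    def __init__(self, k_tasks, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One gate parameter vector per (target task, source task) pair.
        self.W_learn = rng.normal(0, 0.1, (k_tasks, k_tasks, dim))
        self.W_noise = rng.normal(0, 0.1, (k_tasks, k_tasks, dim))
        self.W_boost = rng.normal(0, 0.1, (k_tasks, dim))

    def forward(self, H):
        """H: (k_tasks, dim) task-specific representations."""
        k, _ = H.shape
        out = np.zeros_like(H)
        for i in range(k):
            h = H[i].copy()
            for j in range(k):
                if j == i:
                    continue
                g_learn = sigmoid(self.W_learn[i, j] * H[j])  # contribution gate
                g_noise = sigmoid(self.W_noise[i, j] * H[j])  # contamination gate
                contrib = g_learn * H[j]
                # Add the learned contribution, then subtract its
                # estimated contaminated part.
                h = h + contrib - g_noise * contrib
            g_boost = sigmoid(self.W_boost[i] * H[i])  # enhancement gate
            # Reinforce the subtask's own information so it stays dominant.
            out[i] = h + g_boost * H[i]
        return out
```

The key design point the sketch captures is that every gate is pairwise: each target subtask maintains its own parameters per source subtask, so contributions and contaminations are modeled asymmetrically.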
Notably, to the best of our knowledge, we are the first to incorporate the commonalities, specificities, and dependencies in a unified network, and the first to address the issue of noise interference among subtasks in JDP. Besides, we propose a combination of LSTMs [6], named Q-LSTM, and use it as the fundamental component of CSDNet.
In summary, our key contributions lie in the following aspects:
- We propose a novel multi-task learning network to address multiple subtasks of JDP by incorporating the three properties, i.e., Commonalities, Specificities, and Dependencies.
- We elaborate a Learning Module that lets each subtask learn contributions from other subtasks to varying degrees, a Denoising Module that eliminates noise interference, and a Reinforcing Module that achieves further enhancement.
- We invent a combination of LSTMs, named Q-LSTM, which outperforms conventional deep neural networks, i.e., CNN, LSTM, HLSTM, and stacked BiLSTM, on single subtasks of JDP. In addition, the proposed Q-LSTM shows competitive generalization performance on text classification, named entity recognition, and part-of-speech tagging tasks.
- Extensive experiments on two datasets demonstrate that CSDNet significantly and consistently exceeds previous state-of-the-art methods across all representative subtasks.
Judicial decision prediction
JDP has achieved great progress [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. To improve the results of charge prediction, Luo et al. [17] extract relevant law articles as legal assistance and use a hierarchical structure to encode the fact description and the extracted law articles. Jiang et al. [1] leverage reinforcement learning to select rationales (short, readable, and decisive snippets) as explanations of charge predictions. To correctly recognize few-shot charges (e.g., scalping
Problem definition
JDP aims to predict judicial decisions according to the fact description of a given criminal case. It includes several subtasks, which are not independent but related to each other. Each subtask can be defined as a text classification problem that takes the same fact description as input but outputs different categories; different subtasks have different numbers of categories. Suppose JDP contains k subtasks; the input of each subtask is the same fact description X, a
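The formulation above, shared input with per-subtask category spaces, can be sketched minimally as one shared fact representation feeding k independent classification heads. This is only an illustration of the problem definition, not the paper's model; all names and the linear-head form are assumptions:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_all_tasks(x, heads):
    """x: shared fact representation, shape (dim,).
    heads: list of (W, b) pairs, one per subtask, where W has shape
    (num_categories_t, dim) -- each subtask may have a different
    number of categories. Returns one predicted category per subtask."""
    return [int(np.argmax(softmax(W @ x + b))) for W, b in heads]
```

Note that the heads only differ in their output dimensionality; everything CSDNet adds (commonalities, specificities, dependencies) concerns how the per-subtask representations are produced before these heads.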
Methodology
In this section, we introduce the proposed Commonalities-, Specificities-, and Dependencies-Enhanced Multi-Task Learning Network in detail. Fig. 2 depicts an overview of CSDNet. Before introducing the network, we first introduce its fundamental component, Q-LSTM.
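Since Q-LSTM is described as a combination of LSTMs [6], its building block is the standard LSTM cell, which one step of can be sketched as follows. How Q-LSTM actually combines such cells is specific to the paper and not reproduced here; the parameter names below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell [6].
    x: input at this time step, shape (d_x,).
    h_prev, c_prev: previous hidden and cell states, shape (d_h,).
    params: (Wf, Wi, Wo, Wc, bf, bi, bo, bc), where each W has shape
    (d_h, d_h + d_x) and acts on [h_prev; x]."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)                    # forget gate
    i = sigmoid(Wi @ z + bi)                    # input gate
    o = sigmoid(Wo @ z + bo)                    # output gate
    c = f * c_prev + i * np.tanh(Wc @ z + bc)   # cell state update
    h = o * np.tanh(c)                          # new hidden state
    return h, c
```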
Experiments
We take the commonalities, specificities, and dependencies into account and innovatively integrate them into a unified framework. To thoroughly demonstrate the superiority of CSDNet, we choose three classic subtasks of JDP, i.e., law article prediction, charge prediction, and term of penalty prediction, in line with previous research.
Conclusion
In this paper, we propose a novel multi-task learning network, CSDNet, to jointly tackle multiple subtasks incorporating the properties, i.e., Commonalities, Specificities, and Dependencies. Particularly, while handling the Dependencies, we elaborate a learning module to learn contributions to varying degrees, a denoising module to eliminate noise interferences, and a reinforcing module to enhance each subtask. Notably, we are the first to unify multiple subtasks accompanied by the three
CRediT authorship contribution statement
Fanglong Yao: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Visualization. Xian Sun: Validation, Writing - original draft, Supervision. Hongfeng Yu: Formal analysis, Resources, Writing - review & editing. Wenkai Zhang: Data curation, Writing - review & editing. Kun Fu: Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work is supported by the National Science Fund for Distinguished Young Scholars of China under Grant #61725105.
Fanglong Yao received the B.Sc. degree from Inner Mongolia University, Hohhot, China, in 2017. He is currently pursuing the Ph.D. degree with Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.
His research interests include deep learning, natural language processing and multi-modal learning.
References (43)
- et al., Interpretable rationale augmented charge prediction system.
- et al., Few-shot charge prediction with discriminative legal attributes.
- C. He, L. Peng, Y. Le, J. He, SECaps: A sequence enhanced capsule model for charge prediction, arXiv preprint...
- A.H.N. Tran, Applying deep neural network to retrieve relevant civil law articles, in: Proceedings of the Student...
- et al., Legal judgment prediction via topological learning.
- et al., Long short-term memory, Neural Comput. (1997).
- Predicting supreme court decisions mathematically: a quantitative analysis of the 'right to counsel' cases, Am. Political Sci. Rev. (1957).
- Quantitative analysis of judicial processes: some practical and theoretical applications, Law Contemp. Probs. (1963).
- Applying correlation analysis to case prediction, Tex. L. Rev. (1963).
- Mathematical models for legal prediction, Computer/LJ (1980).
- Predicting supreme court cases probabilistically: the search and seizure cases, 1962–1981, Am. Political Sci. Rev.
- The supreme court's many median justices, Am. Political Sci. Rev.
- Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective, PeerJ Comput. Sci.
- Learning to predict charges for criminal cases with legal basis.
Xian Sun received the B.Sc. degree from Beihang University, Beijing, China, in 2004, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2006 and 2009, respectively.
He is currently a Professor with Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote-sensing image understanding.
Hongfeng Yu received the B.Sc. degree and M.Sc degree from Peking University, Beijing, China, in 2013 and 2016 respectively. He is currently a Research Assistant at the Institute of Electronics, Chinese Academy of Sciences.
His research interests include deep learning and natural language processing.
Wenkai Zhang received the B.Sc. degree from China University of Petroleum, Shandong, China, in 2013, and the Ph.D. degree from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2018. He is currently a Research Assistant at Aerospace Information Research Institute, Chinese Academy of Sciences.
His research interests include remote sensing image semantic segmentation and multi-media information processing.
Kun Fu received the B.Sc., M.Sc., and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1995, 1999, and 2002, respectively.
He is currently a Professor with Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, remote sensing image understanding, geospatial data mining, and visualization.