Commonalities-, specificities-, and dependencies-enhanced multi-task learning network for judicial decision prediction
Introduction
Judicial Decision Prediction (JDP) refers to determining appropriate judicial decisions (e.g., relevant law articles, charges, and term of penalty) for a criminal case based on its fact description. JDP is a promising application in legal assistant systems: it helps non-legal professionals obtain approximate judgments and serves professionals (e.g., lawyers and judges) as a handy reference. JDP is also challenging compared to fundamental NLP applications. On the one hand, it contains multiple subtasks, i.e., law article prediction, charge prediction, and term of penalty prediction, denoted as Tasks 1, 2, and 3, respectively. On the other hand, it involves a higher level of reasoning and understanding.
In recent years, many methods [1], [2], [3], [4] have been proposed for a single subtask (e.g., charge prediction), ignoring the fact that JDP consists of multiple subtasks and possesses the following three properties.
As aforementioned, JDP contains three subtasks, namely, law article prediction, charge prediction, and term of penalty prediction. The three subtasks tend to exhibit similarities when making predictions from a given fact description. For example, as shown in Fig. 1, all subtasks are apt to focus on the keyword "steal" in the fact description; in other words, this keyword can facilitate the predictions of all three subtasks. Through the keyword "steal", Task 1 can easily predict the related law articles, Task 2 can directly determine that the charge is theft, and Task 3 can predict the approximate term of penalty. We define such similarities among subtasks as commonalities.
In addition to the commonalities among subtasks, each subtask possesses some unique characteristics. For example, different subtasks place different emphases on different contents of the fact description when making predictions. In detail, as shown in Fig. 1, Task 3 is particularly sensitive to certain numbers in the fact description, namely "6" and "10,550", which represent the number of thefts committed and the amount of money stolen, respectively. The larger these numbers, the longer the predicted term of penalty. In contrast, Tasks 1 and 2 are less sensitive to numbers: the number of thefts committed and the amount of money stolen have little or no effect on law article and charge prediction. We call the properties specific to a subtask specificities.
In contrast to the conspicuousness of commonalities and specificities, the relationships among subtasks may be more obscure. When delivering a judgment, a judge follows a judicial logic: the judge first deduces the relevant laws violated by a defendant based on the fact description, then determines the charges according to the laws, and finally determines the term of penalty by considering the laws and charges comprehensively [5]. Besides, subtasks contribute differently to each other. As shown in Fig. 1, the content of the law article stipulates the term of imprisonment (i.e., "fixed-term imprisonment of no more than three years"), which can directly facilitate the prediction of the term of penalty ("fixed-term imprisonment of 14"). Such implicit relationships among subtasks are defined as dependencies.
In this work, we propose a novel Commonalities-, Specificities-, and Dependencies-Enhanced Multi-Task Learning Network, abbreviated as CSDNet, to model multiple subtasks jointly by leveraging the three properties above. Moreover, when handling the dependencies, CSDNet is consistent with the intuition that each subtask contributes to the others to varying degrees, and it learns these degrees automatically through a well-designed learning module. Identifying this "degree of contribution" enables one subtask to accurately learn auxiliary information from the others, thereby greatly reducing noise interference. Even so, noise interference cannot be wholly eradicated. To minimize it further, we adopt an approach similar to the learning module and devise a denoising module that learns the "degree of contamination" caused by one subtask to the others; through it, each subtask can properly eliminate the noise introduced by interacting with other subtasks. After the learning and denoising modules, the original information of a subtask may be crowded out by that of other subtasks and lose its dominance. To avoid this, we design a reinforcement module that learns the "degree of enhancement" of a subtask, ensuring that each subtask still occupies a dominant position after incorporating information from the others.
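The three "degree" mechanisms described above can be pictured as learned gates between task representations. The following is a minimal, hypothetical sketch of that idea (it is not the paper's actual architecture; all parameter names and the elementwise gating form are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TaskInteraction:
    """Illustrative sketch only: task i receives information from task j
    scaled by a learned 'degree of contribution', removes an estimated
    contaminated part via a 'degree of contamination', and finally
    re-weights its own features with a 'degree of enhancement'."""

    def __init__(self, k_tasks, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One gate parameter vector per (target task, source task) pair.
        self.W_learn = rng.normal(0, 0.1, (k_tasks, k_tasks, dim))
        self.W_noise = rng.normal(0, 0.1, (k_tasks, k_tasks, dim))
        self.W_boost = rng.normal(0, 0.1, (k_tasks, dim))

    def forward(self, H):
        """H: (k_tasks, dim) task-specific representations."""
        k, _ = H.shape
        out = np.zeros_like(H)
        for i in range(k):
            h = H[i].copy()
            for j in range(k):
                if j == i:
                    continue
                g_learn = sigmoid(self.W_learn[i, j] * H[j])  # contribution gate
                g_noise = sigmoid(self.W_noise[i, j] * H[j])  # contamination gate
                contrib = g_learn * H[j]
                # Add the learned contribution, then subtract its
                # estimated contaminated part.
                h = h + contrib - g_noise * contrib
            g_boost = sigmoid(self.W_boost[i] * H[i])  # enhancement gate
            # Reinforce the subtask's own information so it stays dominant.
            out[i] = h + g_boost * H[i]
        return out
```

The key design point the sketch captures is that every gate is pairwise: each target subtask maintains its own parameters per source subtask, so contributions and contaminations are modeled asymmetrically.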
Notably, to the best of our knowledge, we are the first to incorporate the commonalities, specificities, and dependencies in a unified network, and the first to address the issue of noise interference among subtasks in JDP. Besides, we propose a combination of LSTMs [6], named Q-LSTM, and use it as the fundamental component of CSDNet.
In summary, our key contributions lie in the following aspects:
- We propose a novel multi-task learning network to address multiple subtasks of JDP by incorporating the three properties, i.e., Commonalities, Specificities, and Dependencies.
- We elaborate a Learning Module that lets each subtask learn contributions from other subtasks to varying degrees, a Denoising Module that eliminates noise interference, and a Reinforcing Module that achieves further enhancement.
- We invent a combination of LSTMs, named Q-LSTM, which outperforms conventional deep neural networks, i.e., CNN, LSTM, HLSTM, and stacked BiLSTM, on single subtasks of JDP. In addition, the proposed Q-LSTM shows competitive generalization performance on text classification, named entity recognition, and part-of-speech tagging tasks.
- Extensive experiments on two datasets demonstrate that CSDNet significantly and consistently exceeds previous state-of-the-art methods across all representative subtasks.
Judicial decision prediction
JDP has achieved great progress [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. To improve the results of charge prediction, Luo et al. [17] extract relevant law articles as legal assistance and use a hierarchical structure to encode the fact description and the extracted law articles. Jiang et al. [1] leverage reinforcement learning to select rationales (short, readable, and decisive snippets) as explanations of charge predictions. To correctly recognize few-shot charges (e.g., scalping
Problem definition
JDP aims to predict judicial decisions according to the fact description of a given criminal case. It includes several subtasks, which are not independent but related to each other. Each subtask can be defined as a text classification problem that takes the same fact description as input but outputs different categories; different subtasks have different numbers of categories. Suppose JDP contains k subtasks; the input of each subtask is the same fact description X, a
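The formulation above, shared input with per-subtask category spaces, can be sketched minimally as one shared fact representation feeding k independent classification heads. This is only an illustration of the problem definition, not the paper's model; all names and the linear-head form are assumptions:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_all_tasks(x, heads):
    """x: shared fact representation, shape (dim,).
    heads: list of (W, b) pairs, one per subtask, where W has shape
    (num_categories_t, dim) -- each subtask may have a different
    number of categories. Returns one predicted category per subtask."""
    return [int(np.argmax(softmax(W @ x + b))) for W, b in heads]
```

Note that the heads only differ in their output dimensionality; everything CSDNet adds (commonalities, specificities, dependencies) concerns how the per-subtask representations are produced before these heads.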
Methodology
In this section, we introduce the proposed Commonalities-, Specificities-, and Dependencies-Enhanced Multi-Task Learning Network in detail. Fig. 2 depicts an overview of CSDNet. Before introducing the network, we first introduce its fundamental component, Q-LSTM.
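Since Q-LSTM is described as a combination of LSTMs [6], its building block is the standard LSTM cell, which one step of can be sketched as follows. How Q-LSTM actually combines such cells is specific to the paper and not reproduced here; the parameter names below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell [6].
    x: input at this time step, shape (d_x,).
    h_prev, c_prev: previous hidden and cell states, shape (d_h,).
    params: (Wf, Wi, Wo, Wc, bf, bi, bo, bc), where each W has shape
    (d_h, d_h + d_x) and acts on [h_prev; x]."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)                    # forget gate
    i = sigmoid(Wi @ z + bi)                    # input gate
    o = sigmoid(Wo @ z + bo)                    # output gate
    c = f * c_prev + i * np.tanh(Wc @ z + bc)   # cell state update
    h = o * np.tanh(c)                          # new hidden state
    return h, c
```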
Experiments
We take the commonalities, specificities, and dependencies into account and innovatively integrate them into a unified framework. To thoroughly demonstrate the superiority of CSDNet, we choose three classic subtasks of JDP, i.e., law article prediction, charge prediction, and term of penalty prediction, in line with previous research.
Conclusion
In this paper, we propose a novel multi-task learning network, CSDNet, to jointly tackle multiple subtasks incorporating the properties, i.e., Commonalities, Specificities, and Dependencies. Particularly, while handling the Dependencies, we elaborate a learning module to learn contributions to varying degrees, a denoising module to eliminate noise interferences, and a reinforcing module to enhance each subtask. Notably, we are the first to unify multiple subtasks accompanied by the three
CRediT authorship contribution statement
Fanglong Yao: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Visualization. Xian Sun: Validation, Writing - original draft, Supervision. Hongfeng Yu: Formal analysis, Resources, Writing - review & editing. Wenkai Zhang: Data curation, Writing - review & editing. Kun Fu: Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work is supported by the National Science Fund for Distinguished Young Scholars of China under Grant #61725105.
Fanglong Yao received the B.Sc. degree from Inner Mongolia University, Hohhot, China, in 2017. He is currently pursuing the Ph.D. degree with Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.
His research interests include deep learning, natural language processing and multi-modal learning.
References (43)
- et al., Interpretable rationale augmented charge prediction system.
- et al., Few-shot charge prediction with discriminative legal attributes.
- C. He, L. Peng, Y. Le, J. He, SECaps: A sequence enhanced capsule model for charge prediction, arXiv preprint...
- A.H.N. Tran, Applying deep neural network to retrieve relevant civil law articles, in: Proceedings of the Student...
- et al., Legal judgment prediction via topological learning.
- et al., Long short-term memory, Neural Comput. (1997).
- Predicting supreme court decisions mathematically: a quantitative analysis of the 'right to counsel' cases, Am. Political Sci. Rev. (1957).
- Quantitative analysis of judicial processes: some practical and theoretical applications, Law Contemp. Probs. (1963).
- Applying correlation analysis to case prediction, Tex. L. Rev. (1963).
- Mathematical models for legal prediction, Computer/LJ (1980).
- Predicting supreme court cases probabilistically: the search and seizure cases, 1962–1981, Am. Political Sci. Rev.
- The supreme court's many median justices, Am. Political Sci. Rev.
- Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective, PeerJ Comput. Sci.
- Learning to predict charges for criminal cases with legal basis.
Xian Sun received the B.Sc. degree from Beihang University, Beijing, China, in 2004, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2006 and 2009, respectively.
He is currently a Professor with Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote-sensing image understanding.
Hongfeng Yu received the B.Sc. degree and M.Sc degree from Peking University, Beijing, China, in 2013 and 2016 respectively. He is currently a Research Assistant at the Institute of Electronics, Chinese Academy of Sciences.
His research interests include deep learning and natural language processing.
Wenkai Zhang received the B.Sc. degree from China University of Petroleum, Shandong, China, in 2013, and the Ph.D. degree from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2018. He is currently a Research Assistant at Aerospace Information Research Institute, Chinese Academy of Sciences.
His research interests include remote sensing image semantic segmentation and multi-media information processing.
Kun Fu received the B.Sc., M.Sc., and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1995, 1999, and 2002, respectively.
He is currently a Professor with Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, remote sensing image understanding, geospatial data mining, and visualization.