MVE-FLK: A multi-task legal judgment prediction via multi-view encoder fusing legal keywords

doi:10.1016/j.knosys.2021.107960

Knowledge-Based Systems

Volume 239, 5 March 2022, 107960

https://doi.org/10.1016/j.knosys.2021.107960

Highlights

• Taking full advantage of existing law articles is critical to the task of legal judgment prediction.
• A novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords (MVE-FLK) is proposed for legal judgment prediction (LJP).
• MVE-FLK is appropriate for the task of multi-class classification with multi-label learning.
• The experimental results verify the effectiveness of MVE-FLK for the task of LJP.

Abstract

Legal Judgment Prediction (LJP) aims to predict the judgment result based on the fact description of a criminal case, and is gradually becoming a hot research topic in the legal realm. Generally, a classic LJP task contains three subtasks, i.e., applicable law article prediction, charge prediction, and term of penalty prediction. In real-world scenarios, both charge prediction and applicable law article prediction are tasks of multi-class classification with multi-label learning. However, most existing studies model them as problems of multi-class classification with single-label learning. Besides, they only consider the context of the fact description, and ignore the effective keywords that appear widely in the abundant law articles. To fill these gaps, we propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to jointly model multiple subtasks in LJP. Specifically, the multi-view encoder is the core module of MVE-FLK. In this module, we devise a word and sentence encoder (WSE) with an attention mechanism to fuse legal keywords. We then develop a multi-view attention network that combines WSE with the classic Transformer and DAN (Deep Averaging Network) to encode the case from multiple views. After that, we propose a multi-task prediction module with a novel keyword fusing approach to enhance the performance of multi-task prediction. In addition, we devise a unique prediction principle for each subtask at a fine-grained level, which effectively improves the performance of the subtasks. The experimental results on two real-life legal datasets show that our model yields significant prediction performance advantages over six competitive methods.

Introduction

In recent years, with the opening of high-quality legal texts, a wide range of novel technologies have been extensively applied to various tasks of legal text processing, such as evidence extraction [1], [2] and legal judgment prediction through machine learning algorithms [3], [4] and deep neural networks [5], [6], [7], [8]. The application of these technologies, such as the generation of legal abstracts and the prediction of judgment results [2], [9], can not only complete a large number of repetitive tasks in a short time, but also improve the work efficiency of judicial departments. Therefore, designing an accurate and practical Legal Judgment Prediction (LJP) system by utilizing novel technologies has gradually become one of the hottest topics in the realm of law.

In general, a classic LJP task contains multiple subtasks, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, denoted as Task 1, Task 2 and Task 3, respectively [5], [7], [10]. For instance, Fig. 1 illustrates a typical scenario of LJP: it automatically extracts criminal characteristics from a fact description, charge description, and penalty description, and then finds the law article, charge and punishment in the law articles that correspond to these characteristics. Thus, modeling LJP is essentially a multi-task problem, and the analysis results of LJP can be treated as serviceable references that could effectively reduce the workload of lawyers and judges. This paper studies the analysis of legal texts, and aims to design a novel prediction model to fulfill the three subtasks in LJP.

In fact, due to the complexity of the judicial trial, designing an effective LJP system is not easy, and the problem has attracted growing attention in academia. Most of the early studies [11], [12] focused on exploiting existing mathematical and statistical algorithms to analyze legal cases in specific scenarios. Because these methods often rely heavily on expert experience and manual annotation, they are time-consuming and laborious. As a result, they have gradually been replaced by deep neural network models [5], [6], [7], [8]. For example, inspired by the thinking of human judges, Zhong et al. [7] implemented a neural network based on the topological structure for judgment prediction. Xu et al. [8] proposed a multi-task learning model of legal judgment prediction that integrates charge keywords. However, few works pay attention to the task of multi-class classification with multi-label learning or take full advantage of known legal information.

In summary, the existing studies of LJP still face three major challenges, which motivate us to build a novel model based on deep neural networks for legal judgment prediction in legal texts.

Firstly, because effective information is difficult to extract for multi-class classification in a multi-label scenario, most existing studies simply define all the subtasks of LJP as multi-class classification problems in a single-label scenario. In practice, multi-label scenarios are common in these subtasks. In both settings the label space contains many classes; the difference is that in the single-label setting each sample has exactly one label, whereas in multi-class classification with multi-label learning each sample may have several labels. Take the subtask of charge prediction as an example. Given a fact description, a defendant may be accused of several charges, which makes the task multi-label. Meanwhile, each predicted charge is selected from many known charges, which makes the task multi-class. It is therefore natural to model charge prediction as a problem of multi-class classification with multi-label learning. In addition, some charges are easily confused in practice. For example, for the “crime of illegal felling of trees” and the “crime of illegal denudation”, the key difference between the two charges is “theft” or “denudation”. Besides, predicting multiple charges is clearly more complex than predicting a single charge (i.e., single-label multi-class classification), and discriminating confusing information will effectively improve the accuracy and interpretability of charge prediction. Nevertheless, most previous studies [5], [13], [14] ignore these problems, so how to extract effective information from lengthy legal fact descriptions and distinguish confusing information remains to be explored.
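The difference between the two settings can be made concrete with a small sketch. It is only an illustration under stated assumptions: the charge names, the logits, and the 0.5 threshold are hypothetical, and the per-label sigmoid threshold is a common way to realize multi-label prediction, not necessarily the prediction principle used in MVE-FLK.

```python
import math

def softmax(logits):
    """Turn logits into a distribution that sums to 1 (single-label view)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent probability for one label (multi-label view)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_single_label(logits, labels):
    """Single-label: exactly one charge, the arg-max of the softmax."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return [labels[best]]

def predict_multi_label(logits, labels, threshold=0.5):
    """Multi-label: every charge whose own probability clears the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Hypothetical charges and model scores for one fact description.
charges = ["theft", "fraud", "operating gambling houses"]
logits = [2.1, -1.3, 1.7]

print(predict_single_label(logits, charges))
print(predict_multi_label(logits, charges))
```

With these scores the single-label rule keeps only the top charge, while the multi-label rule keeps both charges whose individual probability exceeds the threshold.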

Secondly, taking full advantage of existing law articles is critical to the task of legal judgment prediction. However, the majority of existing studies [7], [10], [15] do not fully exploit the law articles, and use only the fact description to obtain the prediction results. In a real judgment scenario in civil law countries such as China, France, and Germany, human judges generally take both the facts of the case and the existing statutory provisions into account. Especially in the subtask of term of penalty prediction, the applicable term range of a crime is specified in the law articles. For instance, as shown in Fig. 1, operating gambling houses shall be sentenced to fixed-term imprisonment of not more than three years. Thus, incorporating the information from law articles should improve term of penalty prediction.
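As a rough illustration of why the statutory range helps, the sketch below restricts a term prediction to candidates allowed by the applicable article. The candidate terms, the scores, and the function name are hypothetical, and this hard filter is only one simple way to use the range; the excerpt does not state how MVE-FLK incorporates it.

```python
def restrict_to_statutory_range(term_scores, max_months):
    """Pick the best-scoring term (in months) that does not exceed the
    statutory maximum; fall back to the overall best if none qualifies."""
    allowed = {t: s for t, s in term_scores.items() if t <= max_months}
    pool = allowed if allowed else term_scores
    return max(pool, key=pool.get)

# Hypothetical model scores over candidate terms, in months.
scores = {6: 0.2, 24: 0.3, 60: 0.5}

# "Not more than three years" corresponds to a 36-month ceiling.
print(restrict_to_statutory_range(scores, max_months=36))
```

Without the ceiling the highest-scoring candidate would be 60 months, which the article quoted above rules out.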

Thirdly, to better understand natural language in most research and industry NLP tasks, there is a trend toward designing a joint encoder that combines the advantages of various encoders. For example, Cer et al. [16] combined the Transformer encoder with DAN [17], and introduced a universal sentence encoder that specifically targets transfer learning to other Natural Language Processing (NLP) tasks. Despite their effectiveness, we argue that such approaches simply concatenate the encoding vectors of the various encoders. In fact, the same encoder should be differently informative when encoding different information of cases in LJP. For example, for a case with long text, Transformer plays a more important role in legal judgment prediction than DAN, because it relies entirely on the attention mechanism to model the global dependence of input and output regardless of their distance in the sequence. Therefore, how to design a novel joint encoder with an attention mechanism (each type of encoder corresponds to a view) for legal judgment prediction remains to be explored.
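The contrast between plain concatenation and attention over views can be sketched as follows. This is a minimal illustration, not the paper's network: the three view vectors stand in for the outputs of WSE, Transformer and DAN, and `query` is a hypothetical stand-in for a learned attention parameter.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fuse_by_concatenation(views):
    """Baseline: every view contributes equally, output grows with view count."""
    return [x for view in views for x in view]

def fuse_by_attention(views, query):
    """Score each view against a query vector, then take the weighted sum.
    Returns the fused vector and the per-view attention weights."""
    weights = softmax([dot(view, query) for view in views])
    dim = len(views[0])
    fused = [sum(w * view[i] for w, view in zip(weights, views))
             for i in range(dim)]
    return fused, weights

# Hypothetical 2-dimensional encodings of one case from three views.
wse_view = [1.0, 0.0]
transformer_view = [0.0, 1.0]
dan_view = [0.5, 0.5]
views = [wse_view, transformer_view, dan_view]

query = [0.0, 2.0]  # hypothetical; would be learned in a real model

fused, weights = fuse_by_attention(views, query)
print(weights)                            # per-view importance
print(fused)                              # same dimension as one view
print(len(fuse_by_concatenation(views)))  # three views of size 2 -> 6
```

Here the weights are case-dependent: a different query (or different view vectors) shifts importance between encoders, which concatenation cannot express.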

To address the above challenges, this paper proposes a novel multi-task learning framework to tackle multiple subtasks in LJP, called Multi-View Encoder Fusing Legal Keywords (MVE-FLK). Specifically, the multi-view encoder is the core module of MVE-FLK, which incorporates our proposed word and sentence encoder (WSE), Transformer and DAN with an attention neural network. It encodes the case from multiple views by exploiting the fact description and concept keywords from cases. Based on the output of the multi-view encoder, a multi-task prediction module is proposed to jointly model multiple subtasks in LJP, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, within a joint learning framework. The contributions of this paper are threefold:

• We propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to jointly model multiple subtasks in LJP. Unlike previous methods, which simply define all the subtasks as problems of multi-class classification with single-label learning and adopt the same classification boundaries for them, we model the subtasks of applicable law article prediction and charge prediction as problems of multi-class classification with multi-label learning, and design unique prediction principles for each subtask in MVE-FLK at a fine-grained level.
• We design a word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles. Unlike previous encoders, which simply combine some classic encoders and neglect legal keywords, we combine WSE with the classic Transformer and DAN models through an attentive multi-view neural network, so that the case is encoded effectively from multiple views. The approach can select the important and informative encodings of words and sentences with an attention mechanism.
• To exploit effective information from existing law articles, we design a novel keyword fusing framework to enhance the performance of MVE-FLK. For the keywords extracted from the abundant law articles with NLP techniques, we design deep neural networks to learn deep information and fuse it with the other modules in MVE-FLK.

Section snippets

Related work

In this section, we survey the relevant literature in three streams of work: “legal judgment prediction”, “word and sentence encoder” and “multi-task learning”.

Problem formulation

In this section, some fundamental definitions in this work and a formulation for LJP are introduced as follows.

Definition 1 Fact Description

Fact description is the factual description of a case, including time of crime, place of crime, result of crime, guilty tools, and judgment made by court or judge [7].

We use Jieba¹ to segment words and remove stop words for fact description of cases. As a result, each fact description can be described as a word sequence, i.e., $S = \{s_{1}, s_{2}, \dots, s_{n}\}$, where $n$ is
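The preprocessing step described above (segmentation followed by stop-word removal) can be sketched as below. The sketch is an assumption-laden illustration: `preprocess`, `whitespace_segment`, the sample sentence, and the stop-word set are hypothetical, and a whitespace splitter stands in for the segmenter so the example runs without extra dependencies. With Jieba installed, `jieba.lcut` could be passed as `segment` instead.

```python
from typing import Callable, Iterable, List, Set

def preprocess(text: str,
               segment: Callable[[str], Iterable[str]],
               stop_words: Set[str]) -> List[str]:
    """Segment a fact description and drop stop words and empty tokens,
    yielding the word sequence S = {s_1, ..., s_n}."""
    tokens = (tok.strip() for tok in segment(text))
    return [tok for tok in tokens if tok and tok not in stop_words]

def whitespace_segment(text: str) -> List[str]:
    """Stand-in segmenter for this sketch only."""
    return text.split()

# Hypothetical stop-word list and fact description.
stop_words = {"the", "a", "in"}
fact = "the defendant operated a gambling house in the city"

S = preprocess(fact, whitespace_segment, stop_words)
print(S, len(S))
```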

Methodology

To realize the task of LJP, we propose a novel multi-task framework via multi-view encoder fusing legal keywords, named MVE-FLK, as shown in Fig. 2. Specifically, MVE-FLK mainly contains two core modules, namely, the multi-view encoder and multi-task prediction. In this section, we describe these modules of MVE-FLK in detail.

Experiments

To demonstrate the effectiveness of our MVE-FLK model, we conduct a series of experiments on a real-life legal dataset called CAIL2018 [54], and select three representative subtasks (i.e., law article prediction, charge prediction, and term of penalty prediction) for comparison.

Conclusion

In this paper, we propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to tackle multiple subtasks in LJP. The major novelties of the MVE-FLK lie in devising the word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles, and further proposing a multi-view encoder incorporating our proposed WSE, Transformer and DAN with an attention neural network. Moreover, a multi-task

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 92046026, Grant 72172057, Grant 71701089, Grant 61662028, Grant 71901115, and the International Innovation Cooperation Project of Jiangsu Province, China, under Grant BZ2020008.

References (59)

Ji D. et al., An end-to-end joint model for evidence information extraction from court record document, Inf. Process. Manage. (2020)
Ji D. et al., A deep neural network model for speakers coreference resolution in legal texts, Inf. Process. Manage. (2020)
Li Y.-S. et al., A novel random forest approach for imbalance problem in crime linkage, Knowl.-Based Syst. (2020)
Chi H. et al., A decision support system for detecting serial crimes, Knowl.-Based Syst. (2017)
Yao F. et al., Gated hierarchical multi-task learning network for judicial decision prediction, Neurocomputing (2020)
Yin C. et al., Evaluating the credit risk of SMEs using legal judgments, Decis. Support Syst. (2020)
Liu Y.-H. et al., Predicting associated statutes for legal problems, Inf. Process. Manage. (2015)
Zhu G. et al., Online purchase decisions for tourism e-commerce, Electron. Commer. Res. Appl. (2019)
Yao F. et al., Commonalities-, specificities-, and dependencies-enhanced multi-task learning network for judicial decision prediction, Neurocomputing (2021)
Huang T. et al., A window-based self-attention approach for sentence encoding, Neurocomputing (2020)
Liu F. et al., HieNN-DWE: A hierarchical neural network with dynamic word embeddings for document level sentiment classification, Neurocomputing (2020)
Zhu G. et al., Neural attentive travel package recommendation via exploiting long-term and short-term behaviors, Knowl.-Based Syst. (2021)
Zhang Y. et al., A survey on multi-task learning, IEEE Trans. Knowl. Data Eng. (2021)
Yadav S. et al., Feature assisted stacked attentive shortest dependency path based bi-LSTM model for protein–protein interaction, Knowl.-Based Syst. (2019)
B. Luo, Y. Feng, J. Xu, X. Zhang, D. Zhao, Learning to predict charges for criminal cases with legal basis, in:...
H. Zhong, Z. Guo, C. Tu, C. Xiao, Z. Liu, M. Sun, Legal judgment prediction via topological learning, in: Proceedings...
Z. Xu, X. Li, Y. Li, Z. Wang, Y. Fanxu, X. Lai, Multi-task legal judgement prediction combining a subtask of the...
W. Yang, W. Jia, X. Zhou, Y. Luo, Legal judgment prediction via multi-perspective bi-feedback network, in: Proceedings...
Segal J.A., Predicting supreme court cases probabilistically: The search and seizure cases, 1962–1981, Am. Political Sci. Rev. (1984)
Lauderdale B.E. et al., The supreme court’s many median justices, Am. Political Sci. Rev. (2012)
Z. Hu, X. Li, C. Tu, Z. Liu, M. Sun, Few-shot charge prediction with discriminative legal attributes, in: Proceedings...
P. Wang, Y. Fan, S. Niu, Z. Yang, Y. Zhang, J. Guo, Hierarchical matching network for crime classification, in:...
H. Chen, D. Cai, W. Dai, Z. Dai, Y. Ding, Charge-based prison term prediction with deep gating network, in: Proceedings...
D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R.S. John, N. Constant, M. Guajardo-Céspedes, S. Yuan, C. Tar, et al....
M. Iyyer, V. Manjunatha, J. Boyd-Graber, H. Daumé III, Deep unordered composition rivals syntactic methods for text...
W.-C. Lin, T.-T. Kuo, T.-J. Chang, C.-A. Yen, C.-J. Chen, S.-d. Lin, Exploiting machine learning models for chinese...
Liu Y.-H. et al., A two-phase sentiment analysis approach for judgement prediction, J. Inf. Sci. (2018)
N. Xu, P. Wang, L. Chen, L. Pan, X. Wang, J. Zhao, Distinguish confusing law articles for legal judgment prediction,...
S. Li, B. Liu, L. Ye, H. Zhang, B. Fang, Element-aware legal judgment prediction for criminal cases with confusing...

