MVE-FLK: A multi-task legal judgment prediction via multi-view encoder fusing legal keywords
Introduction
In recent years, as high-quality legal texts have become publicly available, a wide range of new technologies have been applied to legal text processing tasks, such as evidence extraction [1], [2] and legal judgment prediction through machine learning algorithms [3], [4] and deep neural networks [5], [6], [7], [8]. Applications of these technologies, such as the generation of legal abstracts and the prediction of judgment results [2], [9], can complete a large number of repetitive tasks in a short time and improve the work efficiency of judicial departments. Therefore, designing an accurate and practical Legal Judgment Prediction (LJP) system has gradually become one of the most active topics in the legal domain.
In general, a classic LJP task contains multiple subtasks, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, denoted as Task 1, Task 2 and Task 3, respectively [5], [7], [10]. For instance, Fig. 1 illustrates a typical LJP scenario: the system automatically extracts criminal characteristics from a fact description, charge description, and penalty description, and then identifies the law article, charge and punishment that correspond to these characteristics. Thus, modeling LJP is essentially a multi-task problem, and its results can serve as useful references that reduce the workload of lawyers and judges. This paper studies the analysis of legal texts and aims to design a novel prediction model that handles the three subtasks of LJP.
Due to the complexity of judicial trials, designing an effective LJP system is not easy, and the problem has attracted growing attention in academia. Most early studies [11], [12] focused on applying existing mathematical and statistical algorithms to analyze legal cases in specific scenarios. Because these methods rely heavily on expert experience and manual annotation, they are time-consuming and laborious. As a result, they have gradually been replaced by deep neural network models [5], [6], [7], [8]. For example, inspired by the reasoning of human judges, Zhong et al. [7] implemented a neural network based on a topological structure for judgment prediction. Xu et al. [8] proposed a multi-task learning model for legal judgment prediction that integrates charge keywords. However, few works address multi-class classification with multi-label learning or take full advantage of known legal information.
In summary, the existing studies of LJP still face three major challenges, which motivate us to build a novel model based on deep neural networks for legal judgment prediction in legal texts.
First, because effective information is difficult to extract in the multi-label setting, most existing studies simply define all the subtasks of LJP as single-label multi-class classification problems. In practice, multi-label multi-class problems are common in these subtasks. In both settings the label space contains many classes; the difference is that in the single-label setting each sample has exactly one label, whereas in the multi-label setting each sample may have several. Take charge prediction as an example: given a fact description, a defendant may be accused of several charges, which makes the task multi-label. Meanwhile, each charge is selected from many known charges, which makes the task multi-class. Charge prediction is therefore naturally modeled as multi-class classification with multi-label learning. In addition, some charges are easily confused in practice. For example, for the “crime of illegal felling of trees” and the “crime of illegal denudation”, the key difference between the two charges is “theft” or “denudation”. Moreover, predicting multiple charges is more complex than predicting a single charge (i.e., single-label multi-class classification), and discriminating confusing information can improve both the accuracy and the interpretability of charge prediction. Nevertheless, most previous studies [5], [13], [14] ignore these problems, so how to extract effective information from lengthy legal fact descriptions and distinguish confusing information remains to be explored.
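The difference between the two settings can be made concrete with a minimal sketch. It assumes a model that emits one raw score (logit) per charge; the charge names, the scores, and the 0.5 threshold are illustrative assumptions and are not taken from the paper.

```python
import math

CHARGES = ["theft", "robbery", "fraud"]  # hypothetical label set


def softmax(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_single_label(logits):
    """Multi-class, single-label: exactly one charge is returned."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [CHARGES[best]]


def predict_multi_label(logits, threshold=0.5):
    """Multi-class, multi-label: every charge whose independent
    probability reaches the threshold is returned."""
    return [c for c, x in zip(CHARGES, logits) if sigmoid(x) >= threshold]


logits = [2.0, 1.5, -3.0]  # illustrative scores for one fact description
single = predict_single_label(logits)
multi = predict_multi_label(logits)
```

With these scores the single-label decoder keeps only the top charge, while the multi-label decoder keeps both charges whose scores are high, which is the behavior needed when a defendant faces several charges.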
Second, taking full advantage of existing law articles is critical to legal judgment prediction. However, the majority of existing studies [7], [10], [15] do not exploit the law articles and use only the fact description to obtain prediction results. In a real judgment scenario in civil law countries such as China, France, and Germany, human judges generally take both the facts of the case and the existing statutory provisions into account. In particular, for term of penalty prediction, the applicable term range of a crime is specified in the law articles. For instance, as shown in Fig. 1, operating gambling houses shall be sentenced to fixed-term imprisonment of not more than three years. Thus, incorporating information from law articles can be expected to improve term of penalty prediction.
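One simple way to see why the statutory range is useful is sketched below. This is not the paper's method: the term classes, the scores, and the helper `restrict_to_statutory_range` are hypothetical, and the sketch only shows how a maximum term taken from a law article could rule out implausible predictions.

```python
TERM_BINS = [6, 12, 24, 36, 60, 120]  # hypothetical term classes, in months


def restrict_to_statutory_range(scores, max_months):
    """Discard term classes above the statutory maximum and
    return the best remaining class (in months)."""
    allowed = [(s, t) for s, t in zip(scores, TERM_BINS) if t <= max_months]
    if not allowed:
        raise ValueError("no term class within the statutory range")
    return max(allowed)[1]


# Illustrative model scores, one per term class.
scores = [0.05, 0.10, 0.20, 0.25, 0.30, 0.10]

# Without the law article, the highest-scoring class wins.
unconstrained = TERM_BINS[scores.index(max(scores))]

# "Not more than three years" corresponds to a 36-month cap.
constrained = restrict_to_statutory_range(scores, max_months=36)
```

Here the unconstrained choice exceeds the three-year cap, and applying the cap moves the prediction to the best class inside the permitted range.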
Third, to better understand natural language in research and industry Natural Language Processing (NLP) tasks, there is a trend toward joint encoders that combine the advantages of several encoders. For example, Cer et al. [16] combined the Transformer encoder with DAN [17] and introduced a universal sentence encoder that specifically targets transfer learning to other NLP tasks. Despite their effectiveness, we argue that such approaches simply concatenate the encoding vectors of the various encoders. In fact, the same encoder can be more or less informative depending on the case information being encoded in LJP. For example, for a case with long text, the Transformer plays a more important role in legal judgment prediction than DAN, because it relies entirely on the attention mechanism to model global dependencies between input and output regardless of their distance in the sequence. Therefore, how to design a joint encoder with an attention mechanism (where each type of encoder corresponds to a view) for legal judgment prediction remains to be explored.
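The contrast between plain concatenation and attention over encoder views can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: the three view vectors are made-up numbers, and the fixed `query` vector stands in for an attention parameter that would normally be learned.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def concat_fusion(views):
    """Baseline: join the view vectors end to end, all views equally weighted."""
    return [x for view in views for x in view]


def attentive_fusion(views, query):
    """Score each view against a query vector, then return the
    attention weights and the weighted sum of the view vectors."""
    weights = softmax([dot(view, query) for view in views])
    dim = len(views[0])
    fused = [sum(w * view[i] for w, view in zip(weights, views))
             for i in range(dim)]
    return weights, fused


# Hypothetical 3-dimensional encodings of one case from three views.
views = [
    [0.2, 0.1, 0.4],  # e.g. a word-and-sentence view
    [0.9, 0.8, 0.7],  # e.g. a Transformer view
    [0.1, 0.3, 0.2],  # e.g. a DAN view
]
query = [1.0, 1.0, 1.0]  # stands in for a learned attention parameter

weights, fused = attentive_fusion(views, query)
```

Concatenation yields a 9-dimensional vector in which every view counts equally, whereas the attentive version keeps the original dimension and lets the weights express how informative each view is for the case at hand.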
To address these challenges, this paper proposes a novel multi-task learning framework for the multiple subtasks of LJP, called Multi-View Encoder Fusing Legal Keywords (MVE-FLK). The multi-view encoder is the core module of MVE-FLK; it combines our proposed word and sentence encoder (WSE), Transformer and DAN through an attention neural network, and encodes a case from multiple views by exploiting the fact description and concept keywords of the case. Based on the output of the multi-view encoder, a multi-task prediction module jointly models the subtasks of LJP, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, within a joint learning framework. The contributions of this paper are threefold:
- We propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to jointly model multiple subtasks in LJP. Unlike previous methods, which simply define all the subtasks as single-label multi-class classification problems and apply the same classification boundaries to all of them, we model applicable law article prediction and charge prediction as multi-class classification with multi-label learning, and design distinct fine-grained prediction principles for each subtask in MVE-FLK.
- We design a word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles. Unlike previous encoders, which simply combine classic encoders and neglect legal keywords, we combine WSE with the classic Transformer and DAN models through an attentive multi-view neural network, so that a case is encoded from multiple views. The attention mechanism selects the most important and informative encodings of words and sentences.
- To exploit effective information from existing law articles, we design a novel keyword fusing framework to enhance the performance of MVE-FLK. The keywords are extracted from a large body of law articles with NLP techniques, and we design deep neural networks to learn deep representations of them and fuse these with the other modules of MVE-FLK.
Section snippets
Related work
In this section, we survey the relevant literature in three streams of work: “legal judgment prediction”, “word and sentence encoder” and “multi-task learning”.
Problem formulation
In this section, some fundamental definitions in this work and a formulation for LJP are introduced as follows.
Definition 1 (Fact Description). A fact description is the factual account of a case, including the time of the crime, the place of the crime, the result of the crime, the instruments of the crime, and the judgment made by the court or judge [7].
We use Jieba to segment words and remove stop words for fact description of cases. As a result, each fact description can be described as a word sequence, i.e., , where is
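The preprocessing step described here (word segmentation followed by stop-word removal) can be sketched as below. The helper `preprocess`, the stand-in whitespace segmenter, and the stop-word set are assumptions for illustration; for Chinese text, `jieba.lcut` could be passed as the `segment` argument instead.

```python
def preprocess(text, segment, stop_words):
    """Segment a fact description and drop stop words and blank tokens."""
    tokens = (tok.strip() for tok in segment(text))
    return [tok for tok in tokens if tok and tok not in stop_words]


# Stand-in segmenter so the sketch runs without extra dependencies;
# a real pipeline for Chinese text would pass jieba.lcut here.
def whitespace_segment(text):
    return text.split()


STOP_WORDS = {"the", "a", "in"}  # illustrative; not the paper's list

words = preprocess(
    "the defendant operated a gambling house in 2016",
    whitespace_segment,
    STOP_WORDS,
)
```

The result is the word sequence that later encoders would consume, with function words already filtered out.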
Methodology
To realize the task of LJP, we propose a novel multi-task framework via multi-view encoder fusing legal keywords, named MVE-FLK, as shown in Fig. 2. Specifically, MVE-FLK contains two core modules, namely the multi-view encoder and the multi-task prediction module. In this section, we describe each of these modules in detail.
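A joint objective for a multi-task prediction module of this kind could look like the sketch below. It is an assumption-laden illustration rather than the paper's loss: it treats law article and charge prediction as multi-label (binary cross-entropy per label) and term of penalty prediction as single-label (softmax cross-entropy), and the equal task weights and all numeric values are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels (multi-label task)."""
    losses = []
    for x, y in zip(logits, targets):
        p = sigmoid(x)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


def ce_loss(logits, target_index):
    """Softmax cross-entropy for one correct class (single-label task)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def joint_loss(article, charge, term, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three subtask losses.

    Each argument is a (logits, target) pair for one subtask.
    """
    w1, w2, w3 = weights
    return (w1 * bce_loss(*article)
            + w2 * bce_loss(*charge)
            + w3 * ce_loss(*term))


# Illustrative outputs of three task heads for a single case.
article = ([2.0, -1.0, 0.5], [1, 0, 0])   # multi-label targets
charge = ([1.5, 1.0, -2.0], [1, 1, 0])    # multi-label targets
term = ([0.1, 1.2, -0.3], 1)              # index of the correct term class

total = joint_loss(article, charge, term)
```

Summing the losses lets one backward pass update a shared encoder for all three subtasks; the weights are a tuning choice that this sketch leaves at 1.0.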
Experiments
To demonstrate the effectiveness of our MVE-FLK model, we conduct a series of experiments on a real-life legal dataset called CAIL2018 [54], and select three representative subtasks (i.e., law article prediction, charge prediction, and term of penalty prediction) for comparison.
Conclusion
In this paper, we propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to tackle multiple subtasks in LJP. The major novelties of the MVE-FLK lie in devising the word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles, and further proposing a multi-view encoder incorporating our proposed WSE, Transformer and DAN with an attention neural network. Moreover, a multi-task
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 92046026, Grant 72172057, Grant 71701089, Grant 61662028, and Grant 71901115, and by the International Innovation Cooperation Project of Jiangsu Province, China, under Grant BZ2020008.
References (59)
- et al. An end-to-end joint model for evidence information extraction from court record document. Inf. Process. Manage. (2020)
- et al. A deep neural network model for speakers coreference resolution in legal texts. Inf. Process. Manage. (2020)
- et al. A novel random forest approach for imbalance problem in crime linkage. Knowl.-Based Syst. (2020)
- et al. A decision support system for detecting serial crimes. Knowl.-Based Syst. (2017)
- et al. Gated hierarchical multi-task learning network for judicial decision prediction. Neurocomputing (2020)
- et al. Evaluating the credit risk of SMEs using legal judgments. Decis. Support Syst. (2020)
- et al. Predicting associated statutes for legal problems. Inf. Process. Manage. (2015)
- et al. Online purchase decisions for tourism e-commerce. Electron. Commer. Res. Appl. (2019)
- et al. Commonalities-, specificities-, and dependencies-enhanced multi-task learning network for judicial decision prediction. Neurocomputing (2021)
- et al. A window-based self-attention approach for sentence encoding. Neurocomputing (2020)
- HieNN-DWE: A hierarchical neural network with dynamic word embeddings for document level sentiment classification. Neurocomputing
- Neural attentive travel package recommendation via exploiting long-term and short-term behaviors. Knowl.-Based Syst.
- A survey on multi-task learning. IEEE Trans. Knowl. Data Eng.
- Feature assisted stacked attentive shortest dependency path based bi-LSTM model for protein–protein interaction. Knowl.-Based Syst.
- Predicting supreme court cases probabilistically: The search and seizure cases, 1962–1981. Am. Political Sci. Rev.
- The supreme court’s many median justices. Am. Political Sci. Rev.
- A two-phase sentiment analysis approach for judgement prediction. J. Inf. Sci.