Elsevier

Knowledge-Based Systems

Volume 239, 5 March 2022, 107960
MVE-FLK: A multi-task legal judgment prediction via multi-view encoder fusing legal keywords

https://doi.org/10.1016/j.knosys.2021.107960

Highlights

  • Taking full advantage of existing law articles is critical to the task of legal judgment prediction.

  • A novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords (MVE-FLK) is proposed for legal judgment prediction (LJP).

  • MVE-FLK is appropriate for the task of multi-class classification with multi-label learning.

  • The experimental results verify the effectiveness of MVE-FLK for the task of LJP.

Abstract

Legal Judgment Prediction (LJP) aims to predict the judgment result from the fact description of a criminal case, and it is becoming an active research topic in the legal domain. A classic LJP task contains three subtasks: applicable law article prediction, charge prediction, and term of penalty prediction. In real-world scenarios, both charge prediction and applicable law article prediction are multi-class classification tasks with multi-label learning. However, most existing studies model them as multi-class classification problems with single-label learning. In addition, they consider only the context of the fact description and do not exploit the informative keywords that are widespread in law articles. To fill these gaps, we propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to jointly model multiple subtasks in LJP. The multi-view encoder is the core module of MVE-FLK. In this module, we devise a word and sentence encoder (WSE) with an attention mechanism to fuse legal keywords. We then develop a multi-view attention network that combines WSE with the classic Transformer and DAN (Deep Averaging Network) to encode the case from multiple views. After that, we propose a multi-task prediction module built on a novel keyword fusing approach to enhance the performance of multi-task prediction. In addition, we devise a unique prediction principle for each subtask at a fine-grained level, which effectively improves the performance of the subtasks. Experimental results on two real-life legal datasets show that our model yields significant performance advantages over six competitive methods.

Introduction

In recent years, with the opening up of high-quality legal texts, a wide range of novel technologies have been applied extensively to legal text processing tasks, such as evidence extraction [1], [2] and legal judgment prediction through machine learning algorithms [3], [4] and deep neural networks [5], [6], [7], [8]. Applications of these technologies, such as the generation of legal abstracts and the prediction of judgment results [2], [9], can not only complete a large number of repetitive tasks in a short time, but also improve the work efficiency of judicial departments. Therefore, designing an accurate and practical Legal Judgment Prediction (LJP) system with these technologies has gradually become one of the most active topics in the legal domain.

In general, a classic LJP task contains multiple subtasks, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, denoted as Task 1, Task 2 and Task 3, respectively [5], [7], [10]. For instance, Fig. 1 illustrates a typical LJP scenario: the system automatically extracts criminal characteristics from a fact description, charge description, and penalty description, and then identifies the law article, charge and punishment from the law articles according to these characteristics. Thus, modeling LJP is essentially a multi-task problem, and the results of LJP can serve as useful references that could effectively reduce the workload of lawyers and judges. This paper studies the analysis of legal texts and aims to design a novel prediction model that fulfills the three subtasks in LJP.

Because of the complexity of judicial trials, designing an effective LJP system is not easy, and the problem has attracted growing attention in academia. Most early studies [11], [12] focused on applying existing mathematical and statistical algorithms to analyze legal cases in specific scenarios. Because these methods rely heavily on expert experience and manual annotation, they are time-consuming and laborious. As a result, they have gradually been replaced by deep neural network models [5], [6], [7], [8]. For example, inspired by the reasoning of human judges, Zhong et al. [7] implemented a neural network based on a topological structure for judgment prediction. Xu et al. [8] proposed a multi-task learning model for legal judgment prediction that integrates charge keywords. However, few works address multi-class classification with multi-label learning or make full and effective use of known legal information.

In summary, the existing studies of LJP still face three major challenges, which motivate us to build a novel model based on deep neural networks for legal judgment prediction in legal texts.

Firstly, since effective information for multi-class classification in the multi-label scenario is difficult to extract, most existing studies simply define all the subtasks of LJP as multi-class classification problems in the single-label scenario. In practice, multi-class classification problems in the multi-label scenario are common in these subtasks. In both the single-label and the multi-label scenario, the label set contains multiple classes. The difference is that in the single-label scenario each sample has exactly one label, whereas in multi-class classification with multi-label learning each sample may have multiple labels. Take the subtask of charge prediction as an example. Given a fact description, a defendant may be accused of several charges, which makes the task multi-label. Meanwhile, charge prediction aims to select a charge from many known charges, which makes it a multi-class classification task. It is therefore natural to model charge prediction as a problem of multi-class classification with multi-label learning. In addition, some charges are easily confused in practice. For example, for the “crime of illegal felling of trees” and the “crime of illegal denudation”, the key difference between the two charges is “theft” or “denudation”. Besides, predicting multiple charges is clearly more complex than predicting a single charge (i.e., single-label, multi-class classification), and discriminating confusing information will effectively improve the accuracy and interpretability of charge prediction. Nevertheless, most previous studies [5], [13], [14] ignore these problems, so how to extract effective information from lengthy legal fact descriptions and distinguish confusing information remains to be explored.
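The distinction between the two settings can be illustrated with a minimal sketch. It is not the prediction principle used in MVE-FLK; the charge list, the logits, and the 0.5 threshold are hypothetical values chosen only for illustration. Single-label decoding picks the one highest-scoring class, while multi-label decoding scores every class independently and keeps all classes above a threshold.

```python
import math

def softmax(logits):
    """Normalize logits into a distribution over mutually exclusive classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent probability for one class."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_single_label(logits, labels):
    """Multi-class, single-label: return exactly one class."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]

def predict_multi_label(logits, labels, threshold=0.5):
    """Multi-class, multi-label: return every class whose probability passes the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Hypothetical charge set and model scores for one fact description.
CHARGES = ["theft", "illegal felling of trees", "operating gambling houses"]
logits = [1.2, 0.8, -2.0]

single = predict_single_label(logits, CHARGES)
multi = predict_multi_label(logits, CHARGES)
print(single)
print(multi)
```

With these scores the single-label decoder returns only the top charge, whereas the multi-label decoder also keeps the second charge, which is the behavior needed when a defendant faces several charges.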

Secondly, taking full advantage of existing law articles is critical to the task of legal judgment prediction. However, the majority of existing studies [7], [10], [15] do not make full use of the law articles and rely only on the fact description to obtain the prediction results. In a real judgment scenario in civil law countries such as China, France, and Germany, human judges generally take both the facts of the case and the existing statutory provisions into account. This matters especially for the subtask of term of penalty prediction, because the applicable term range of a crime is specified in the law articles. For instance, as shown in Fig. 1, operating gambling houses shall be sentenced to fixed-term imprisonment of not more than three years. Thus, incorporating the information from law articles can be expected to improve term of penalty prediction.

Thirdly, to better understand natural language in research and industry NLP tasks, there is a trend toward designing a joint encoder that combines the advantages of various encoders. For example, Cer et al. [16] combined the Transformer encoder with DAN [17] and introduced a universal sentence encoder that specifically targets transfer learning to other Natural Language Processing (NLP) tasks. Despite their effectiveness, we argue that these approaches simply concatenate the encoding vectors of the various encoders. In fact, the same encoder can be more or less informative depending on the information being encoded in an LJP case. For example, for a case with long text, Transformer plays a more important role in legal judgment prediction than DAN, because it relies entirely on the attention mechanism to model the global dependence between input and output regardless of their distance in the sequence. Therefore, how to design a novel joint encoder with an attention mechanism (where each type of encoder corresponds to a view) for legal judgment prediction remains to be explored.
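The contrast between plain concatenation and attention over views can be sketched as follows. This is a generic attention-pooling illustration, not the multi-view attention network of MVE-FLK: the three view vectors, the `query` vector (standing in for a learned parameter), and the function names are all assumptions made for the example.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def concat_fusion(views):
    """Baseline: concatenate all view vectors, treating every view as equally important."""
    return [x for v in views for x in v]

def attention_fusion(views, query):
    """Weight each view by softmax(view . query) and return the weighted sum and the weights."""
    weights = softmax([dot(v, query) for v in views])
    dim = len(views[0])
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return fused, weights

# Hypothetical encodings of one case from three encoders (views).
views = {
    "WSE": [0.2, 0.1, 0.4],
    "Transformer": [0.9, 0.7, 0.3],
    "DAN": [0.1, 0.0, 0.2],
}
query = [1.0, 1.0, 0.0]  # stands in for a learned attention vector

fused, weights = attention_fusion(list(views.values()), query)
for name, w in zip(views, weights):
    print(name, round(w, 3))
```

In this toy setting the Transformer view receives the largest weight, so it dominates the fused representation, whereas concatenation would pass all three views forward without any case-dependent weighting.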

To address the above challenges, this paper proposes a novel multi-task learning framework to tackle multiple subtasks in LJP, called Multi-View Encoder Fusing Legal Keywords (MVE-FLK). Specifically, the multi-view encoder is the core module of MVE-FLK. It combines our proposed word and sentence encoder (WSE), Transformer and DAN through an attention neural network, and it encodes the case from multiple views by exploiting the fact description and the concept keywords of cases. Based on the output of the multi-view encoder, a multi-task prediction module jointly models the multiple subtasks in LJP, i.e., applicable law article prediction, charge prediction, and term of penalty prediction, within a joint learning framework. The contributions of this paper are threefold:

  • We propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to jointly model multiple subtasks in LJP. Unlike previous methods, which simply define all the subtasks as problems of multi-class classification with single-label learning and adopt the same classification boundaries for them, we model the subtasks of applicable law article prediction and charge prediction as problems of multi-class classification with multi-label learning, and design a unique prediction principle for each subtask in MVE-FLK at a fine-grained level.

  • We design a word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles. Unlike previous encoders, which simply combine some classic encoders and neglect legal keywords, we combine WSE with the classic Transformer and DAN models through an attentive multi-view neural network, so that the case is encoded effectively from multiple encoder views. The approach uses the attention mechanism to select the important and informative encodings of words and sentences.

  • To exploit effective information from existing law articles, we design a novel keyword fusing framework to enhance the performance of MVE-FLK. For the keywords extracted from the abundant law articles with NLP techniques, we design deep neural networks to learn deep information and fuse it with the other modules in MVE-FLK.

Section snippets

Related work

In this section, we survey the relevant literature in three streams of work: “legal judgment prediction”, “word and sentence encoder” and “multi-task learning”.

Problem formulation

In this section, some fundamental definitions in this work and a formulation for LJP are introduced as follows.

Definition 1 Fact Description

Fact description is the factual description of a case, including the time of the crime, the place of the crime, the result of the crime, the tools used in the crime, and the judgment made by the court or judge [7].

We use Jieba to segment words and remove stop words from the fact descriptions of cases. As a result, each fact description can be represented as a word sequence, i.e., $S=\{s_1,s_2,\ldots,s_n\}$, where n is
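The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper `to_word_sequence`, the tiny stop-word set, and the English sample sentence are hypothetical. The tokenizer is passed in as a function, so for Chinese fact descriptions one could supply `jieba.lcut` (which returns a list of tokens) in place of the whitespace tokenizer used here to keep the example dependency-free.

```python
from typing import Callable, Iterable, List, Set

def to_word_sequence(
    text: str,
    tokenize: Callable[[str], Iterable[str]],
    stop_words: Set[str],
) -> List[str]:
    """Segment a fact description and drop stop words and empty tokens."""
    return [tok for tok in tokenize(text) if tok.strip() and tok not in stop_words]

def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer for the example; swap in jieba.lcut for Chinese text."""
    return text.split()

# Hypothetical stop-word list and fact description.
STOP_WORDS = {"the", "a", "in", "of"}
fact = "the defendant operated a gambling house in the city"

S = to_word_sequence(fact, whitespace_tokenize, STOP_WORDS)
print(S, len(S))
```

The resulting list plays the role of the word sequence S, and its length corresponds to the number of words kept after stop-word removal.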

Methodology

To realize the task of LJP, we propose a novel multi-task framework via multi-view encoder fusing legal keywords, named MVE-FLK, as shown in Fig. 2. Specifically, MVE-FLK mainly contains two core modules, namely, the multi-view encoder and multi-task prediction. In this section, we describe each of these modules in detail.
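As a rough illustration of how a multi-task prediction module can be trained jointly, the sketch below sums one loss per subtask. It is an assumption-laden example, not the objective defined in the paper: it uses binary cross-entropy for the two multi-label subtasks (law articles and charges), softmax cross-entropy over discretized term classes for the penalty subtask, and equal task weights. All names and numbers are hypothetical.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels (numerically stable form)."""
    losses = [
        max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        for z, t in zip(logits, targets)
    ]
    return sum(losses) / len(losses)

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample with a single correct class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def joint_loss(outputs, targets, task_weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three subtask losses (article, charge, term)."""
    l_article = bce_with_logits(outputs["article"], targets["article"])
    l_charge = bce_with_logits(outputs["charge"], targets["charge"])
    l_term = cross_entropy(outputs["term"], targets["term"])
    w1, w2, w3 = task_weights
    return w1 * l_article + w2 * l_charge + w3 * l_term

# Hypothetical targets: multi-hot vectors for articles and charges, a class index for the term.
targets = {"article": [1, 0, 0], "charge": [1, 0], "term": 1}
good = {"article": [3.0, -3.0, -3.0], "charge": [3.0, -3.0], "term": [0.1, 4.0, 0.2]}
bad = {"article": [-3.0, 3.0, 3.0], "charge": [-3.0, 3.0], "term": [4.0, 0.1, 0.2]}

print(joint_loss(good, targets), joint_loss(bad, targets))
```

Predictions that agree with the targets on all three subtasks yield a lower joint loss than predictions that disagree, so a shared encoder trained on this sum receives a gradient signal from every subtask.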

Experiments

To demonstrate the effectiveness of our MVE-FLK model, we conduct a series of experiments on a real-life legal dataset called CAIL2018 [54], and select three representative subtasks (i.e., law article prediction, charge prediction, and term of penalty prediction) for comparison.

Conclusion

In this paper, we propose a novel multi-task legal judgment prediction framework via multi-view encoder fusing legal keywords, named MVE-FLK, to tackle multiple subtasks in LJP. The major novelties of MVE-FLK lie in devising the word and sentence encoder (WSE) with an attention mechanism to fuse keywords from existing law articles, and in proposing a multi-view encoder that combines WSE, Transformer and DAN through an attention neural network. Moreover, a multi-task

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 92046026, Grant 72172057, Grant 71701089, Grant 61662028, and Grant 71901115, and by the International Innovation Cooperation Project of Jiangsu Province, China, under Grant BZ2020008.

References (59)

  • F. Liu et al., HieNN-DWE: A hierarchical neural network with dynamic word embeddings for document level sentiment classification, Neurocomputing (2020)
  • G. Zhu et al., Neural attentive travel package recommendation via exploiting long-term and short-term behaviors, Knowl.-Based Syst. (2021)
  • Y. Zhang et al., A survey on multi-task learning, IEEE Trans. Knowl. Data Eng. (2021)
  • S. Yadav et al., Feature assisted stacked attentive shortest dependency path based bi-LSTM model for protein–protein interaction, Knowl.-Based Syst. (2019)
  • B. Luo, Y. Feng, J. Xu, X. Zhang, D. Zhao, Learning to predict charges for criminal cases with legal basis, in:...
  • H. Zhong, Z. Guo, C. Tu, C. Xiao, Z. Liu, M. Sun, Legal judgment prediction via topological learning, in: Proceedings...
  • Z. Xu, X. Li, Y. Li, Z. Wang, Y. Fanxu, X. Lai, Multi-task legal judgement prediction combining a subtask of the...
  • W. Yang, W. Jia, X. Zhou, Y. Luo, Legal judgment prediction via multi-perspective bi-feedback network, in: Proceedings...
  • J.A. Segal, Predicting supreme court cases probabilistically: The search and seizure cases, 1962–1981, Am. Political Sci. Rev. (1984)
  • B.E. Lauderdale et al., The supreme court’s many median justices, Am. Political Sci. Rev. (2012)
  • Z. Hu, X. Li, C. Tu, Z. Liu, M. Sun, Few-shot charge prediction with discriminative legal attributes, in: Proceedings...
  • P. Wang, Y. Fan, S. Niu, Z. Yang, Y. Zhang, J. Guo, Hierarchical matching network for crime classification, in:...
  • H. Chen, D. Cai, W. Dai, Z. Dai, Y. Ding, Charge-based prison term prediction with deep gating network, in: Proceedings...
  • D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R.S. John, N. Constant, M. Guajardo-Céspedes, S. Yuan, C. Tar, et al....
  • M. Iyyer, V. Manjunatha, J. Boyd-Graber, H. Daumé III, Deep unordered composition rivals syntactic methods for text...
  • W.-C. Lin, T.-T. Kuo, T.-J. Chang, C.-A. Yen, C.-J. Chen, S.-d. Lin, Exploiting machine learning models for chinese...
  • Y.-H. Liu et al., A two-phase sentiment analysis approach for judgement prediction, J. Inf. Sci. (2018)
  • N. Xu, P. Wang, L. Chen, L. Pan, X. Wang, J. Zhao, Distinguish confusing law articles for legal judgment prediction,...
  • S. Li, B. Liu, L. Ye, H. Zhang, B. Fang, Element-aware legal judgment prediction for criminal cases with confusing...