Neurocomputing

Volume 433, 14 April 2021, Pages 169-180

Commonalities-, specificities-, and dependencies-enhanced multi-task learning network for judicial decision prediction

https://doi.org/10.1016/j.neucom.2020.10.010

Abstract

Judicial Decision Prediction (JDP) aims to determine judicial decisions from the fact description of a criminal case. It comprises multiple subtasks, i.e., law article prediction, charge prediction, and term of penalty prediction. Besides, there exist three properties among the subtasks, i.e., Commonalities, Specificities, and Dependencies. Nonetheless, existing approaches are usually well designed for only a specific subtask, or take only one of the properties into consideration for multiple subtasks. In this paper, we propose a novel Commonalities-, Specificities- and Dependencies-Enhanced Multi-Task Learning Network to unify multiple subtasks, together with the three properties, in a single framework. Further, while handling the Dependencies, we elaborate a learning module that lets each subtask learn contributions from other subtasks to varying degrees, a denoising module that minimizes noise interference among subtasks, and a reinforcing module that guarantees further enhancement for each subtask. Experimental results on two widely used datasets demonstrate that our model significantly and consistently outperforms previous state-of-the-art methods on most evaluation metrics across all subtasks.

Introduction

Judicial Decision Prediction (JDP) refers to determining appropriate judicial decisions (e.g., relevant law articles, charges, and term of penalty) for a criminal case based on its fact description. JDP is a promising application in legal assistant systems: it helps non-legal professionals obtain approximate judgments and serves as a handy reference for professionals (e.g., lawyers and judges). JDP is also challenging compared with fundamental NLP applications. On the one hand, it contains multiple subtasks, i.e., law article prediction, charge prediction, and term of penalty prediction, denoted as Tasks 1, 2, and 3, respectively. On the other hand, it involves a higher level of reasoning and understanding.

In recent years, many methods [1], [2], [3], [4] have been proposed for a single subtask (e.g., charge prediction), ignoring the fact that JDP consists of multiple subtasks and possesses the following three properties.

As aforementioned, JDP contains three subtasks, namely law article prediction, charge prediction, and term of penalty prediction. The three subtasks tend to exhibit some similarities when making predictions from the given fact description. For example, as shown in Fig. 1, all subtasks are apt to focus on the keyword "steal" in the fact description. In other words, the keyword "steal" facilitates the prediction of all three subtasks: through it, Task 1 can easily predict the related law articles, Task 2 can directly determine that the charge is theft, and Task 3 can predict the approximate term of penalty. We define the similarities among subtasks as commonalities.

In addition to the commonalities among subtasks, each subtask possesses some unique characteristics. For example, different subtasks emphasize different contents of the fact description when making predictions. Specifically, as shown in Fig. 1, Task 3 is particularly sensitive to certain numbers in the fact description, namely "6" and "10,550", which represent the number of thefts committed and the amount of money stolen, respectively. The larger these numbers, the longer the predicted term of penalty. In contrast, Tasks 1 and 2 are less sensitive to numbers; that is, the number of thefts committed and the amount of money stolen have little or no effect on law article and charge prediction. We refer to the properties specific to a subtask as specificities.

In contrast to the conspicuous commonalities and specificities, the relationships among subtasks may be more obscure. When making judgments, a judge follows a judicial logic: the judge first deduces the relevant laws violated by a defendant based on the fact description, then determines the charges according to the laws, and finally determines the term of penalty by considering the laws and the charges comprehensively [5]. Besides, subtasks contribute differently to each other. As shown in Fig. 1, the content of the law article stipulates the term of imprisonment (i.e., "fixed-term imprisonment of no more than three years"), which can directly facilitate the prediction of the term of penalty ("fixed-term imprisonment of 14 months"). Such implicit relationships among subtasks are defined as dependencies.

In this work, we propose a novel Commonalities-, Specificities-, and Dependencies-Enhanced Multi-Task Learning Network, abbreviated as CSDNet, to model multiple subtasks jointly while leveraging the three properties above. Moreover, when handling the dependencies, CSDNet follows the intuition that each subtask contributes to other subtasks to varying degrees and automatically learns these degrees through a well-designed learning module. Identifying the "degree of contribution" enables one subtask to learn auxiliary information from other subtasks accurately, thereby greatly reducing noise interference. Even so, noise interference cannot be eliminated entirely. To minimize noise interference among subtasks, we devise a denoising module with an approach similar to the learning module, that is, learning the "degree of contamination" caused by one subtask to other subtasks. Through the "degree of contamination", one subtask can properly eliminate the noise introduced by its interaction with other subtasks. After the learning and denoising modules, the original information of a subtask may be crowded out by that of other subtasks and lose its dominance. To avoid this, we design a reinforcing module that learns the "degree of enhancement" of a subtask, ensuring that each subtask still occupies a dominant position after incorporating information from other subtasks.
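To make these three degrees concrete, the following is a minimal sketch of one plausible gating-style realization, assuming each subtask is summarized by a fixed-size feature vector; the module names, shared gate parameters, and exact gating formulas are illustrative assumptions rather than CSDNet's actual implementation.

```python
import torch
import torch.nn as nn


class DependencyGating(nn.Module):
    """Illustrative sketch of the learning / denoising / reinforcing idea
    (an assumption for exposition, not the paper's exact CSDNet modules)."""

    def __init__(self, dim):
        super().__init__()
        self.learn = nn.Linear(2 * dim, 1)      # "degree of contribution" of task j to task i
        self.denoise = nn.Linear(2 * dim, 1)    # "degree of contamination" of task j to task i
        self.reinforce = nn.Linear(dim, dim)    # "degree of enhancement" of task i itself

    def forward(self, feats):
        # feats: list of per-subtask representations, each of shape (batch, dim)
        outputs = []
        for i, h_i in enumerate(feats):
            borrowed = torch.zeros_like(h_i)
            for j, h_j in enumerate(feats):
                if i == j:
                    continue
                pair = torch.cat([h_i, h_j], dim=-1)
                c = torch.sigmoid(self.learn(pair))    # how much task j contributes to task i
                n = torch.sigmoid(self.denoise(pair))  # how much of that is noise to suppress
                borrowed = borrowed + c * (1.0 - n) * h_j
            e = torch.sigmoid(self.reinforce(h_i))     # keep task i's own features dominant
            outputs.append(e * h_i + borrowed)
        return outputs
```

In this sketch the gates are shared across subtask pairs for brevity; pair-specific parameters, attention-style scores, or vector-valued gates would be straightforward variations.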

Notably, to the best of our knowledge, we are the first to incorporate the commonalities, specificities, and dependencies in a unified network, and the first to address the issue of noise interference among subtasks in JDP. Besides, we propose a combination of LSTMs [6], named Q-LSTM, and make it the fundamental component of CSDNet.

In summary, our key contributions lie in the following aspects:

  • We propose a novel multi-task learning network to address multiple subtasks of JDP incorporating the three properties, i.e., Commonalities, Specificities, and Dependencies.

  • We elaborate a Learning Module to let each subtask learn contributions from other subtasks to varying degrees, a Denoising Module to eliminate noise interference, and a Reinforcing Module to achieve further enhancement.

  • We invent a combination of LSTMs, named Q-LSTM, which outperforms conventional deep neural networks, i.e., CNN, LSTM, HLSTM, and stacked BiLSTM, on a single subtask of JDP. In addition, the proposed Q-LSTM shows competitive generalization performance on text classification, named entity recognition, and part-of-speech tagging tasks.

  • Extensive experiments on two datasets demonstrate that CSDNet significantly and consistently outperforms previous state-of-the-art methods across all representative subtasks.

Section snippets

Judicial decision prediction

Research on JDP has made great progress [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. To improve charge prediction, Luo et al. [17] extract relevant law articles as legal assistance and use a hierarchical structure to encode the fact description and the extracted law articles. Jiang et al. [1] leverage reinforcement learning to select rationales (short, readable, and decisive snippets) as the explanation of charge prediction. To correctly recognize few-shot charges (e.g., scalping

Problem definition

JDP aims to predict judicial decisions according to the fact description of a given criminal case. It includes several subtasks, which are not independent but related to each other. Each subtask can be defined as a text classification problem with the same fact description as input but a different set of output categories; different subtasks have different numbers of categories. Suppose JDP contains k subtasks Y1, Y2, …, Yk; the input of each subtask is the same fact description X, a
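As an illustration of this formulation, the sketch below wires a single shared encoder over the fact description X to k task-specific classification heads; the LSTM encoder, mean pooling, and label-set sizes are placeholder assumptions for exposition, not CSDNet or the datasets' actual label counts.

```python
import torch
import torch.nn as nn


class MultiTaskJDP(nn.Module):
    """Generic shared-encoder multi-task text classifier for JDP
    (an illustrative baseline formulation, not CSDNet itself)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, classes_per_task):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # one classification head per subtask Y1, ..., Yk
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, n) for n in classes_per_task]
        )

    def forward(self, x):
        # x: (batch, seq_len) token ids of the same fact description X for all subtasks
        h, _ = self.encoder(self.embed(x))
        pooled = h.mean(dim=1)                         # simple mean pooling over tokens
        return [head(pooled) for head in self.heads]   # one logit vector per subtask


# k = 3 subtasks: law articles, charges, term-of-penalty bins (label counts are placeholders)
model = MultiTaskJDP(vocab_size=50000, embed_dim=128, hidden_dim=256,
                     classes_per_task=[100, 100, 10])
logits_law, logits_charge, logits_term = model(torch.randint(0, 50000, (4, 200)))
```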

Methodology

In this section, we introduce the proposed Commonalities-, Specificities- and Dependencies-Enhanced Multi-Task Learning Network in detail. Fig. 2 depicts an overview of CSDNet. Before presenting the network, we first describe the fundamental component that constitutes it, namely Q-LSTM.

Experiments

We take the commonalities, specificities, and dependencies into account and innovatively integrate them into a unified framework. To thoroughly demonstrate the superiority of CSDNet, we choose three classic subtasks of JDP, i.e., law article prediction, charge prediction, and term of penalty prediction, in line with previous research.

Conclusion

In this paper, we propose a novel multi-task learning network, CSDNet, to jointly tackle multiple subtasks while incorporating the three properties, i.e., Commonalities, Specificities, and Dependencies. In particular, when handling the Dependencies, we elaborate a learning module to learn contributions from other subtasks to varying degrees, a denoising module to eliminate noise interference, and a reinforcing module to further enhance each subtask. Notably, we are the first to unify multiple subtasks accompanied by the three

CRediT authorship contribution statement

Fanglong Yao: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Visualization. Xian Sun: Validation, Writing - original draft, Supervision. Hongfeng Yu: Formal analysis, Resources, Writing - review & editing. Wenkai Zhang: Data curation, Writing - review & editing. Kun Fu: Funding acquisition, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported by the National Science Fund for Distinguished Young Scholars of China under Grant #61725105.

References (43)

  • X. Jiang et al., Interpretable rationale augmented charge prediction system
  • Z. Hu et al., Few-shot charge prediction with discriminative legal attributes
  • C. He, L. Peng, Y. Le, J. He, SECaps: A sequence enhanced capsule model for charge prediction, arXiv preprint...
  • A.H.N. Tran, Applying deep neural network to retrieve relevant civil law articles, in: Proceedings of the Student...
  • H. Zhong et al., Legal judgment prediction via topological learning
  • S. Hochreiter et al., Long short-term memory, Neural Comput. (1997)
  • F. Kort, Predicting Supreme Court decisions mathematically: a quantitative analysis of the 'right to counsel' cases, Am. Political Sci. Rev. (1957)
  • S.S. Ulmer, Quantitative analysis of judicial processes: some practical and theoretical applications, Law Contemp. Probs. (1963)
  • S.S. Nagel, Applying correlation analysis to case prediction, Tex. L. Rev. (1963)
  • R. Keown, Mathematical models for legal prediction, Computer/L.J. (1980)
  • J.A. Segal, Predicting Supreme Court cases probabilistically: the search and seizure cases, 1962–1981, Am. Political Sci. Rev. (1984)
  • B.E. Lauderdale et al., The Supreme Court's many median justices, Am. Political Sci. Rev. (2012)
  • C.-L. Liu, C.-D. Hsieh, Exploring phrase-based classification of judicial documents for criminal charges in Chinese, ...
  • W.-C. Lin, T.-T. Kuo, T.-J. Chang, C.-A. Yen, C.-J. Chen, S.-D. Lin, Exploiting machine learning models for Chinese...
  • N. Aletras et al., Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective, PeerJ Comput. Sci. (2016)
  • O.-M. Şulea, M. Zampieri, M. Vela, J. van Genabith, Predicting the law area and decisions of French Supreme Court...
  • B. Luo et al., Learning to predict charges for criminal cases with legal basis
  • H. Chen, D. Cai, W. Dai, Z. Dai, Y. Ding, Charge-based prison term prediction with deep gating network (08...
  • W. Yang, W. Jia, X. Zhou, Y. Luo, Legal judgment prediction via multi-perspective bi-feedback network, arXiv preprint...
  • O. Firat, K. Cho, Y. Bengio, Multi-way, multilingual neural machine translation with a shared attention mechanism, CoRR...
  • Y. Yang, T. Hospedales, Deep multi-task representation learning: A tensor factorisation...

Fanglong Yao received the B.Sc. degree from Inner Mongolia University, Hohhot, China, in 2017. He is currently pursuing the Ph.D. degree with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China.

His research interests include deep learning, natural language processing and multi-modal learning.

Xian Sun received the B.Sc. degree from Beihang University, Beijing, China, in 2004, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2006 and 2009, respectively.

He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote-sensing image understanding.

Hongfeng Yu received the B.Sc. and M.Sc. degrees from Peking University, Beijing, China, in 2013 and 2016, respectively. He is currently a Research Assistant at the Institute of Electronics, Chinese Academy of Sciences.

His research interests include deep learning and natural language processing.

Wenkai Zhang received the B.Sc. degree from China University of Petroleum, Shandong, China, in 2013, and the Ph.D. degree from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2018. He is currently a Research Assistant at the Aerospace Information Research Institute, Chinese Academy of Sciences.

His research interests include remote sensing image semantic segmentation and multimedia information processing.

Kun Fu received the B.Sc., M.Sc., and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1995, 1999, and 2002, respectively.

He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, remote sensing image understanding, geospatial data mining, and visualization.
