DOI: 10.1145/3511808.3557379
research-article

Legal Charge Prediction via Bilinear Attention Network

Published: 17 October 2022

ABSTRACT

Legal charge prediction aims to determine the appropriate charges from the fact description of a case. Most existing methods formulate it as a multi-class text classification problem and have made substantial progress. However, performance on low-frequency charges remains unsatisfactory. Previous studies indicate that leveraging charge label information can help with this task, but ways of using that label information have not been fully explored. In this paper, inspired by vision-language information fusion techniques from the multi-modal field, we propose a novel model, LeapBank, that fuses the representations of text and labels to improve legal charge prediction. Specifically, we devise a representation fusion block based on the bilinear attention network so that labels and text tokens interact seamlessly. We conduct extensive experiments on three real-world datasets to compare our method with state-of-the-art models. The results show that LeapBank improves Macro-F1 on low-frequency charges by up to 8.5%, demonstrating our model's superiority and competitiveness.
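To make the idea of fusing text and label representations with bilinear attention more concrete, the following is a minimal sketch of a generic low-rank bilinear attention step between token vectors and label vectors. It is not LeapBank's fusion block, whose details are not given in this abstract. The function names, dimensions, random toy inputs, and the choice to reuse the same projections for both the attention map and the fused feature are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def bilinear_attention(tokens, labels, U, V, p):
    """Low-rank bilinear attention between token and label vectors (illustrative).

    tokens: n x d_t, labels: m x d_l, U: d_t x k, V: d_l x k, p: length k.
    Returns (attention map of shape n x m, fused feature vector of length k).
    """
    tp = matmul(tokens, U)  # n x k projected tokens
    lp = matmul(labels, V)  # m x k projected labels
    # Logit for each (token, label) pair: p . (tp_i * lp_j), elementwise product.
    logits = [[sum(pk * a * b for pk, a, b in zip(p, ti, lj)) for lj in lp] for ti in tp]
    # Softmax over all pairs, so the whole map sums to 1.
    top = max(max(row) for row in logits)
    exps = [[math.exp(v - top) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    attn = [[v / total for v in row] for row in exps]
    # Fused feature: attention-weighted sum of elementwise token-label products.
    fused = [
        sum(attn[i][j] * tp[i][c] * lp[j][c] for i in range(len(tp)) for j in range(len(lp)))
        for c in range(len(p))
    ]
    return attn, fused


def random_matrix(rows, cols, rng):
    """Hypothetical helper: toy weights or embeddings drawn from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens = random_matrix(5, 8, rng)  # 5 fact-description tokens, dim 8 (toy values)
labels = random_matrix(3, 6, rng)  # 3 charge labels, dim 6 (toy values)
U = random_matrix(8, 4, rng)       # token projection to rank k = 4
V = random_matrix(6, 4, rng)       # label projection to rank k = 4
p = [rng.uniform(-1.0, 1.0) for _ in range(4)]

attn, fused = bilinear_attention(tokens, labels, U, V, p)
```

In a trained model the projections would be learned and the fused vector would feed a classifier over charges; here the attention map simply weights every token-label pair, which is the kind of interaction the abstract describes.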

        Published in

        CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
        October 2022
        5274 pages
        ISBN: 9781450392365
        DOI: 10.1145/3511808
        General Chairs: Mohammad Al Hasan, Li Xiong

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%
        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
