Modeling Dynamic Pairwise Attention for Crime Classification over Legal Articles

Authors:

Yongfeng Zhang,

ShaoZhang Niu

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Pages 485 - 494

https://doi.org/10.1145/3209978.3210057

Published: 27 June 2018

Abstract

In the juridical field, judges usually need to consult several relevant cases to determine the specific articles that the evidence violates, a task that is time-consuming and requires extensive professional knowledge. In this paper, we focus on how to reduce this manual effort and make the conviction process more efficient. Specifically, we treat pieces of evidence as documents and articles as labels, so that the conviction process can be cast as a multi-label classification problem. The challenge in this scenario lies in two aspects. First, the number of articles that a piece of evidence violates is dynamic, which we denote as the label dynamic problem. Second, most articles are violated by only a few pieces of evidence, which we denote as the label imbalance problem. Previous methods usually learn the multi-label classification model and the label thresholds independently, and may ignore the label imbalance problem. To tackle both challenges, we propose a unified Dynamic Pairwise Attention Model (DPAM for short). Specifically, DPAM adopts the multi-task learning paradigm to learn the multi-label classifier and the threshold predictor jointly, and can thus improve generalization performance by leveraging the information learned in both tasks. In addition, a pairwise attention model based on article definitions is incorporated into the classification model to help alleviate the label imbalance problem. Experimental results on two real-world datasets show that our proposed approach significantly outperforms state-of-the-art multi-label classification methods.
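The idea of a per-document threshold learned jointly with the classifier can be illustrated with a small sketch. This is not the DPAM architecture: the abstract does not specify the loss, so the binary cross-entropy term, the squared-error threshold term, the `weight` parameter, and the function names below are all assumptions chosen for illustration, and the pairwise attention component is omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(scores, threshold):
    """Return the indices of labels whose score exceeds the document's own threshold.

    Because the threshold differs per document, the number of predicted
    labels varies from document to document (the "label dynamic" setting).
    """
    return [i for i, s in enumerate(scores) if s > threshold]


def joint_loss(logits, targets, predicted_threshold, target_threshold, weight=0.5):
    """Hypothetical multi-task objective (an assumption, not the paper's loss).

    Averages binary cross-entropy over the labels and adds a weighted
    squared error between the predicted and the target threshold.
    """
    eps = 1e-12  # guards against log(0)
    bce = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        bce -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    bce /= len(logits)
    threshold_error = (predicted_threshold - target_threshold) ** 2
    return bce + weight * threshold_error


# Two documents scored against the same four articles (made-up scores).
doc_a_scores = [0.91, 0.15, 0.78, 0.05]
doc_b_scores = [0.40, 0.35, 0.10, 0.08]

labels_a = predict_labels(doc_a_scores, threshold=0.5)   # two articles selected
labels_b = predict_labels(doc_b_scores, threshold=0.38)  # one article selected
print(labels_a, labels_b)
```

With a single global cutoff of 0.5, the second document would receive no labels at all; a document-specific threshold lets the model still assign its top-scoring article, which is the motivation for predicting thresholds rather than fixing them.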

Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

June 2018

1509 pages

ISBN:9781450356572

DOI:10.1145/3209978

General Chairs: Kevyn Collins-Thompson (University of Michigan, United States), Qiaozhu Mei (University of Michigan, United States)

Program Chairs: Brian Davison (Lehigh University, United States), Yiqun Liu (Tsinghua University, China), Emine Yilmaz (University College London, United Kingdom)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Joint Funds of NSFC-Basic Research on General Technology

Conference

SIGIR '18

Sponsor:

SIGIR

SIGIR '18: The 41st International ACM SIGIR conference on research and development in Information Retrieval

July 8 - 12, 2018

Ann Arbor, MI, USA

Acceptance Rates

SIGIR '18 Paper Acceptance Rate: 86 of 409 submissions, 21%

Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Article Metrics

Total Citations: 35

Total Downloads: 679

Downloads (Last 12 months): 25

Downloads (Last 6 weeks): 6

Reflects downloads up to 01 Mar 2025

Cited By

Paul S, Bhatt R, Goyal P, Ghosh S, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Legal Statute Identification: A Case Study using State-of-the-Art Datasets and Methods. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. DOI: 10.1145/3626772.3657879, pp. 2231-2240. Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657879
Yue L, Liu Q, Jin B, Wu H, An Y (2024). A Circumstance-Aware Neural Framework for Explainable Legal Judgment Prediction. IEEE Transactions on Knowledge and Data Engineering 36:11. DOI: 10.1109/TKDE.2024.3387580, pp. 5453-5467. Online publication date: Nov-2024
https://doi.org/10.1109/TKDE.2024.3387580
Le Y, Quan Z, Wang J, Cao D, Li K (2024). $\boldsymbol{R}^{2}$: A Novel Recall & Ranking Framework for Legal Judgment Prediction. IEEE/ACM Transactions on Audio, Speech and Language Processing 32. DOI: 10.1109/TASLP.2024.3365389, pp. 1609-1622. Online publication date: 19-Feb-2024
https://dl.acm.org/doi/10.1109/TASLP.2024.3365389
Liu P, Zhang W, Ding Y, Zhang X, Yang S (2024). SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing. 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). DOI: 10.1109/SMC54092.2024.10830950, pp. 3447-3453. Online publication date: 6-Oct-2024
https://doi.org/10.1109/SMC54092.2024.10830950
Zhang Y, Wei X, Yu H (2024). HD-LJP. Knowledge-Based Systems 299:C. DOI: 10.1016/j.knosys.2024.112033. Online publication date: 18-Oct-2024
https://dl.acm.org/doi/10.1016/j.knosys.2024.112033
Srivastav A, Prajapat S (2024). A Decision Tree Approach for Identifying Indian Penal Code Sections Across Different Crime Aspects. Contributions Presented at The International Conference on Computing, Communication, Cybersecurity and AI, July 3–4, 2024, London, UK. DOI: 10.1007/978-3-031-74443-3_45, pp. 773-782. Online publication date: 20-Dec-2024
https://doi.org/10.1007/978-3-031-74443-3_45
Chen J, Zhang X, Zhou X, Han Y, Zhou Q (2023). An Approach Based on Cross-Attention Mechanism and Label-Enhancement Algorithm for Legal Judgment Prediction. Mathematics 11:9 (2032). DOI: 10.3390/math11092032. Online publication date: 25-Apr-2023
https://doi.org/10.3390/math11092032
Liu Y, Wu Y, Zhang Y, Sun C, Lu W, Wu F, Kuang K, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). ML-LJP: Multi-Law Aware Legal Judgment Prediction. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. DOI: 10.1145/3539618.3591731, pp. 1023-1034. Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591731
Li W, Li L (2023). Leveraging Task Dependencies and Label Constraints for Legal Judgment Prediction. 2023 International Joint Conference on Neural Networks (IJCNN). DOI: 10.1109/IJCNN54540.2023.10191665, pp. 01-08. Online publication date: 18-Jun-2023
https://doi.org/10.1109/IJCNN54540.2023.10191665
Deng W, Yan Q, Wang L, Zhang S, Chen H, Ma T (2023). Hierarchical Structure Based Explainable Pre-Trained Model for Legal Provisions Recommendation. 2023 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA). DOI: 10.1109/ICAICA58456.2023.10405466, pp. 242-247. Online publication date: 28-Nov-2023
https://doi.org/10.1109/ICAICA58456.2023.10405466