short-paper

LeCaRD: A Legal Case Retrieval Dataset for Chinese Law System

Authors:

Shaoping Ma

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2342-2348

https://doi.org/10.1145/3404835.3463250

Published: 11 July 2021

Abstract

Legal case retrieval is vital for ensuring justice across different legal systems and has recently received increasing attention in information retrieval (IR) research. However, the relevance judgment criteria of previous retrieval datasets are either inapplicable to cases without citation relationships or not instructive enough for future datasets to follow. Moreover, most existing benchmark datasets pay little attention to how queries are selected. In this paper, we construct the Chinese Legal Case Retrieval Dataset (LeCaRD), which contains 107 query cases and over 43,000 candidate cases. Queries and results are drawn from criminal cases published by the Supreme People's Court of China. In particular, to address the difficulty of defining relevance, we propose a series of relevance judgment criteria designed by our legal team, and legal experts annotated the candidate cases according to these criteria. We also develop a novel query sampling strategy that takes both query difficulty and diversity into account. To evaluate the dataset, we implement several existing retrieval models on LeCaRD as baselines. The dataset is publicly available together with the complete data processing details.
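The abstract does not name the baseline models or the evaluation metrics, so the following is only a minimal sketch of what a lexical baseline run on a dataset shaped like this could look like, not the authors' code or LeCaRD's released format. It assumes (1) Okapi BM25 as the ranking function, (2) query and candidate case texts that have already been word-segmented (Chinese text needs a segmenter first), and (3) graded relevance labels scored with NDCG@k. The function names `bm25_scores` and `ndcg_at_k`, the toy case tokens, and the labels are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized candidate against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each term across the candidate pool.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:  # sum contributions over query tokens
            if term not in tf:
                continue
            # idf variant with +1 inside the log, which keeps idf non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def ndcg_at_k(ranked_labels, k):
    """NDCG@k for graded relevance labels listed in ranked order."""
    def dcg(labels):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
    # Ideal ordering is built from the labels of the same candidate pool.
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


# Hypothetical, pre-segmented query case and candidate cases.
query = ["盗窃", "手机", "入户"]
candidates = {
    "c1": ["入户", "盗窃", "手机", "价值"],
    "c2": ["交通", "肇事", "逃逸"],
    "c3": ["盗窃", "电动车"],
}
labels = {"c1": 3, "c2": 0, "c3": 1}  # hypothetical graded relevance labels

ids = list(candidates)
scores = bm25_scores(query, [candidates[c] for c in ids])
order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
ranking = [ids[i] for i in order]
print(ranking)  # ['c1', 'c3', 'c2']
print(ndcg_at_k([labels[c] for c in ranking], k=3))  # 1.0
```

The BM25 parameters `k1=1.2` and `b=0.75` are common defaults rather than values taken from the paper. Because the ideal DCG is computed over the same candidate pool, the metric assumes every annotated candidate for a query is ranked.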

Supplementary Material

Presentation video of LeCaRD (1494.mp4, MP4, 22.82 MB)

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Test collections

Information

Published In

SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2021

2998 pages

ISBN: 9781450380379

DOI: 10.1145/3404835

General Chairs: Fernando Diaz (Google), Chirag Shah (University of Washington), Torsten Suel (New York University)

Program Chairs: Pablo Castells (Universidad Autónoma de Madrid, Amazon), Rosie Jones (Spotify), Tetsuya Sakai (Waseda University)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2021

Qualifiers

Short-paper

Funding Sources

National Key Research and Development Program of China
Beijing Academy of Artificial Intelligence (BAAI)
Natural Science Foundation of China
Tsinghua University Guoqiang Research Institute

Conference

SIGIR '21

Sponsor: SIGIR

SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2021

Virtual Event, Canada

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 44

Total Downloads: 764

Downloads (Last 12 months): 208

Downloads (Last 6 weeks): 11

Reflects downloads up to 27 Feb 2025

Citations

Qin W, Yu W, Zhang K, Zhao H, Xu J, Wen J (2025). Uncertainty-aware evidential learning for legal case retrieval with noisy correspondence. Information Sciences, 121915. Online publication date: Jan-2025.
https://doi.org/10.1016/j.ins.2025.121915
Xiaoyi L, Jingwen C (2025). The Use of Artificial Intelligence in Chinese Humanities and Social Sciences Research. KI in Medien, Kommunikation und Marketing, 277-300. Online publication date: 15-Feb-2025.
https://doi.org/10.1007/978-3-658-46344-1_20
Guan J, Yu Z, Liao Y, Tang R, Duan M, Han G (2024). Predicting Critical Path of Labor Dispute Resolution in Legal Domain by Machine Learning Models Based on SHapley Additive exPlanations and Soft Voting Strategy. Mathematics, 12:2 (272). Online publication date: 14-Jan-2024.
https://doi.org/10.3390/math12020272
Xue Z, Liu H, Hu Y, Qian Y, Wang Y, Kong K, Wang C, Liu Y, Shen W, Larson K (2024). LEEC for judicial fairness. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 7527-7535. Online publication date: 3-Aug-2024.
https://dl.acm.org/doi/10.24963/ijcai.2024/833
Zhao J, Guan Z, Zhao W, Jiang Y, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Enhancing Criminal Case Matching through Diverse Legal Factors. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2379-2383. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657960
Li H, Shao Y, Wu Y, Ai Q, Ma Y, Liu Y, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2251-2260. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657887
Qin W, Cao Z, Yu W, Si Z, Chen S, Xu J, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Explicitly Integrating Judgment Prediction with Legal Document Retrieval: A Law-Guided Generative Approach. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2210-2220. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657717
Yue L, Liu Q, Zhao L, Wang L, Gao W, An Y, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2221-2230. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657698
Ye F, Li S, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). MileCut: A Multi-view Truncation Framework for Legal Case Retrieval. Proceedings of the ACM Web Conference 2024, 1341-1349. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645349
Yue L, Liu Q, Jin B, Wu H, An Y (2024). A Circumstance-Aware Neural Framework for Explainable Legal Judgment Prediction. IEEE Transactions on Knowledge and Data Engineering, 36:11 (5453-5467). Online publication date: Nov-2024.
https://doi.org/10.1109/TKDE.2024.3387580
