DOI: 10.1145/3477495.3531872
short-paper

Summarizing Legal Regulatory Documents using Transformers

Published: 07 July 2022

Abstract

Companies spend substantial time and resources ensuring compliance with existing regulations, or pay fines when compliance cannot be proven in auditing procedures. The topic is not only relevant but also highly complex, given the frequency of changes and amendments, the complexity of the cases, and the difficulty of juristic language. This paper applies advanced extractive summarization to democratize the understanding of regulations, so that non-jurists can decide which regulations deserve further follow-up. To this end, we first create a corpus named EUR-LexSum, containing 4595 curated European regulatory documents and their corresponding summaries. We then fine-tune transformer-based models on this corpus; they outperform a traditional extractive summarization baseline in terms of ROUGE metrics. Our experiments show that transformer-based models are effective for legal document summarization even with limited amounts of data.
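The abstract compares models "in terms of ROUGE metrics". ROUGE-N scores a system summary by the n-grams it shares with a reference summary. The Python sketch below computes ROUGE-N precision, recall and F1 under simplifying assumptions (lowercase whitespace tokenization, no stemming, a single reference summary). The names `ngrams` and `rouge_n` are illustrative; this is not the official ROUGE toolkit and not necessarily the configuration used in the paper.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap.

    Simplifying assumptions: lowercase whitespace tokenization,
    no stemming, no stopword removal, one reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter '&' keeps the minimum count of each shared n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the regulation applies to all member states", "this regulation applies in all member states", n=2)` finds 3 shared bigrams out of 6 on each side, so precision, recall and F1 are all 0.5. Recall (coverage of the reference) is the quantity ROUGE was originally designed around; F1 additionally penalizes overly long extracts.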

Supplementary Material

MP4 File (SIGIR22-sp1890.mp4)
Presentation video for "Summarizing Legal Regulatory Documents using Transformers"



Information

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. eur-lex
  2. extractive text summarization
  3. legal ir
  4. transformer

Qualifiers

  • Short-paper

Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 83
  • Downloads (last 6 weeks): 8
Reflects downloads up to 28 Feb 2025

Cited By

  • (2025) Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches. IEEE Access 13, 18253-18276. DOI: 10.1109/ACCESS.2025.3533217. Online publication date: 2025.
  • (2024) Adaptive Search Support for Teachers in Lesson Planning. Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, 20-24. DOI: 10.1145/3631700.3664921. Online publication date: 27 June 2024.
  • (2024) CivilSum: A Dataset for Abstractive Summarization of Indian Court Decisions. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2241-2250. DOI: 10.1145/3626772.3657859. Online publication date: 10 July 2024.
  • (2024) It cannot be right if it was written by AI: on lawyers’ preferences of documents perceived as authored by an LLM vs a human. Artificial Intelligence and Law. DOI: 10.1007/s10506-024-09422-w. Online publication date: 3 December 2024.
  • (2024) A support system for the detection of abusive clauses in B2C contracts. Artificial Intelligence and Law. DOI: 10.1007/s10506-024-09408-8. Online publication date: 26 June 2024.
  • (2024) Extractive Summarization of Indian Legal Judgments: Bridging NLP and Generative AI for Socially Responsible Content Generation. Generative AI: Current Trends and Applications, 329-352. DOI: 10.1007/978-981-97-8460-8_15. Online publication date: 10 December 2024.
  • (2024) Turkish Legal Single-Document Summarizing. Information Technologies and Their Applications, 32-41. DOI: 10.1007/978-3-031-73420-5_3. Online publication date: 17 October 2024.
  • (2024) Advancing Legal NLP: Application of Pre-trained Language Models in the Legal Domain. New Trends in Database and Information Systems, 309-317. DOI: 10.1007/978-3-031-70421-5_26. Online publication date: 14 November 2024.
  • (2023) End-to-End Transformer-Based Models in Textual-Based NLP. AI 4(1), 54-110. DOI: 10.3390/ai4010004. Online publication date: 5 January 2023.
  • (2023) Strategy-aware Bundle Recommender System. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1198-1207. DOI: 10.1145/3539618.3591771. Online publication date: 19 July 2023.