DOI: 10.1145/3462757.3466102

Using transformers to improve answer retrieval for legal questions

Published: 27 July 2021

Abstract

Transformer architectures such as BERT and XLNet are widely used in natural language processing and have achieved state-of-the-art performance on tasks such as text classification, passage summarization, machine translation, and question answering. Hosting transformer models efficiently, however, is difficult because of their large size and high inference latency. In this work, we describe how we deploy a RoBERTa Base question-answer classification model in a production environment. We also compare the answer retrieval performance of a RoBERTa Base classifier against a traditional machine learning baseline, a linear SVM, on the publicly available PRIVACYQA dataset from the legal domain. We show that RoBERTa achieves a 31% improvement in F1-score and a 41% improvement in Mean Reciprocal Rank over the SVM.



Published In

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
June 2021
319 pages
ISBN: 9781450385268
DOI: 10.1145/3462757

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. BERT engines
  2. deep learning
  3. evaluation
  4. language models
  5. legal applications
  6. question answering

Qualifiers

  • Research-article

Conference

ICAIL '21

Acceptance Rates

Overall Acceptance Rate 69 of 169 submissions, 41%


Cited By

  • (2025) DeBERTA-Att-LMCQA: A hybrid model of DeBERTA and attention for legal multi-choice question answering. Expert Systems with Applications, 271:126579. DOI: 10.1016/j.eswa.2025.126579. Online publication date: May 2025.
  • (2024) HistoryQuest: Arabic Question Answering in Egyptian History with LLM Fine-Tuning and Transformer Models. 2024 Intelligent Methods, Systems, and Applications (IMSA), pages 135-140. DOI: 10.1109/IMSA61967.2024.10652824. Online publication date: 13 Jul 2024.
  • (2024) A Legal Multi-Choice Question Answering Model Based on DeBERTa and Attention Mechanism. 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), pages 814-821. DOI: 10.1109/ICTAI62512.2024.00119. Online publication date: 28 Oct 2024.
  • (2024) Legal Natural Language Processing From 2015 to 2022: A Comprehensive Systematic Mapping Study of Advances and Applications. IEEE Access, 12:145286-145317. DOI: 10.1109/ACCESS.2023.3333946. Online publication date: 2024.
  • (2024) Debiasing large language models: research opportunities. Journal of the Royal Society of New Zealand, 55(2):372-395. DOI: 10.1080/03036758.2024.2398567. Online publication date: 16 Sep 2024.
  • (2024) DiscoLQA: zero-shot discourse-based legal question answering on European Legislation. Artificial Intelligence and Law. DOI: 10.1007/s10506-023-09387-2. Online publication date: 10 Jan 2024.
  • (2023) End-to-End Transformer-Based Models in Textual-Based NLP. AI, 4(1):54-110. DOI: 10.3390/ai4010004. Online publication date: 5 Jan 2023.
  • (2023) Leveraging Unannotated Data to Improve Zero-Shot Question Answering in the French Legal Domain. 2023 IEEE/ACIS 8th International Conference on Big Data, Cloud Computing, and Data Science (BCD), pages 201-207. DOI: 10.1109/BCD57833.2023.10466348. Online publication date: 14 Dec 2023.
  • (2023) A survey on legal question-answering systems. Computer Science Review, 48(C). DOI: 10.1016/j.cosrev.2023.100552. Online publication date: 1 May 2023.
  • (2023) Bringing order into the realm of Transformer-based language models for artificial intelligence and law. Artificial Intelligence and Law, 32(4):863-1010. DOI: 10.1007/s10506-023-09374-7. Online publication date: 20 Nov 2023.
