
Conceptual Questions in Developing Expert-Annotated Data

Authors:
Megan Ma, Stanford Center for Legal Informatics, Stanford Law School, Palo Alto, California, USA (ORCID: 0000-0002-9488-6302)
Brandon Waldon, Stanford University, Palo Alto, California, USA (ORCID: 0000-0001-8046-1701)
Julian Nyarko, Stanford Law School, Palo Alto, California, USA (ORCID: 0000-0002-7121-5696)

ICAIL '23: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, June 2023, Pages 427–431. https://doi.org/10.1145/3594536.3595139

Published: 07 September 2023

ABSTRACT

In this paper, we argue that nuanced expert annotation often requires a significant rethinking of the traditional paradigms of data annotation. In a small pilot study, we find that even the most highly trained experts demonstrate significant heterogeneity in their evaluation of the document-level coherence of bespoke contracts. The outcomes of our study provide preliminary considerations of how paradigms of document annotation should fully utilize expert annotations in bespoke contexts.

Published in
ICAIL '23: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law
June 2023
499 pages
ISBN: 9798400701979
DOI: 10.1145/3594536
Conference Chair: Francisco Andrade, University of Minho, Portugal
Program Chair: Matthias Grabmair, Technical University of Munich, Germany
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
contract review
data annotation paradigms
domain expertise
large language models
legal NLP
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 69 of 169 submissions, 41%