ABSTRACT
We argue that nuanced expert annotation often requires a significant rethinking of the traditional paradigms of data annotation. In a small pilot study, we find that even the most highly trained experts show significant heterogeneity in how they evaluate the document-level coherence of bespoke contracts. The outcomes of our study offer preliminary considerations for how document annotation paradigms should make full use of expert annotations in bespoke contexts.
Index Terms
- Conceptual Questions in Developing Expert-Annotated Data