DOI: 10.1145/2600428.2609506

Poster

Assessing the reliability and reusability of an E-discovery privilege test collection

Published: 03 July 2014

Abstract

In some jurisdictions, parties to a lawsuit can request documents from each other, but documents subject to a claim of privilege may be withheld. The TREC 2010 Legal Track developed what is presently the only public test collection for evaluating privilege classification. This paper examines the reliability and reusability of that collection. For reliability, the key question is the extent to which privilege judgments correctly reflect the opinion of the senior litigator whose judgment is authoritative. For reusability, the key question is the degree to which systems whose results contributed to creation of the test collection can be fairly compared with other systems that use those privilege judgments in the future. These correspond to measurement error and sampling error, respectively. The results indicate that measurement error is the larger problem.
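The abstract's contrast between sampling error and measurement error can be made concrete with a small Monte-Carlo simulation. The sketch below is purely illustrative and is not the paper's actual estimation procedure: the collection size, privilege rate, system recall, sample size, and assessor error rate are all made-up parameters. It estimates a system's recall two ways: judging a random subset with perfect judgments (sampling error only) versus judging everything with judgments that flip with some probability (measurement error only).

```python
import random
import statistics

def simulate(n_docs=5000, privilege_rate=0.05, system_recall=0.8,
             sample_size=500, assessor_error=0.15, trials=200, seed=0):
    """Monte-Carlo sketch of the two error sources the abstract contrasts:
    sampling error (judging only a random subset) versus measurement error
    (judging everything, but with imperfect assessors)."""
    rng = random.Random(seed)
    # Ground truth: which documents are privileged, and which of those
    # a hypothetical review system correctly flags.
    truth = [rng.random() < privilege_rate for _ in range(n_docs)]
    flagged = [t and rng.random() < system_recall for t in truth]
    true_recall = sum(flagged) / sum(truth)

    sampled_ests, noisy_ests = [], []
    for _ in range(trials):
        # Sampling error only: perfect judgments on a random subset.
        sample = rng.sample(range(n_docs), sample_size)
        rel = [i for i in sample if truth[i]]
        if rel:
            sampled_ests.append(sum(flagged[i] for i in rel) / len(rel))
        # Measurement error only: every document judged, but each
        # judgment is flipped with probability assessor_error.
        judged = [t ^ (rng.random() < assessor_error) for t in truth]
        rel = [i for i in range(n_docs) if judged[i]]
        noisy_ests.append(sum(flagged[i] for i in rel) / len(rel))

    return {
        "true_recall": true_recall,
        "sampling_mean": statistics.mean(sampled_ests),
        "sampling_sd": statistics.stdev(sampled_ests),
        "measurement_mean": statistics.mean(noisy_ests),
    }
```

Under these assumed parameters, the sampled estimates scatter around the true recall (variance but no bias), while assessor error drags the estimate far from the truth — a bias that no amount of extra judging removes. That asymmetry is the intuition behind the abstract's conclusion that measurement error is the larger problem.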


Cited By

  • An Empirical Comparison of DistilBERT, Longformer and Logistic Regression for Predictive Coding. 2022 IEEE International Conference on Big Data (Big Data), pp. 3336-3340, 17 Dec 2022. DOI: 10.1109/BigData55660.2022.10020486
  • Toward Cranfield-inspired reusability assessment in interactive information retrieval evaluation. Information Processing and Management 59(5), 1 Sep 2022. DOI: 10.1016/j.ipm.2022.103007
  • Comparing Intrinsic and Extrinsic Evaluation of Sensitivity Classification. Advances in Information Retrieval, pp. 215-222, 5 Apr 2022. DOI: 10.1007/978-3-030-99739-7_25
  • An Empirical Study on Transfer Learning for Privilege Review. 2021 IEEE International Conference on Big Data (Big Data), pp. 2729-2733, 15 Dec 2021. DOI: 10.1109/BigData52589.2021.9672008


    Published In

    SIGIR '14: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428

    Publisher

    Association for Computing Machinery
    New York, NY, United States


    Author Tags

    1. evaluation
    2. measurement error
    3. sampling

    Qualifiers

    • Poster


    Acceptance Rates

    SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%.
    Overall acceptance rate: 792 of 3,983 submissions, 20%.

