DOI: 10.1145/3626772.3657903
Short paper · Open access

Unbiased Validation of Technology-Assisted Review for eDiscovery

Published: 11 July 2024

Abstract

Although it is well established that recall estimates are valid only when based on independent relevance assessments, and useful only to compare the relative effectiveness of competing methods, these conditions are seldom met when validating eDiscovery efforts in litigation. We present two unbiased validation strategies that embed blind relevance assessments into a technology-assisted review (TAR) process, so as to compare its recall to that which would have been achieved by exhaustive manual review. We illustrate the use of these strategies within the context of TAR occasioned by litigation over accounting practices preceding the collapse of a major insurance company.
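
The paper's two embedded validation strategies are detailed in the full text; as a point of reference only, the sketch below illustrates the baseline idea the abstract alludes to: draw a random sample from the entire collection, obtain relevance assessments from an assessor who is blind to what the TAR process produced, and estimate recall as the fraction of sampled relevant documents that the process retrieved. The function and parameter names (`estimate_recall`, `blind_judge`, `produced_ids`) are hypothetical, and the single simple-random-sample design is an assumption, not the authors' method.

```python
import random


def estimate_recall(collection_ids, produced_ids, blind_judge, sample_size, seed=0):
    """Estimate TAR recall from blind assessments of a simple random sample.

    Illustrative sketch only: a uniform sample over the whole collection is
    assumed; the paper's two embedded validation strategies are not
    reproduced here.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(collection_ids), sample_size)

    relevant_in_sample = 0       # sampled documents judged relevant
    relevant_and_produced = 0    # ...of which the TAR process also produced

    for doc_id in sample:
        # The assessor judges relevance without knowing whether the TAR
        # process produced the document, keeping the assessment independent.
        if blind_judge(doc_id):
            relevant_in_sample += 1
            if doc_id in produced_ids:
                relevant_and_produced += 1

    if relevant_in_sample == 0:
        return None  # the sample happened to contain no relevant documents
    # Ratio estimate of recall: the share of sampled relevant documents
    # that fall inside the TAR production.
    return relevant_and_produced / relevant_in_sample
```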



    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2024, 3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. ediscovery
    2. electronic discovery
    3. recall estimation
    4. tar
    5. technology-assisted review

    Qualifiers

    • Short-paper

    Conference

    SIGIR 2024

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

