E-Discovery revisited: the need for artificial intelligence beyond information retrieval

Artificial Intelligence and Law

Abstract

In this work, we provide a broad overview of the distinct stages of E-Discovery. We portray them as an interconnected, often complex workflow process, while relating them to the general Electronic Discovery Reference Model (EDRM). We start with the definition of E-Discovery. We then describe the very positive contribution that NIST’s Text REtrieval Conference (TREC) has made to the science of E-Discovery, in terms of the tasks involved and the evaluation of the legal discovery work performed. Given the critical role that data analysis plays at various stages of the process, we present a pyramid model, which complements the EDRM, for gathering and hosting; indexing; searching and navigating; and finally consolidating and summarizing E-Discovery findings. Next we discuss where the current areas of need and areas of growth appear to be, using one of the field’s most authoritative surveys of providers and consumers of E-Discovery products and services. We subsequently address some areas of Artificial Intelligence, both Information Retrieval-related and not, which promise to make future contributions to the E-Discovery discipline. Some of these areas include data mining applied to e-mail and social networks, classification and machine learning, and the technologies that will enable next generation E-Discovery. The lesson we convey is that the more IR researchers and others understand the broader context of E-Discovery, including the stages that occur before and after primary search, the greater will be the prospects for broader solutions, creative optimizations and synergies yet to be tapped.

Notes

  1. Other areas closely affiliated with EDD today include litigation support and management, compliance regulation, Freedom of Information Act (FOIA) inquiries, and Homeland Security initiatives.

  2. http://www.westlaw.com.

  3. http://www.lexis.com.

  4. Relevant resources are in fact provided throughout the paper, but are also collected in Sect. 9 as sources of evidence for statements made about the field in terms of its legal, technological and commercial dimensions.

  5. Another point that further substantiates the iterative nature of EDD is the negotiated exchange process that takes place between the two sides in many complex lawsuits, where what is “discoverable” is hotly contested, typically by the responding party, and often under the close supervision of the court.

  6. Requestor generally refers to the party of the plaintiff that is requesting the electronically stored information (ESI), while responder generally refers to the party of the defendant that is obliged to hand over the requested ESI, assuming it is within reason and the scope of the negotiated querying.

  7. Cf. Sect. 1’s discussion of perceptions and potential misconceptions of the field.

  8. The double arrow is used to indicate that the suing party both has access to the EDD repository and withdraws information and materials from it.

  9. To use the language introduced in the Buckley discussion above, in some scenarios, these would be termed the collection of the requestor and the collection of the responder, respectively.

  10. http://en.wikipedia.org/wiki/Enron_scandal;

    http://en.wikipedia.org/wiki/Timeline_of_the_Enron_scandal.

  11. For instance, beyond the dotted rectangles in Figs. 1 and 2.

  12. To some extent, the Information Retrieval field has been a victim of its own success. That is, its early successes permitted the field to avoid diversifying into specific domains and addressing detailed user-centric needs. It has delivered clear successes in fields like Web-based search as well as traditional legal document search, and in fields where there may be more than a single acceptable solution. Yet IR initially found that treating a problem like E-Discovery as a stand-alone IR application was satisfactory, rather than seeing what IR could contribute to broader existing approaches to the distinct problem space. Early legal IR provides an illustration, where result sets were lists of documents, at best ranked by probable relevance, yet showed no awareness of the role that negative indirect history was required to play (e.g., “The court declines to extend this decision.”) (Thompson et al. 1994). Similarly, TREC’s initial forays into the legal domain essentially cast the EDD problem as an ad hoc search problem (i.e., single-pass, automatic) (Baron and Thompson 2007). And to some extent, it was, though human reviewers typically make the final determination regarding relevance. Yet E-Discovery is not as exclusively a search problem as traditional or ad hoc IR is; rather, search is one approach that helps solve the complex set of problems posed by EDD, which are discussed in Sects. 2 and 7.

  13. The first three of these—data mining, machine learning, classification—are ostensibly distinct from and thus “beyond IR” while the last, concept search, is admittedly an advanced development within the field of Information Retrieval.

  14. Concerning the EDRM model’s Analysis stage, on the subject of Tools and Technology, the Model page states “There are many different tools and technologies available to assist in this process. This is largely a research process and ultimately the decision made regarding the application of any criteria developed will be a legal decision made by Counsel.”

  15. There may be some irony in this pattern insofar as the organizations often most capable of conducting an E-Discovery project across multiple stages are precisely those larger, more established providers.

  16. Concept search is obviously another evolving technology that challenges the boundaries of conventional litigation tools (Chaplin and Jytyla 2009). Because we view it as an extension to primary search, we do not treat it as an enabling technology, but as a significant improvement to an existing one.

  17. That is, not relevant in the sense that the documents at hand do not answer the question posed by the opposing counsel and investigated in the pool of discoverable materials.

  18. A legal hold is a process that an organization uses to preserve all forms of relevant information when litigation is reasonably anticipated. By extension, anticipatory E-Discovery refers to just such actions when litigation against an organization is expected and is being prepared for.

  19. The International Association of Artificial Intelligence and Law (IAAIL), http://www.iaail.org.

References

  • Ashley KD, Bridewell W (2010) Emerging AI & Law approaches to automating analysis and retrieval of electronically stored information in discovery proceedings. In: Artificial intelligence and law special issue on E-Discovery (This issue)

  • Barnett T, Godjevac S, Renders J-M, Privault C, Schneider J, Wickstrom R (2009) Machine learning classification for document review. In: Proceedings of the global E-Discovery/E-Disclosure workshop on electronically stored information in discovery at the 12th international conference on artificial intelligence and law (ICAIL09 DESI Workshop). DESI Press, Barcelona

  • Baron JR, Lewis DD, Oard DW (2006) TREC 2006 legal track overview. In: The fifteenth Text REtrieval Conference proceedings (TREC 2006), Gaithersburg, MD, Nov 2006. National Institute of Standards and Technology (NIST), USA

  • Baron JR, Thompson P (2007) The search problem posed by large heterogeneous data sets in litigation: possible future approaches to research. In: Proceedings of the 11th international conference on artificial intelligence and law (ICAIL07). ACM Press, Palo Alto, CA

  • Baron JR (2008) Panning for gold in E-discovery: what every information scientist should know about how lawyers search for electronic evidence. In: CIKM panel on E-Discovery, 17th ACM conference on information and knowledge management (CIKM 2008). ACM, USA (Oct. CIKM Web site)

  • Barsocchini A (2005) Electronic discovery primer. In: Law Technology News, 28 Aug 2005

  • Bauer RS, Jade T, Hedin B, Hogan C (2008) Automated legal sensemaking: the centrality of relevance and intentionality. In: Proceedings of the second international workshop on supporting search and sensemaking for electronically stored information in discovery at the international conference on digital evidence (ICDE 2008, DESI Workshop). DESI Press, UK

  • Blair DC, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document retrieval system. In: Communications of the ACM, 28(3). ACM Press, New York, pp 289–299

  • Bobrow DG, King TH, Lee LC (2007) Enhancing legal discovery with linguistic processing. In: Proceedings of the first international workshop on supporting search and sensemaking for electronically stored information in discovery proceedings at the 11th international conference on artificial intelligence and law (ICAIL07 DESI Workshop, Stanford University). DESI Press, CA

  • Buckley C (2008) IR perspectives on the E-discovery problems. In: CIKM Panel on E-Discovery, 17th ACM conference on information and knowledge management (CIKM 2008). ACM, USA (CIKM Web Site)

  • Chaplin D, Jytyla R (2009) Conceptual search technology: avoid sanctions, prevent privilege waiver, and understand your data. In: Proceedings of the global E-Discovery/E-Disclosure workshop on electronically stored information in discovery at the 12th international conference on artificial intelligence and law (ICAIL09 DESI Workshop). DESI Press, Barcelona

  • Conrad JG (2007) E-Discovery revisited: a broader perspective for IR researchers. In: Proceedings of the first international workshop on supporting search and sensemaking for electronically stored information in discovery proceedings at the 11th international conference on artificial intelligence and law (ICAIL07 DESI Workshop, Stanford University). DESI Press, CA

  • Cormack GV, Mojdeh M (2009) Machine learning for information retrieval: TREC 2009 Web, relevance feedback and legal tracks. In: The eighteenth Text REtrieval Conference proceedings (TREC 2009), Gaithersburg, MD, Nov 2009. National Institute of Standards and Technology (NIST), USA

  • Counsel C (2006) The American Bar Association (ABA), section of litigation, committee on Corporate Counsel. http://www.abanet.org/litigation/committees/corporate/

  • Evans DA (2008) Why E-Discovery is a CIKM-hard problem. In: CIKM Panel on E-Discovery, 17th ACM conference on information and knowledge management (CIKM 2008). ACM, USA (Oct. CIKM Web site)

  • Evans S (2009) E-discovery market set for 2010 boom: Gartner, http://www.cbronline.com/news. 16 Dec 2009

  • Fios (2010) Discovery resources web site. Resources and news about E-discovery: http://www.discoveryresources.com

  • Hedin B, Tomlinson S, Baron JR, Oard DW (2009) Overview of the TREC 2009 legal track. In: The eighteenth Text REtrieval Conference proceedings (TREC 2009), Gaithersburg, MD, Nov 2009. National Institute of Standards and Technology (NIST), USA

  • Henseler H (2010) Network-based filtering for large e-mail collections in E-discovery. In: Artificial intelligence and law special issue on e-Discovery (This issue)

  • Hogan C, Brassil D, Rugani SM, Reinhart J, Gerber M, Jade T (2008) H5 at TREC 2008 legal interactive: user modeling, assessment & measurement. In: The seventeenth Text REtrieval Conference proceedings (TREC 2008), Gaithersburg, MD, Nov 2008. National Institute of Standards and Technology (NIST), USA

  • Hogan C, Bauer R, Brassil D (2010) Human-aided computer cognition for E-discovery. In: Artificial intelligence and law special issue on e-Discovery (This issue)

  • Isaza J, Jablonski JJ (2010) Legal holds: define the scope: the third article in a series aimed at helping organizations discharge their duty to preserve ESI. From Law.com, http://www.almdc.com/jsp/lawtechnologynews. 26 Feb 2010

  • Klimt B, Yang Y (2004) A new dataset: the Enron Corpus. In: ECML, pp 217–226

  • Lang JP, Baffa J (2010) Electronic discovery: an overview and practical pointers. “Firm News and Activities”. Bates & Carey LLP Web site. http://www.batescarey.com/newsandarticles/electronicdiscovery.asp

  • Law.com (2010) Web-based legal news and information network: http://www.Law.com

  • Lewis DD, Agam G, Argamon S, Frieder O, Grossman DA, Heard J (2006) Building a test collection for complex document information processing. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR06). ACM Press, New York, pp 665–666

  • Losey R (2009) Jason Baron on search—how do you find anything when you have a billion emails? “E-Discovery Team” blog. http://www.e-discoveryteam.com, March 4 2009

  • Illinois Institute of Technology (IIT) (2006) Complex document information processing (CDIP) 1.0 collection. Master Settlement Agreement (MSA) subcollection of the Legacy Tobacco Documents Library (LTDL)

  • Oard DW, Hedin B, Tomlinson S, Baron JR (2008) Overview of the TREC 2008 legal track. In: The seventeenth Text REtrieval Conference proceedings (TREC 2008), Gaithersburg, MD, Nov 2008. National Institute of Standards and Technology (NIST), USA

  • Oard DW, Baron JR, Hedin B, Lewis DD, Tomlinson S (2010) Evaluation of information retrieval for E-Discovery. In: Artificial intelligence and law special issue on e-Discovery (This issue)

  • Radding A (2006) The forecast for EDD, special to Law.com, Nov 15 2006

  • Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs manual review. J Am Soc Info Sci Techn (JASIST), Wiley: Hoboken, NJ 61(1):70–80

  • Scott J (2000) Social network analysis: a handbook, 2nd edn. Sage Publications, London

  • SearchFinancialSecurity.com (2009) Definitions: Electronic Discovery. http://searchfinancialsecurity.techtarget.com

  • Search Security (2010) IT site to keep corporate data and assets secure: http://www.SearchSecurity.com

  • The Sedona Conference (2009) Commentary on achieving quality in the E-Discovery process. Working group on best practices for document retention & production. Public Comment Version, May

  • The Sedona Conference (2010) Facilitates discussion among legal experts on topics like complex litigation: http://www.theSedonaConference.org

  • Socha G, Gelbmann T (2006) The 2006 Socha-Gelbmann electronic discovery survey report. Socha Consulting LLC and Gelbmann & Associates, MN

  • Socha G, Gelbmann T (2008) The 2008 Socha-Gelbmann 6th annual electronic discovery survey. Socha Consulting LLC and Gelbmann & Associates, MN

  • Socha G, Gelbmann T (2009a) Strange times, a summary of the 2009 Socha-Gelbmann 7th annual electronic discovery survey. Socha Consulting LLC and Gelbmann & Associates, MN. Law technology news, Law.com, 1 Aug 2009

  • Socha G, Gelbmann T (2009b) Electronic discovery reference model. edrm.net

  • Socha G (2008) Description of the electronic discovery reference model. Interview by Kenna Kim, PivotalDiscovery.com (Part 3) [on http://www.YouTube.com] At 12th Annual Thomson Reuters E-Discovery and Records Retention Conference. San Francisco, CA, Dec

  • Sterenzy T (2009) Equivio at TREC 2009 legal interactive. In: The eighteenth Text REtrieval Conference proceedings (TREC 2009), Gaithersburg, MD, Nov 2009. National Institute of Standards and Technology (NIST), USA

  • Third International Workshop on Supporting Search and Sensemaking for Electronically Stored Information in Discovery (DESI3) (2009) Proceedings of the global E-Discovery/E-Disclosure workshop on electronically stored information in discovery at the 12th international conference on artificial intelligence and law (ICAIL09 DESI Workshop). DESI Press, Barcelona. http://www.law.pitt.edu/DESI3_Workshop/

  • Thompson P, Turtle HR, Yang B, Flood J (1994) TREC-3 ad hoc retrieval and routing experiments using the WIN system. In: The third Text REtrieval Conference proceedings (TREC 1994), Gaithersburg, MD, Nov 1994. National Institute of Standards and Technology (NIST), USA

  • Tomlinson S, Oard DW, Baron JR, Thompson P (2007) Overview of the TREC 2007 legal track. In: The sixteenth Text REtrieval Conference proceedings (TREC 2007), Gaithersburg, MD, Nov 2007. National Institute of Standards and Technology (NIST), USA

  • Voorhees EM (2007) Overview of TREC 2007. In: The sixteenth Text REtrieval Conference proceedings (TREC 2007), Gaithersburg, MD, Nov 2007. National Institute of Standards and Technology (NIST), USA

  • Voorhees EM, Buckland LP (eds) (2008) Proceedings of the seventeenth Text REtrieval Conferences (TREC 2008), Gaithersburg MD, Nov 2008. National Institute of Standards and Technology (NIST), USA

  • Zhao FC, Oard DW, Baron JR (2009) Improving search effectiveness in the legal e-discovery process using relevance feedback. In: Proceedings of the global E-Discovery/E-Disclosure workshop on electronically stored information in discovery at the 12th international conference on artificial intelligence and law (ICAIL09 DESI Workshop). DESI Press, Barcelona

Acknowledgements

We thank Peter Jackson and Khalid Al-Kofahi for the time and resources to pursue this study. We are also grateful to Marc Light for his review of this work and his recommendations for its increased clarity. We also thank the formal reviewers of this work for their comments and suggestions for its improvement. Finally, we wish to thank Kevin Ashley and Jason Baron for their support and feedback on this expanded work, as well as for their numerous contributions to the First and Third DESI Workshops, the former being the workshop from which this paper germinated.

Author information

Corresponding author

Correspondence to Jack G. Conrad.

Additional information

This article represents an expanded version of a shorter work that first appeared in the Proceedings of the DESI I Workshop on E-Discovery that was co-located with the 11th International Conference on Artificial Intelligence and Law (ICAIL 2007) at Stanford University (Conrad 2007).

About this article

Cite this article

Conrad, J.G. E-Discovery revisited: the need for artificial intelligence beyond information retrieval. Artif Intell Law 18, 321–345 (2010). https://doi.org/10.1007/s10506-010-9096-6

  • DOI: https://doi.org/10.1007/s10506-010-9096-6
