Part of the book series: The Information Retrieval Series (INRE, volume 29)

Abstract

In this chapter, we report our experiences from attempting to measure the effectiveness of large e-Discovery result sets in the TREC Legal Track campaigns of 2007–2009. For effectiveness measures, we have focused on recall, precision, and F1. We state the estimators that we have used for these measures, and we outline both the rank-based and set-based approaches to sampling that we have taken. We share our experiences with the sampling error in the resulting estimates for the absolute performance on individual topics, relative performance on individual topics, mean performance across topics, and relative performance across topics. Finally, we discuss our experiences with assessor error, which we have found often has a larger impact than sampling error.
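For readers unfamiliar with the set-based, stratified sampling approach the chapter describes, the sketch below illustrates, in Python, how recall, precision, and F1 might be estimated from per-stratum sample judgments. It is a minimal illustration only: the stratum layout, field names, and inverse-sampling-rate weighting are assumptions made for exposition, not the track's actual estimators.

    # Illustrative stratified estimation of recall, precision and F1 from
    # sampled relevance judgments. A generic sketch only; the stratum
    # design and weighting here are hypothetical, not the TREC Legal
    # Track's actual estimators.

    def stratified_totals(strata):
        """Estimate the total number of relevant documents, and of
        relevant documents retrieved, via inverse-sampling-rate
        (Horvitz-Thompson style) weighting of per-stratum counts.

        Each stratum dict holds:
          size:          documents in the stratum (population)
          sampled:       documents sampled and judged
          rel:           sampled documents judged relevant
          rel_retrieved: sampled relevant documents the system retrieved
        """
        est_rel = est_rel_ret = 0.0
        for s in strata:
            weight = s["size"] / s["sampled"]  # inverse sampling rate
            est_rel += weight * s["rel"]
            est_rel_ret += weight * s["rel_retrieved"]
        return est_rel, est_rel_ret

    def effectiveness(strata, retrieved_size):
        """Point estimates of recall, precision and F1 for a result set
        of known size, guarding against division by zero."""
        est_rel, est_rel_ret = stratified_totals(strata)
        recall = est_rel_ret / est_rel if est_rel > 0 else 0.0
        precision = est_rel_ret / retrieved_size if retrieved_size > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        return recall, precision, f1

    # Hypothetical example: one stratum inside the result set, one outside.
    strata = [
        {"size": 10000, "sampled": 500, "rel": 200, "rel_retrieved": 200},
        {"size": 90000, "sampled": 500, "rel": 10, "rel_retrieved": 0},
    ]
    print(effectiveness(strata, retrieved_size=10000))  # ~ (0.69, 0.40, 0.51)

Note that such point estimates say nothing by themselves about sampling error; quantifying that error, on individual topics and across topics, is precisely the chapter's focus.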

Acknowledgements

We thank Doug Oard, William Webber, Jason Baron, and the two anonymous reviewers for their helpful remarks on drafts of this chapter. We also thank Jason Baron, Doug Oard, Ian Soboroff, and Ellen Voorhees for their support and advice in undertaking the various challenges of measuring effectiveness in the TREC Legal Track, as well as all of the track contributors and participants, without whom the track would not have been possible.

Author information

Correspondence to Stephen Tomlinson.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tomlinson, S., Hedin, B. (2011). Measuring Effectiveness in the TREC Legal Track. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_8

  • DOI: https://doi.org/10.1007/978-3-642-19231-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19230-2

  • Online ISBN: 978-3-642-19231-9

  • eBook Packages: Computer Science (R0)
