Abstract
In this chapter, we report our experiences from attempting to measure the effectiveness of large e-Discovery result sets in the TREC Legal Track campaigns of 2007–2009. For effectiveness measures, we have focused on recall, precision, and F1. We state the estimators that we have used for these measures, and we outline both the rank-based and set-based approaches to sampling that we have taken. We share our experiences with the sampling error in the resulting estimates of absolute performance on individual topics, relative performance on individual topics, mean performance across topics, and relative performance across topics. Finally, we discuss our experiences with assessor error, which we have found often has a larger impact than sampling error.
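As a point of reference, the standard set-based definitions of these three measures can be sketched as follows (a minimal sketch only; the chapter's actual stratified-sampling estimators depend on the sample design and are not reproduced here). Here Rel denotes the set of documents relevant to a topic and Ret the set retrieved:

\[
\mathrm{Recall} = \frac{|\mathrm{Rel} \cap \mathrm{Ret}|}{|\mathrm{Rel}|}, \qquad
\mathrm{Precision} = \frac{|\mathrm{Rel} \cap \mathrm{Ret}|}{|\mathrm{Ret}|}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]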
Acknowledgements
We thank Doug Oard, William Webber, Jason Baron, and the two anonymous reviewers for their helpful remarks on drafts of this chapter. We also thank Jason Baron, Doug Oard, Ian Soboroff, and Ellen Voorhees for their support and advice in undertaking the various challenges of measuring effectiveness in the TREC Legal Track, as well as all of the track contributors and participants, without whom the track would not have been possible.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Tomlinson, S., Hedin, B. (2011). Measuring Effectiveness in the TREC Legal Track. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_8
DOI: https://doi.org/10.1007/978-3-642-19231-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19230-2
Online ISBN: 978-3-642-19231-9