The Effect of Score Standardisation on Topic Set Size Design

  • Conference paper
Information Retrieval Technology (AIRS 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Abstract

Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.
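
The code below is a minimal sketch, not the paper's reference implementation, of the kind of per-topic standardisation the abstract describes: each raw score in a topic-by-run matrix is converted into a z-score using that topic's mean and standard deviation over the runs, and the z-score is then mapped back onto the unit interval by a linear transform A + B*z with clipping. The function name and the defaults A = 0.5 and B = 0.15 are illustrative assumptions, not values quoted from the paper.

import numpy as np

def standardise_scores(raw, A=0.5, B=0.15):
    """Per-topic linear standardisation of a topic-by-run score matrix.

    raw: array-like of shape (n_topics, n_runs); rows are topics, columns are
    runs, entries are raw scores of some evaluation measure (normalised or
    unnormalised). Each score is turned into a per-topic z-score and mapped
    onto [0, 1] via A + B * z with clipping. A and B are illustrative
    placeholders, not values taken from the paper.
    """
    scores = np.asarray(raw, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)        # per-topic mean over runs
    std = scores.std(axis=1, ddof=1, keepdims=True)  # per-topic sample standard deviation
    std = np.where(std == 0.0, 1.0, std)             # guard against a topic where all runs tie
    z = (scores - mean) / std                        # per-topic z-scores
    return np.clip(A + B * z, 0.0, 1.0)              # linear rescaling onto [0, 1]

# Toy example: three topics, four runs, an unnormalised measure whose raw
# scale differs across topics; after standardisation all topics share a
# common location and spread.
raw = [[0.2, 0.5, 0.9, 1.4],
       [0.1, 0.1, 0.3, 0.6],
       [2.0, 2.2, 2.5, 3.1]]
print(standardise_scores(raw))

Because every topic's standardised scores share the same location and spread regardless of the measure's raw range or whether it is normalised, the variance estimates fed into topic set size design become smaller and more comparable across measures, which is the effect the abstract attributes to std-AB.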

Notes

  1. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/index.html

  2. http://www.f.waseda.jp/tetsuya/CIKM2014/samplesizeANOVA.xlsx

  3. http://www.thuir.cn/ntcirwww/

Acknowledgement

We thank the organisers of the NTCIR-12 MedNLPDoc, QALab-2, MobileClick-2, and STC tasks, in particular, Eiji Aramaki, Hideyuki Shibuki, and Makoto P. Kato, for providing us with their topic-by-run matrices of the official results prior to the NTCIR-12 conference.

Author information

Correspondence to Tetsuya Sakai.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Sakai, T. (2016). The Effect of Score Standardisation on Topic Set Size Design. In: Ma, S., et al. (eds) Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_2

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
