The Effect of Score Standardisation on Topic Set Size Design

  • Conference paper
Information Retrieval Technology (AIRS 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Abstract

Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.
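
The code below is a minimal sketch, not the paper's reference implementation, of the kind of per-topic standardisation the abstract describes: each raw score in a topic-by-run matrix is converted into a z-score using that topic's mean and standard deviation over the runs, and the z-score is then mapped back onto the unit interval by a linear transform A + B*z with clipping. The function name and the defaults A = 0.5 and B = 0.15 are illustrative assumptions, not values quoted from the paper.

import numpy as np

def standardise_scores(raw, A=0.5, B=0.15):
    """Per-topic linear standardisation of a topic-by-run score matrix.

    raw: array-like of shape (n_topics, n_runs); rows are topics, columns are
    runs, entries are raw scores of some evaluation measure (normalised or
    unnormalised). Each score is turned into a per-topic z-score and mapped
    onto [0, 1] via A + B * z with clipping. A and B are illustrative
    placeholders, not values taken from the paper.
    """
    scores = np.asarray(raw, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)        # per-topic mean over runs
    std = scores.std(axis=1, ddof=1, keepdims=True)  # per-topic sample standard deviation
    std = np.where(std == 0.0, 1.0, std)             # guard against a topic where all runs tie
    z = (scores - mean) / std                        # per-topic z-scores
    return np.clip(A + B * z, 0.0, 1.0)              # linear rescaling onto [0, 1]

# Toy example: three topics, four runs, an unnormalised measure whose raw
# scale differs across topics; after standardisation all topics share a
# common location and spread.
raw = [[0.2, 0.5, 0.9, 1.4],
       [0.1, 0.1, 0.3, 0.6],
       [2.0, 2.2, 2.5, 3.1]]
print(standardise_scores(raw))

Because every topic's standardised scores share the same location and spread regardless of the measure's raw range or whether it is normalised, the variance estimates fed into topic set size design become smaller and more comparable across measures, which is the effect the abstract attributes to std-AB.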

Notes

  1. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/index.html

  2. http://www.f.waseda.jp/tetsuya/CIKM2014/samplesizeANOVA.xlsx

  3. http://www.thuir.cn/ntcirwww/

Acknowledgement

We thank the organisers of the NTCIR-12 MedNLPDoc, QALab-2, MobileClick-2, and STC tasks, in particular, Eiji Aramaki, Hideyuki Shibuki, and Makoto P. Kato, for providing us with their topic-by-run matrices of the official results prior to the NTCIR-12 conference.

Author information

Correspondence to Tetsuya Sakai.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Sakai, T. (2016). The Effect of Score Standardisation on Topic Set Size Design. In: Ma, S., et al. (eds) Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_2

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
