Abstract
This paper describes the design of the first large-scale information retrieval (IR) test collection built for the Czech language. Creating the collection is particularly challenging because it is based on a continuous text stream produced by automatic transcription of spontaneous speech and therefore lacks clearly defined document boundaries. All aspects of building the collection are presented, together with some general findings from initial experiments.
This work was supported by projects MSMT LC536, GACR 1ET101470416, MSM0021620838 and NSF IIS-0122466.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ircing, P., Pecina, P., Oard, D.W., Wang, J., White, R.W., Hoidekr, J. (2007). Information Retrieval Test Collection for Searching Spontaneous Czech Speech. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_57
DOI: https://doi.org/10.1007/978-3-540-74628-7_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)