CLEF-2005 CL-SR at Maryland: Document and Query Expansion Using Side Collections and Thesauri

Wang, Jianqiang; Oard, Douglas W.

doi:10.1007/11878773_88

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4022)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) cross-language speech retrieval using translation knowledge obtained from the statistics of a large parallel corpus. The results show that document expansion and query expansion using blind relevance feedback were effective, although optimal parameter choices differed somewhat between the training and evaluation sets. Document expansion in which manually assigned keywords were augmented with thesaurus synonyms yielded marginal gains on the training set, but no improvement on the evaluation set. Cross-language retrieval with French queries yielded 79% of monolingual mean average precision when searching manually assigned metadata despite a substantial domain mismatch between the parallel corpus and the retrieval task. Detailed failure analysis indicates that speech recognition errors for named entities were an important factor that substantially degraded retrieval effectiveness.
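The cross-language result above is stated relative to monolingual mean average precision (MAP). As a rough illustration of how such a ratio can be computed, here is a minimal sketch of uninterpolated MAP over ranked lists. The function names, topic identifiers, document identifiers, and rankings are all hypothetical and invented for this example; they are not data or code from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for a single topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all judged topics."""
    topics = list(qrels)
    total = sum(average_precision(runs.get(t, []), qrels[t]) for t in topics)
    return total / len(topics)


# Toy judgments and rankings, invented for illustration only.
qrels = {"t1": {"a", "b"}, "t2": {"c"}}
mono_run = {"t1": ["a", "b", "x"], "t2": ["c", "y"]}
cross_run = {"t1": ["a", "x", "b"], "t2": ["y", "c"]}

mono_map = mean_average_precision(mono_run, qrels)
cross_map = mean_average_precision(cross_run, qrels)

# Cross-language effectiveness expressed as a fraction of monolingual MAP.
relative = cross_map / mono_map
print(f"monolingual MAP={mono_map:.3f}, cross-language MAP={cross_map:.3f}, "
      f"relative={relative:.0%}")
```

With real runs, the same ratio would be computed from the track's relevance judgments and the submitted ranked lists rather than from toy dictionaries.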

Author information

Authors and Affiliations

College of Information Studies and UMIACS, University of Maryland, College Park, MD, 20742, USA
Jianqiang Wang & Douglas W. Oard

Editor information

Editors and Affiliations

ISTI-CNR, Area di Ricerca, Pisa, Italy
Carol Peters
University of California, Berkeley, CA, USA
Fredric C. Gey
No Affiliations
Julio Gonzalo
Business Information Systems, University of Applied Sciences, Sierre, Switzerland
Henning Müller
Centre for Digital Video Processing & School of Computing, Dublin City University, Dublin 9, Ireland
Gareth J. F. Jones
German Institute for International and Security Affairs, Stiftung Wissenschaft und Politik (SWP), Ludwigkirchplatz 3-4, 10719, Berlin, Germany
Michael Kluck
ITC-IRST, Trento, Italy
Bernardo Magnini
ISLA, University of Amsterdam
Maarten de Rijke

Cite this paper

Wang, J., Oard, D.W. (2006). CLEF-2005 CL-SR at Maryland: Document and Query Expansion Using Side Collections and Thesauri. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_88

DOI: https://doi.org/10.1007/11878773_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45697-1
Online ISBN: 978-3-540-45700-8
eBook Packages: Computer Science, Computer Science (R0)
