Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction

Khwileh, Ahmad; Way, Andy; Jones, Gareth J. F.

doi:10.1007/978-3-319-65813-1_4

Conference paper
First Online: 17 August 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10456)

Abstract

The high variability in content and structure, combined with transcription errors, makes effective information retrieval (IR) from archives of spoken user-generated content (UGC) very challenging. Previous research has shown that using passage-level evidence for query expansion (QE) in IR can improve search effectiveness. Our investigation of passage-level QE for a large Internet collection of UGC demonstrates that, while it is effective for this task, the informal and variable nature of UGC means that different queries respond better to different types of passages or, in some cases, to whole documents rather than extracted passages. We investigate the use of Query Performance Prediction (QPP) to select the appropriate passage type for each query, and introduce a novel QPP method, Weighted Expansion Gain (WEG). Our experimental investigation, using an extended ad hoc search task based on the MediaEval 2012 Search task, shows the superiority of our proposed adaptive QE approach for retrieval. The effectiveness of this method is shown in a per-query evaluation of using passage and full-document evidence for QE in the inconsistent, uncertain setting of UGC retrieval.
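The per-query selection idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the evidence-type names, the retrieval scores, and in particular the predictor, which is a simple mean of the top-k feedback scores and is not the WEG formula (the abstract does not define WEG).

```python
from statistics import mean
from typing import Callable, Dict, List


def mean_top_k_score(scores: List[float], k: int = 3) -> float:
    """Hypothetical stand-in predictor: mean retrieval score of the top-k feedback units.

    This is NOT the paper's Weighted Expansion Gain; it only plays the role
    of "some QPP score" so that the selection step can be shown.
    """
    top = sorted(scores, reverse=True)[:k]
    return mean(top) if top else 0.0


def select_evidence_type(
    candidates: Dict[str, List[float]],
    predictor: Callable[[List[float]], float] = mean_top_k_score,
) -> str:
    """Return the evidence type whose feedback list the predictor rates highest."""
    if not candidates:
        raise ValueError("no candidate evidence types given")
    return max(candidates, key=lambda name: predictor(candidates[name]))


# Made-up retrieval scores for one query under three hypothetical evidence types.
query_candidates = {
    "full_document": [2.1, 1.9, 1.2, 0.8],
    "fixed_window_passage": [2.6, 2.4, 2.2, 0.5],
    "topic_segment_passage": [2.0, 2.0, 1.0, 0.9],
}

# The chosen type would then supply the feedback text for query expansion.
chosen = select_evidence_type(query_candidates)
print(chosen)
```

Passing the predictor as a parameter keeps the selection step independent of any particular QPP method, so a different predictor could be substituted without changing the rest of the sketch.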

Notes

1. https://archives.limsi.fr/tlp/topic6.html
2. http://www.multimediaeval.org/mediaeval2012/
3. http://www.mturk.com/
4. https://code.google.com/p/uima-text-passageer/
5. Confirmed by a query-level paired t-test at the 0.05 significance level [21].
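
As a rough illustration of the query-level paired t-test mentioned in Note 5, the sketch below computes the paired t statistic from per-query scores using only the Python standard library. The score lists are made up and do not come from the paper; obtaining a p-value additionally requires the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which is not used here).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for per-query scores of two systems (b minus a)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least two queries")
    diffs = [y - x for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the per-query differences
    if sd == 0:
        raise ValueError("all per-query differences are identical")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Made-up per-query average precision for a baseline and an expanded run.
baseline = [0.20, 0.35, 0.10, 0.50, 0.40]
expanded = [0.25, 0.45, 0.10, 0.60, 0.45]

t_stat = paired_t_statistic(baseline, expanded)
print(round(t_stat, 3))
```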

References

Allan, J.: Relevance feedback with too much data. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 337–343. ACM (1995)
Amati, G., Van Rijsbergen, C.J.: Probabilistic models of information retrieval based on measuring the divergence from randomness (2002)
Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval (2012)
Choi, F.Y.Y.: Advances in domain independent linear text segmentation. In: Proceedings of NAACL 2000 (2000)
Eskevich, M.: Towards effective retrieval of spontaneous conversational spoken content. Ph.D. thesis, Dublin City University (2014)
Eskevich, M., Jones, G.J.F., Wartena, C., Larson, M., Aly, R., Verschoor, T., Ordelman, R.: Comparing retrieval effectiveness of alternative content segmentation methods for internet video search. In: 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2012)
Garofolo, J., Auzanne, G., Voorhees, E.: The TREC spoken document retrieval track: a success story. In: Proceedings of RIAO 2000, pp. 1–8 (2000)
Amati, G.: Probabilistic models for information retrieval based on divergence from randomness. Ph.D. thesis, Department of Computing Science, University of Glasgow (2003)
Gu, Z., Luo, M.: Comparison of using passages and documents for blind relevance feedback in information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 482–483. ACM (2004)
He, B., Ounis, I.: Studying query expansion effectiveness. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 611–619. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00958-7_57
Khwileh, A., Ganguly, D., Jones, G.J.: Utilisation of metadata fields and query expansion in cross-lingual search of user-generated internet video (2016)
Khwileh, A., Jones, G.J.: Investigating segment-based query expansion for user-generated spoken content retrieval. In: 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2016)
Kurland, O., Shtok, A., Hummel, S., Raiber, F., Carmel, D., Rom, O.: Back to the roots: a probabilistic framework for query-performance prediction. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 823–832. ACM (2012)
Lam-Adesina, A.M., Jones, G.J.F.: Dublin City University at CLEF 2005: cross-language speech retrieval (CL-SR) experiments. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 792–799. Springer, Heidelberg (2006). doi:10.1007/11878773_87
Liu, X., Croft, W.B.: Passage retrieval based on language models. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 375–382. ACM (2002)
Mitra, M., Singhal, A., Buckley, C.: Improving automatic query expansion. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 206–214. ACM (1998)
Larson, M., Jones, G.J.F.: Spoken content retrieval: a survey of techniques and technologies (2011)
Pecina, P., Hoffmannová, P., Jones, G.J.F., Zhang, Y., Oard, D.W.: Overview of the CLEF-2007 cross-language speech retrieval track. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 674–686. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85760-0_86
Schmiedeke, S., Xu, P., Ferrané, I., Eskevich, M., Kofler, C., Larson, M.A., Estève, Y., Lamel, L., Jones, G.J.F., Sikora, T.: Blip10000: a social video dataset containing SPUG content for tagging and retrieval. In: Proceedings of the 4th ACM Multimedia Systems Conference, pp. 96–101. ACM (2013)
Shtok, A., Kurland, O., Carmel, D.: Predicting query performance by query-drift estimation. In: Azzopardi, L., Kazai, G., Robertson, S., Rüger, S., Shokouhi, M., Song, D., Yilmaz, E. (eds.) ICTIR 2009. LNCS, vol. 5766, pp. 305–312. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04417-5_30
Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 623–632. ACM (2007)
Terol, R.M., Palomar, M., Martinez-Barco, P., Llopis, F., Muñoz, R., Noguera, E.: The University of Alicante at CL-SR track. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 769–772. Springer, Heidelberg (2006). doi:10.1007/11878773_84
Wang, J., Oard, D.W.: CLEF-2005 CL-SR at Maryland: document and query expansion using side collections and thesauri. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 800–809. Springer, Heidelberg (2006). doi:10.1007/11878773_88
Zhou, Y., Croft, W.B.: Query performance prediction in web search environments. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 543–550. ACM (2007)

Acknowledgments

This research was partially supported by Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University.

Author information

Authors and Affiliations

School of Computing, ADAPT Centre, Dublin City University, Dublin 9, Ireland
Ahmad Khwileh, Andy Way & Gareth J. F. Jones

Corresponding author

Correspondence to Ahmad Khwileh.

Editor information

Editors and Affiliations

Dublin City University, Dublin, Ireland
Gareth J.F. Jones
Trinity College Dublin, Dublin, Ireland
Séamus Lawless
National University of Distance Education, Madrid, Spain
Julio Gonzalo
Dublin City University, Dublin, Ireland
Liadh Kelly
Université Grenoble Alpes, Grenoble, France
Lorraine Goeuriot
University of Hildesheim, Hildesheim, Germany
Thomas Mandl
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

Cite this paper

Khwileh, A., Way, A., Jones, G.J.F. (2017). Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_4

DOI: https://doi.org/10.1007/978-3-319-65813-1_4
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65812-4
Online ISBN: 978-3-319-65813-1
eBook Packages: Computer Science, Computer Science (R0)