Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10456)

Abstract

High variability in content and structure, combined with transcription errors, makes effective information retrieval (IR) from archives of spoken user-generated content (UGC) very challenging. Previous research has shown that using passage-level evidence for query expansion (QE) in IR can improve search effectiveness. Our investigation of passage-level QE for a large Internet collection of UGC demonstrates that, while it is effective for this task, the informal and variable nature of UGC means that different queries respond better to different types of passages or, in some cases, to the use of whole documents rather than extracted passages. We investigate the use of Query Performance Prediction (QPP) to select the appropriate passage type for each query, and introduce Weighted Expansion Gain (WEG) as a new QPP method. Our experimental investigation, using an extended ad hoc search task based on the MediaEval 2012 Search task, shows the superiority of our proposed adaptive QE approach for retrieval. The effectiveness of this method is shown in a per-query evaluation of utilising passage and full-document evidence for QE within the inconsistent, uncertain settings of UGC retrieval.
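
The per-query selection idea described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate evidence type (for example full documents or extracted passages) has already produced an expanded run for the query, represented here as a list of retrieval scores. The names `score_spread_predictor` and `select_evidence_type` are hypothetical, and the predictor is a simple stand-in based on score dispersion; it is not the WEG method, whose definition is not given in this text.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type: a predictor maps the retrieval scores of one expanded
# run (highest first) to a single estimate of how well that run performs.
Predictor = Callable[[Sequence[float]], float]


def score_spread_predictor(scores: Sequence[float], k: int = 10) -> float:
    """Stand-in post-retrieval predictor: standard deviation of the top-k scores.

    Assumption for illustration only; this is NOT the paper's WEG predictor.
    """
    top = list(scores[:k])
    if len(top) < 2:
        return 0.0
    mean = sum(top) / len(top)
    return (sum((s - mean) ** 2 for s in top) / len(top)) ** 0.5


def select_evidence_type(
    runs: Dict[str, Sequence[float]],
    predictor: Predictor = score_spread_predictor,
) -> str:
    """Return the evidence type whose expanded run the predictor rates highest."""
    if not runs:
        raise ValueError("no candidate runs supplied")
    return max(runs, key=lambda name: predictor(runs[name]))


# Illustrative scores only (not experimental data).
example_runs = {
    "full_document": [2.0, 1.9, 1.8, 1.7],
    "passage": [3.0, 2.0, 1.0, 0.5],
}
chosen = select_evidence_type(example_runs)
```

Any other QPP method could be passed in as `predictor`; the selection step itself only needs one comparable score per candidate run.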

Notes

  1. https://archives.limsi.fr/tlp/topic6.html.

  2. http://www.multimediaeval.org/mediaeval2012/.

  3. http://www.mturk.com/.

  4. https://code.google.com/p/uima-text-passageer/.

  5. Confirmed by running a query-level paired t-test at the 0.05 significance level [21].
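
The kind of query-level paired t-test referred to in the last note can be sketched as follows. This is a minimal illustration, assuming two systems have been evaluated on the same queries and that each yields one effectiveness value per query (for example average precision). The function name `paired_t_statistic` and the sample values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired per-query scores a and b."""
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same queries")
    n = len(a)
    if n < 2:
        raise ValueError("at least two queries are needed")
    diffs = [x - y for x, y in zip(a, b)]
    mean = sum(diffs) / n
    # Sample variance of the per-query differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Illustrative per-query scores only (not experimental data).
adaptive = [0.5, 0.6, 0.7, 0.8]
baseline = [0.4, 0.4, 0.6, 0.5]
t_value, dof = paired_t_statistic(adaptive, baseline)
```

The difference is judged significant at the 0.05 level when the absolute value of `t_value` exceeds the two-sided critical value of Student's t distribution for `dof` degrees of freedom (3.182 for 3 degrees of freedom).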

References

  1. Allan, J.: Relevance feedback with too much data. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 337–343. ACM (1995)

  2. Amati, G., Van Rijsbergen, C.J.: Probabilistic models of information retrieval based on measuring the divergence from randomness (2002)

  3. Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval (2012)

  4. Choi, F.Y.Y.: Advances in domain independent linear text segmentation. In: Proceedings of NAACL 2000 (2000)

  5. Eskevich, M.: Towards effective retrieval of spontaneous conversational spoken content. Ph.D. thesis, Dublin City University (2014)

  6. Eskevich, M., Jones, G.J.F., Wartena, C., Larson, M., Aly, R., Verschoor, T., Ordelman, R.: Comparing retrieval effectiveness of alternative content segmentation methods for internet video search. In: 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2012)

  7. Garofolo, J., Auzanne, G., Voorhees, E.: The TREC spoken document retrieval track: a success story. In: Proceedings of RIAO 2000, pp. 1–8 (2000)

  8. Gianni, A.: Probabilistic models for information retrieval based on divergence from randomness. Ph.D. thesis, Department of Computing Science, University of Glasgow (2003)

  9. Gu, Z., Luo, M.: Comparison of using passages and documents for blind relevance feedback in information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 482–483. ACM (2004)

  10. He, B., Ounis, I.: Studying query expansion effectiveness. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 611–619. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00958-7_57

  11. Khwileh, A., Ganguly, D., Jones, G.J.: Utilisation of metadata fields and query expansion in cross-lingual search of user-generated internet video (2016)

  12. Khwileh, A., Jones, G.J.: Investigating segment-based query expansion for user-generated spoken content retrieval. In: 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE (2016)

  13. Kurland, O., Shtok, A., Hummel, S., Raiber, F., Carmel, D., Rom, O.: Back to the roots: a probabilistic framework for query-performance prediction. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 823–832. ACM (2012)

  14. Lam-Adesina, A.M., Jones, G.J.F.: Dublin City University at CLEF 2005: cross-language speech retrieval (CL-SR) experiments. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 792–799. Springer, Heidelberg (2006). doi:10.1007/11878773_87

  15. Liu, X., Croft, W.B.: Passage retrieval based on language models. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 375–382. ACM (2002)

  16. Mitra, M., Singhal, A., Buckley, C.: Improving automatic query expansion. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 206–214. ACM (1998)

  17. Larson, M., Jones, G.J.F.: Spoken content retrieval: a survey of techniques and technologies (2011)

  18. Pecina, P., Hoffmannová, P., Jones, G.J.F., Zhang, Y., Oard, D.W.: Overview of the CLEF-2007 cross-language speech retrieval track. In: Peters, C., et al. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 674–686. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85760-0_86

  19. Schmiedeke, S., Xu, P., Ferrané, I., Eskevich, M., Kofler, C., Larson, M.A., Estève, Y., Lamel, L., Jones, G.J.F., Sikora, T.: Blip10000: a social video dataset containing SPUG content for tagging and retrieval. In: Proceedings of the 4th ACM Multimedia Systems Conference, pp. 96–101. ACM (2013)

  20. Shtok, A., Kurland, O., Carmel, D.: Predicting query performance by query-drift estimation. In: Azzopardi, L., Kazai, G., Robertson, S., Rüger, S., Shokouhi, M., Song, D., Yilmaz, E. (eds.) ICTIR 2009. LNCS, vol. 5766, pp. 305–312. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04417-5_30

  21. Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 623–632. ACM (2007)

  22. Terol, R.M., Palomar, M., Martinez-Barco, P., Llopis, F., Muñoz, R., Noguera, E.: The University of Alicante at CL-SR track. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 769–772. Springer, Heidelberg (2006). doi:10.1007/11878773_84

  23. Wang, J., Oard, D.W.: CLEF-2005 CL-SR at Maryland: document and query expansion using side collections and thesauri. In: Peters, C., et al. (eds.) CLEF 2005. LNCS, vol. 4022, pp. 800–809. Springer, Heidelberg (2006). doi:10.1007/11878773_88

  24. Zhou, Y., Croft, W.B.: Query performance prediction in web search environments. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 543–550. ACM (2007)

Acknowledgments

This research was partially supported by Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University.

Author information

Corresponding author

Correspondence to Ahmad Khwileh.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Khwileh, A., Way, A., Jones, G.J.F. (2017). Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction. In: Jones, G., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2017. Lecture Notes in Computer Science, vol 10456. Springer, Cham. https://doi.org/10.1007/978-3-319-65813-1_4

  • DOI: https://doi.org/10.1007/978-3-319-65813-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65812-4

  • Online ISBN: 978-3-319-65813-1

  • eBook Packages: Computer Science, Computer Science (R0)
