
About Sound and Vision: CLEF Beyond Text Retrieval Tasks

Part of the book series: The Information Retrieval Series (INRE, volume 41)

Abstract

CLEF was initiated with the intention of providing a catalyst for research in Cross-Language Information Retrieval (CLIR) and Multilingual Information Retrieval (MIR). Focusing principally on European languages, it initially provided CLIR benchmark tasks to the research community within an annual cycle of task design, conduct and reporting. While the early focus was on textual data, the emergence of technologies enabling the collection, archiving and content processing of multimedia led to several initiatives that sought to address search of spoken and visual content. As with multilingual search for text, interest arose in working multilingually with multimedia content. To support research in these areas, CLEF introduced a number of tasks in multilingual search for multimedia content. While image retrieval has been the focus of the ImageCLEF task over many years, this chapter reviews the tasks examining speech and video retrieval carried out within CLEF during its first 10 years, and gives an overview of related work reported at other information retrieval benchmarks.

Acknowledgements

The success of the CLEF and MediaEval tasks described in this chapter would not have been possible without the work of the task co-chairs Marcello Federico, Douglas W. Oard, Martha Larson, Maria Eskevich, Robin Aly and Roeland Ordelman.

Author information

Correspondence to Gareth J. F. Jones.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Jones, G.J.F. (2019). About Sound and Vision: CLEF Beyond Text Retrieval Tasks. In: Ferro, N., Peters, C. (eds) Information Retrieval Evaluation in a Changing World. The Information Retrieval Series, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-22948-1_13

  • DOI: https://doi.org/10.1007/978-3-030-22948-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22947-4

  • Online ISBN: 978-3-030-22948-1

  • eBook Packages: Computer Science, Computer Science (R0)