Multimodal Search for Effective Video Retrieval

  • Conference paper
Image and Video Retrieval (CIVR 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4071)

Abstract

Semantic search and retrieval of multimedia content is a challenging research field that has drawn significant attention in the multimedia research community. With the dramatic growth of digital media at home, in enterprises, and on the web, methods for effective indexing and search of visual content are vital in unlocking the value of this content. Conventional database search and text search over large textual corpora are both well-understood problems with ubiquitous applications. However, search in non-textual unstructured content, such as image and video data, is not nearly as mature or effective. A common approach for video retrieval, for example, is to apply conventional text search techniques to the associated closed caption or speech transcript. This approach works fairly well for retrieving named entities, such as specific people, objects, or places. However, it does not work well for generic topics related to general settings, events, or people actions, as the speech track rarely describes the background setting or the visual appearance of the subject. Text-based search is not even applicable to scenarios that do not have speech transcripts or other textual metadata for indexing purposes (e.g., consumer photo collections). In addition, speech-based video retrieval frequently leads to false matches of segments that talk about but do not depict the entity of interest. Because of these and other limitations, it is now apparent that conventional text search techniques on their own are not sufficient for effective image and video retrieval, and they need to be combined with techniques that consider the visual semantics of the content. The most substantial work in this field is presented in the TREC Video Retrieval Evaluation (TRECVID) community, which focuses its efforts on evaluating video retrieval approaches by providing common video datasets and a standard set of queries.

This material is based upon work funded in part by the U.S. Government. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the U.S. Government.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Natsev, A. (2006). Multimodal Search for Effective Video Retrieval. In: Sundaram, H., Naphade, M., Smith, J.R., Rui, Y. (eds) Image and Video Retrieval. CIVR 2006. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788034_60

  • DOI: https://doi.org/10.1007/11788034_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36018-6

  • Online ISBN: 978-3-540-36019-3

  • eBook Packages: Computer Science, Computer Science (R0)
