Multimodal Search for Effective Video Retrieval

Natsev, Apostol (Paul)

doi:10.1007/11788034_60

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4071))

Included in the following conference series:

International Conference on Image and Video Retrieval

Abstract

Semantic search and retrieval of multimedia content is a challenging research field that has drawn significant attention in the multimedia research community. With the dramatic growth of digital media at home, in enterprises, and on the web, methods for effective indexing and search of visual content are vital in unlocking the value of this content. Conventional database search and text search over large textual corpora are both well-understood problems with ubiquitous applications. However, search in non-textual unstructured content, such as image and video data, is not nearly as mature or effective. A common approach for video retrieval, for example, is to apply conventional text search techniques to the associated closed caption or speech transcript. This approach works fairly well for retrieving named entities, such as specific people, objects, or places. However, it does not work well for generic topics related to general settings, events, or human actions, as the speech track rarely describes the background setting or the visual appearance of the subject. Text-based search is not applicable at all to scenarios that lack speech transcripts or other textual metadata for indexing purposes (e.g., consumer photo collections). In addition, speech-based video retrieval frequently leads to false matches of segments that talk about but do not depict the entity of interest. Because of these and other limitations, it is now apparent that conventional text search techniques on their own are not sufficient for effective image and video retrieval, and they need to be combined with techniques that consider the visual semantics of the content. The most substantial work in this field is presented within the TREC Video Retrieval Evaluation (TRECVID) community, which focuses its efforts on evaluating video retrieval approaches by providing common video datasets and a standard set of queries.

This material is based upon work funded in part by the U.S. Government. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the U.S. Government.

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Hawthorne, NY, 10532
Apostol (Paul) Natsev

Authors

Apostol (Paul) Natsev

Editor information

Editors and Affiliations

Arts, Media and Engineering Program, Arizona State University, 85281, Tempe, AZ
Hari Sundaram
Intelligent Information Management Department, IBM T. J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
Milind Naphade
Intelligent Information Management Department, IBM T. J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
John R. Smith
Microsoft Corporation, Microsoft China R&D Group, 49 Zhichun Road, 100080, Beijing, China
Yong Rui

About this paper

Cite this paper

Natsev, A. (2006). Multimodal Search for Effective Video Retrieval. In: Sundaram, H., Naphade, M., Smith, J.R., Rui, Y. (eds) Image and Video Retrieval. CIVR 2006. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788034_60

DOI: https://doi.org/10.1007/11788034_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36018-6
Online ISBN: 978-3-540-36019-3
eBook Packages: Computer Science (R0)
