Elsevier

Information Systems

Volume 32, Issue 4, June 2007, Pages 545-559

Techniques used and open challenges to the analysis, indexing and retrieval of digital video

https://doi.org/10.1016/j.is.2006.09.001

Abstract

Video in digital format is now commonplace and widespread both in professional use and in domestic consumer products, from camcorders to mobile phones. Video content is growing in volume and, while we can capture, compress, store, transmit and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper, we give a brief review of the state of the art of video analysis, indexing and retrieval, and we point to research directions which we think are promising and which could make searching and browsing of video archives based on video content as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area.

Introduction

Video, in digital or analogue form, is a collation of still images presented to the viewer so fast they give the illusion of motion. Motion pictures were invented in the early 1890s by Thomas A. Edison, and Louis Lumière is often credited as beginning the motion picture camera industry in 1895 when he presented the first movie show to an audience. Video in digital rather than analogue form has only recently become commonplace with consumer-level devices for video capture and storage now reaching a mass market and this has opened up many possibilities for storage, distribution, analysis and access to digital video archives.

Video is almost always structured into a strict hierarchy. A “programme” is divided into “scenes”, and a scene is composed of one or more camera “shots”. A scene usually corresponds to some logical event in a programme, such as a sequence of shots making up a dialogue scene in a sitcom, or an action scene in a movie. A shot corresponds to video footage from a single camera in time and can involve camera motion like panning, zooming, tracking or booming, as well as movements of objects within and/or into or out of the frame of view. Thus a single shot can have a lot of motion, such as in typical music videos, or no motion at all, such as still footage of an outdoor landscape. Shots usually transition from one to the next using what is called a “hard cut”, where the last frame of the outgoing shot is followed immediately by the first frame of the incoming shot. Shot transitions can also be gradual, such as a fade in/out, fade to black, wipes, or the more elaborate digitally-based shot transitions used to introduce action replays in TV sports programmes.
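The programme, scene and shot hierarchy described above can be sketched as a small data structure. This is only an illustrative sketch: the class names, the field names and the use of inclusive frame indices are assumptions made for the example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """Footage from a single camera, given as an inclusive frame range (assumed)."""
    start_frame: int
    end_frame: int

    def length(self) -> int:
        # Number of frames in the shot, counting both end points.
        return self.end_frame - self.start_frame + 1


@dataclass
class Scene:
    """A logical event in a programme, made up of one or more shots."""
    label: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Programme:
    """The top of the hierarchy: a programme divided into scenes."""
    title: str
    scenes: List[Scene] = field(default_factory=list)

    def all_shots(self) -> List[Shot]:
        # Flatten the hierarchy into the sequence of shots in playback order.
        return [shot for scene in self.scenes for shot in scene.shots]

    def shot_start_frames(self) -> List[int]:
        # Frames at which a new shot begins, excluding the first shot.
        return [shot.start_frame for shot in self.all_shots()[1:]]


# Hypothetical example: a programme with two scenes and three shots.
sitcom = Programme(
    title="Example sitcom",
    scenes=[
        Scene("dialogue", [Shot(0, 119), Shot(120, 199)]),
        Scene("exterior", [Shot(200, 349)]),
    ],
)
```

Flattening the hierarchy with `all_shots()` gives the shot-sized units that content-based operations typically work over, while the scene level keeps the grouping of shots into logical events.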

There is a huge amount of video now stored in digital format. The movie industry, for example, is a massive producer of video and according to [1] we have a grand total surviving stock of 328,530 movies representing a total of 740,803 h. When we add to that broadcast TV content from thousands of TV stations worldwide, and footage from millions of CCTV cameras and from millions of mobile camera/video phones we really cannot comprehend the volume of video available to us from various sources.

Engineering the technology and building the structures to allow such a massive growth of video content has of course been driven by the market for video content, and many of the issues associated with video in digital form are already solved or present few remaining technical challenges. The ability to easily capture, format, compress, store, transmit, and then display video on fixed devices such as computers or domestic set-top boxes, or on mobile devices such as iPods or phones, is easily within the grasp of most home consumers. Manually editing video, however, to remove unwanted portions or to compose summaries of TV sports programmes or highlights of family events from home camcorders, still remains a specialist rather than a commonplace application.

As a one-sentence précis of the use of video in digital format, we can say that it is in widespread use and that the outstanding challenges are in managing video content as an information resource. For us, managing information means many things, including analysis, indexing, summarising, aggregating, browsing and searching. In the context of digital video, some of these tasks will depend very much on the domain from which the video is taken, e.g. footage of a sports event will require different analysis, and have different criteria for generating summaries, from, say, a television sitcom.

In this paper we are not concerned with technology, standards, streaming or security, or other areas which are all very valid research topics, demand considerable attention and present many challenges. Instead we address the task of video retrieval, which forms one part of the set of functions used in the broad task of video navigation and information seeking. When we use a video retrieval or browsing system we do so in order to help achieve some broader information seeking task or goal, which can be as varied as finding a home movie video clip of our cousin's recent wedding, finding a good documentary about space travel and the Apollo missions, or finding the tango scene in the movie Scent of a Woman. The difference between information seeking and information retrieval is discussed in detail in [2]. In this paper, and in most of the research into video retrieval, we concentrate on the video search and retrieval operation and not on satisfying the user's broader information seeking goals.

This paper is not meant to be just a review of the author's or any other previous work in content-based video management, although it does include a sizable component which describes previous work. The reason for including this is that we have categorised the current approaches to content-based video searching into five non-overlapping techniques, and for each of these we illustrate the technique with a description of a system which implements it. This categorisation, and the set of challenges laid out following it, are the main contributions of the paper.

The rest of this paper is organised as follows. In the next section, we give an overview of the basics of video retrieval and in the section that follows we present a categorisation of the five main approaches to video searching. We then present, in Section 4, a set of eight challenges to the research community to address the shortfalls and weaknesses in current capabilities.

Basics of video IR

In order to manage video content, or indeed any kind of digital media content, we need to structure the raw video, and shot boundary detection does this by automatically segmenting video into its constituent shots. A shot may be completely still or may have either camera motion or motion of objects within the frame, or both. The reason for automatically structuring video is to allow content-based operations over video at the level of granularity of shot-sized units, which are more manageable.
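As a rough illustration of how video might be segmented into shots, the sketch below flags a hard cut wherever the grey-level histograms of two consecutive frames differ sharply. The text above does not say which detection method is used, so histogram differencing is an assumed stand-in, and the function names, the 16-bin histogram and the 0.5 threshold are all hypothetical choices. Frames are assumed to be non-empty flat sequences of grey levels in the range 0 to 255, and gradual transitions such as fades and wipes are not handled.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # assumed: flat, non-empty list of grey levels in 0..255


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [count / total for count in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_hard_cuts(frames: Sequence[Frame],
                     threshold: float = 0.5,
                     bins: int = 16) -> List[int]:
    """Indices of frames that start a new shot, by consecutive-frame comparison."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None and histogram_distance(previous, current) > threshold:
            cuts.append(index)  # this frame is the first frame of the incoming shot
        previous = current
    return cuts


def split_into_shots(num_frames: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn cut positions into inclusive (start_frame, end_frame) shot ranges."""
    if num_frames == 0:
        return []
    starts = [0] + list(cuts)
    ends = [cut - 1 for cut in cuts] + [num_frames - 1]
    return list(zip(starts, ends))


# Synthetic example: three dark frames, two bright frames, one dark frame.
dark, bright = [20] * 64, [220] * 64
frames = [dark, dark, dark, bright, bright, dark]
cuts = detect_hard_cuts(frames)
shots = split_into_shots(len(frames), cuts)
```

A fixed global threshold is the simplest possible decision rule. It is used here only to keep the sketch short, and it would be expected to miss gradual transitions and to fire on sudden lighting changes within a shot.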

Shot boundary

Current approaches to searching video archives

In this section, we present our five categories of contemporary approaches to video IR. We should remember that there are limitations and bounds on the current state of the field of video IR. These include that we retrieve shots only rather than larger units, that the collections of video we can retrieve from are small-scale compared to other media such as text and image collections, and that automatic feature extraction is currently noisy and can be inaccurate. However, in time and

Challenges

As stated at the beginning of this paper, the purpose of illustrating the range of contemporary approaches to video retrieval and teasing out their shortcomings is to lay out a series of challenges which we, in what could be termed the broad field of video navigation, should address. We now present these eight challenges and discuss each briefly.

  (1) The first challenge we will raise is to highlight that almost all work in video navigation is based on treating video as a series of individual

Conclusions

This paper has presented a brief review of ways in which digital video information can be indexed and searched and has followed that with a set of eight challenges which need to be addressed if content-based access to video information is to be as easy and widespread as the ways we currently search huge amounts of text web pages. The interaction between people and video which we assume here is an interactive search task where a user wants to locate video segments because of some need to locate

References (20)

  • P. Lyman, H.R. Varian, How much information? (last checked June 2006),...
  • P. Ingwersen, K. Järvelin, The Turn: Integration of Information Seeking and Retrieval in Context, the Kluwer...
  • A. Hanjalic, Shot boundary detection: unraveled and resolved, IEEE Trans. Circuits Syst. Video Technol. (2002)
  • B. Manjunath, P. Salembier, T. Sikora (Eds.), Introduction to MPEG-7: Multimedia Content Description Language, Wiley,...
  • A.F. Smeaton, Large scale evaluations of multimedia information retrieval: the TRECVid experience, in: W.-K. Leow, M.S....
  • TRECVid video retrieval evaluation (last checked July 2006),...
  • The Internet archive: moving image archive (last checked June 2006),...
  • The open video project: a shared video collection (last checked June 2006),...
  • H. Lee, A.F. Smeaton, N.E. O’Connor, B. Smyth, User evaluation of Físchlár-News: an automatic broadcast news delivery...
  • The Google video search engine (last checked June 2006),...
There are more references available in the full text version of this article.

This work was supported by Science Foundation Ireland under grant number 03/IN.3/I361. None of the work from my own research group reported in this paper would have been possible without the great effort of the many students and other researchers over the last several years.
