Computers & Graphics
Volume 22, Issue 6, December 1998, Pages 675-685

Graphics in/for Digital Libraries
Integrated information mining for texts, images, and videos

https://doi.org/10.1016/S0097-8493(98)00088-0

Abstract

The large amount and the ubiquitous availability of multimedia information (e.g., video, audio, image, and also text documents) require efficient, effective, and automatic annotation and retrieval methods. As video plays an ever more important role in multimedia, content-based retrieval of videos becomes an issue, especially since an integrated methodology for all types of multimedia documents is desirable.

Our approach for the integrated retrieval of videos, images, and text comprises three necessary steps: first, the detection and extraction of shots from a video; second, the construction of a still image from the frames in a shot, either by the extraction of key frames or by a mosaicing technique; and third, the analysis of the resulting single-image visualization of a shot by the ImageMiner1 system.

The ImageMiner system was developed in cooperation with IBM at the University of Bremen in the Image Processing Department of the Center for Computing Technologies. It realizes the content-based retrieval of single images through a novel combination of techniques and methods from computer vision and artificial intelligence. Its output is a textual description of an image, and thus in our case, of the static elements of a video shot. In this way, the annotations of a video can be indexed with standard text retrieval systems, along with text documents or annotations of other multimedia documents, thus ensuring an integrated interface for all kinds of multimedia documents.

Introduction

Digital media such as images, audio, video, and text are a comparatively new kind of data in today's information systems. Although it is well known how to index and retrieve text documents, the same task is very difficult for, e.g., images or single sequences out of long videos. The aim of this paper is to contribute to research on the automatic analysis of graphical data, such as videos, and to extend it to a semantic level for static properties.

There are several well-known systems for the analysis and retrieval of multimedia data which mainly concentrate on non-textual graphical data, such as color and texture vector information, or video cut detection: The ART MUSEUM [13] is used to find images in a database that contains only images of artistic paintings and photographs. The algorithm for sketch retrieval and/or similarity retrieval is based on graphical features. A user can formulate a query using sketches, which are taken from templates or which can be drawn.

The PHOTOBOOK [24] system is a set of interactive tools for browsing and searching single images and video sequences. A query is based on image content rather than on text annotations. The VIRAGE VIDEO ENGINE [10] uses video-specific data, like motion, audio, closed captions, etc., to build up an information structure representing the content information about a video. A user can formulate queries to retrieve, for instance, commercials, scenes with special camera motions, a talking head, or just a scene denoted by a short text.

One of the first image retrieval projects is QBIC. Using an interactive graphical query interface, a user can draw a sketch to find images with similar shapes, to find images with colors or textures positioned at specific places, or to denote an object motion for the video domain [22].

In this paper we concentrate on MPEG videos and describe a special algorithm for a fast automatic shot detection based on the difference of the chrominance and luminance values. This shot detection is a first step towards a logical segmentation of a video, which finally leads to the selection of key frames or the generation of a single image using a fast mosaicing technique. A video is then indexed by textual information describing camera parameters of the shots and also the content of the key frames or the mosaic images. The ImageMiner system [17] can be used to process the representative frames for color, texture, and contour features.
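
To make the overall flow concrete, the following minimal sketch strings the three steps together. The callables detect_shots, build_still, and annotate are placeholders for the components described in the following sections; they are assumptions of this illustration, not part of the original system.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence, Tuple

    @dataclass
    class ShotAnnotation:
        frame_range: Tuple[int, int]  # first and last frame index of the shot
        description: str              # textual annotation of the representative still image

    def index_video(frames: Sequence,
                    detect_shots: Callable[[Sequence], List[Tuple[int, int]]],
                    build_still: Callable[[Sequence], object],
                    annotate: Callable[[object], str]) -> List[ShotAnnotation]:
        """Three-step pipeline: shot detection, still-image construction
        (key frame or mosaic), and ImageMiner-style textual annotation."""
        annotations = []
        for start, end in detect_shots(frames):
            still = build_still(frames[start:end + 1])
            annotations.append(ShotAnnotation((start, end), annotate(still)))
        return annotations

The resulting textual annotations can then be handed to a standard text retrieval system, as described in the abstract.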

Section snippets

Shot detection in MPEG videos

To support videos in a multimedia retrieval system, the high number of frames in a video must be reduced to remove the enormous amount of redundant information. As the video data is not structured by tags, the frames are grouped into semantic units in a first step through an automatic shot analysis which detects cuts in the video stream.

Basically, there are two different methods to perform shot detection: using DCT coefficients or determining the differences in the color distribution of
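
As an illustration of the second method (and of the chrominance/luminance differencing mentioned in the Introduction), the following sketch declares a cut wherever the normalized histograms of consecutive frames differ by more than a threshold. The threshold and bin count are assumptions of this sketch, not values from the paper.

    import numpy as np

    def detect_cuts(frames_yuv, threshold=0.35, bins=32):
        """Hard-cut detection by comparing normalized luminance/chrominance
        histograms of consecutive frames; a large difference is taken as a cut."""
        cuts = []
        prev_hist = None
        for i, frame in enumerate(frames_yuv):  # frame: H x W x 3 array with Y, Cb, Cr planes
            hist = np.concatenate([
                np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
                for c in range(3)
            ]).astype(float)
            hist /= hist.sum()
            if prev_hist is not None:
                diff = 0.5 * np.abs(hist - prev_hist).sum()  # L1 distance, in [0, 1]
                if diff > threshold:
                    cuts.append(i)  # shot boundary between frame i-1 and frame i
            prev_hist = hist
        return cuts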

Generating still images from video shots

As the result of the shot detection, a video is divided into several cuts. Each cut is now treated as one unit. From these units only some frames should be analyzed by an image analysis system to derive information about color, texture, and contours. For browsing purposes, the computation of indices, or other analysis functions, two different kinds of images can be used: significant key frames or generated mosaic images. The key frame method is in principle suitable for arbitrary shots, but to
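
A minimal sketch of the key-frame path, assuming decoded frames as NumPy arrays: the frame whose color histogram is closest to the shot's mean histogram is chosen as representative. This is one plausible heuristic for illustration; the mosaicing path used for shots with camera motion is not shown here.

    import numpy as np

    def select_key_frame(shot_frames, bins=32):
        """Pick the frame whose color histogram is closest to the mean
        histogram of the shot."""
        hists = []
        for frame in shot_frames:  # frame: H x W x 3 uint8 array
            h = np.concatenate([
                np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
                for c in range(frame.shape[-1])
            ]).astype(float)
            hists.append(h / h.sum())
        mean_hist = np.mean(hists, axis=0)
        distances = [np.abs(h - mean_hist).sum() for h in hists]
        return shot_frames[int(np.argmin(distances))]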

The ImageMiner system: overview

The ImageMiner system is a system for the automatic annotation of still images. Key frames and mosaic images obtained by the techniques described in Section 3.1 and Section 3.2 can be analyzed by the ImageMiner system. The ImageMiner system consists of two main modules: the image analysis module, which extracts the content information, and the image retrieval module. The functionality of the retrieval module is discussed in Section 4.3. The next paragraphs give an overview of the image analysis
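
The grid-based flavor of such an analysis can be hinted at with a toy sketch that labels the dominant color channel of each grid square of an RGB image and emits an indexable textual description. The function and label names are invented for this illustration; ImageMiner's actual color, texture, and contour modules and its knowledge-based object recognition are not reproduced here.

    import numpy as np

    CHANNEL_NAMES = {0: "red", 1: "green", 2: "blue"}  # illustrative labels only

    def annotate_color_grid(image, grid=4):
        """Split the image into grid x grid squares, name the dominant channel
        of each square, and return a textual description that a standard
        text retrieval system could index."""
        h, w, _ = image.shape
        terms = []
        for gy in range(grid):
            for gx in range(grid):
                cell = image[gy * h // grid:(gy + 1) * h // grid,
                             gx * w // grid:(gx + 1) * w // grid]
                dominant = int(np.argmax(cell.reshape(-1, 3).mean(axis=0)))
                terms.append(f"{CHANNEL_NAMES[dominant]} region at ({gx},{gy})")
        return "; ".join(terms)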

Examples

This section shows an example of the whole process, which was described in the previous sections. The first step of the annotation process (see Section 2) is the shot detection. Table 3 gives an overview of the performance of our shot detection approach. The accuracy of the shot detection is given in percent.
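
For orientation, an accuracy figure of this kind could be computed as in the following hypothetical sketch, which counts how many manually annotated cuts are matched by a detected cut; it does not reproduce the exact evaluation protocol behind Table 3.

    def shot_detection_accuracy(detected, ground_truth, tolerance=1):
        """Percentage of annotated cuts matched by a detected cut within
        `tolerance` frames."""
        hits = sum(
            any(abs(d - g) <= tolerance for d in detected)
            for g in ground_truth
        )
        return 100.0 * hits / len(ground_truth) if ground_truth else 0.0

    # Example: three of four annotated cuts are found -> 75.0
    print(shot_detection_accuracy([120, 254, 399], [120, 255, 400, 512]))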

For this special example we have tested the complete analysis (video analysis and still image analysis with ImageMiner) with a short MPEG-1 video stream containing several scenes from a

Summary and conclusions

We have shown a successful approach to analyze MPEG videos by dividing them into shots (Section 2). Then a representative image is constructed for each shot: if it contains camera motion the mosaicing technique (Section 3.2) is used, otherwise a significant key frame is extracted (Section 3.1). The still image—key frame or mosaic image—representing a shot is then analyzed with image processing methods as they are implemented in the ImageMiner system (Section 4). The novel feature of an

References (31)

  • M.M. Galloway, Texture analysis using gray level run lengths, Computer Graphics and Image Processing (1975)
  • C.-M. Wu et al., Statistical feature matrix for texture analysis, CVGIP: Graphical Models and Image Processing (1992)
  • M. Amadasun et al., Textural features corresponding to textural properties, IEEE Transactions on Systems, Man and Cybernetics (1989)
  • F. Arman et al., Image processing on encoded video sequences, Multimedia Systems (1994)
  • Asendorf, G. and Hermes, Th., On Textures: an Approach for a New Abstract Description Language. In Proceedings of IS &...
  • J. Canny, A computational approach to edge detection, IEEE Trans Pattern Analysis and Machine Intelligence (1986)
  • Dammeyer, A., Jürgensen, W., Krüwel, C., Poliak, E., Ruttkowski, S., Schäfer, T., Sirava, M. and Hermes T.,...
  • Foley, J. D., van Dam, A., Feiner, S. K. and Hughes, J. F., Computer Graphics, Principles and Practice. Addison-Wesley,...
  • Fröhlich, M. and Werner, M., Demonstration of the interactive Graph Visualization System daVinci. In Proceedings of...
  • Goeser, S., A Logic-based Approach to Thesaurus Modelling. In Proceedings of the International Conference on...
  • Hampapur, A., Virage Video Engine. In IS & T/SPIE Symposium on Electronic Imaging Science & Technology, pp. 188-198....
  • Hanschke, P., Abecker, A. and Drollinger, D., TAXON: A Concept Language with Concrete Domains. In Proceedings of the...
  • R.M. Haralick et al., Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics (1973)
  • Hirata, K. and Kato, T., Query By Visual Example. In Proceedings of Third Intl. Conf. on Extending Database Technology,...
  • Klauck, Ch., Eine Graphgrammatik zur Repräsentation und Erkennung von Features in CAD/CAM. DISKI No. 66. infix-Verlag,...

1 ImageMiner is a trademark of IBM Corp.