Graphics in/for Digital Libraries
Integrated information mining for texts, images, and videos
Introduction
Digital media originating from images, audio, video, and text are a comparatively new kind of data in today's information systems. Although indexing and retrieving text documents is well understood, the same task is very difficult for, e.g., images or single sequences out of long videos. The aim of this paper is to contribute to research on the automatic analysis of graphical data, such as videos, and to extend it to a semantic level for static properties.
There are several well-known systems for the analysis of multimedia data and their retrieval which mainly concentrate on non-textual graphical data, such as color and texture vector information, or video cut detection: The ART MUSEUM [13] is used to find images from a database which can contain only images of artistic paintings and photographs. The algorithm for a sketch retrieval and/or a similarity retrieval is based on graphical features. A user can formulate a query by using sketches, which are taken from templates or which can be drawn.
The PHOTOBOOK [24] system is a set of interactive tools for browsing and searching single images and video sequences. A query is based on image content rather than on text annotations. The VIRAGE VIDEO ENGINE [10] uses some video-specific data, like motion, audio, closed caption, etc., to build up an information structure representing the content information about a video. A user can formulate queries to retrieve, e.g., commercials, scenes with special camera motions, e.g., a talking head, or just a scene denoted by a short text.
One of the first image retrieval projects is QBIC. Using an interactive graphical query interface, a user can draw a sketch to find images with similar shapes, to find images with colors or textures positioned at specific places, or to denote an object motion for the video domain [22].
In this paper we concentrate on MPEG videos and describe a special algorithm for a fast automatic shot detection based on the difference of the chrominance and luminance values. This shot detection is a first step towards a logical segmentation of a video, which finally leads to the selection of key frames or the generation of a single image using a fast mosaicing technique. A video is then indexed by textual information describing camera parameters of the shots and also the content of the key frames or the mosaic images. The ImageMiner system [17] can be used to process the representative frames for color, texture, and contour features.
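The idea of detecting shots from differences in luminance and chrominance values can be illustrated with a minimal sketch. It is not the algorithm of this paper, which works on MPEG data and is not reproduced in this excerpt. The flat frame representation, the equal weighting of the three channels, the threshold of 30.0, and the function names `frame_difference`, `detect_cuts`, and `split_into_shots` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a frame is a flat sequence of (Y, Cb, Cr)
# pixels with values in 0..255. A real MPEG decoder would supply this data.
Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute difference over the luminance and chrominance channels."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal size")
    total = 0
    for (y1, cb1, cr1), (y2, cb2, cr2) in zip(a, b):
        total += abs(y1 - y2) + abs(cb1 - cb2) + abs(cr1 - cr2)
    return total / (3 * len(a))


def detect_cuts(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return every index i where a cut is assumed between frame i-1 and i.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


def split_into_shots(
    frames: Sequence[Frame], threshold: float = 30.0
) -> List[Tuple[int, int]]:
    """Group frames into shots, returned as half-open (start, end) ranges."""
    if not frames:
        return []
    bounds = [0] + detect_cuts(frames, threshold) + [len(frames)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

Each resulting range can then be treated as one unit from which a representative image is chosen. A fixed global threshold is the simplest possible choice; gradual transitions such as fades or dissolves would need a more careful criterion.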
Shot detection in MPEG videos
To support videos in a multimedia retrieval system, the high number of frames in a video must be reduced to remove the enormous amount of redundant information. As the video data is not structured by tags, the frames are grouped by semantical units in a first step through an automatic shot analysis which detects cuts in the video stream.
Basically there are two different methods to perform a shot detection: using DCT-coefficients or determining the differences in the color distribution of
Generating still images from video shots
As the result of the shot detection, a video is divided into several cuts. Each cut is now treated as one unit. From these units only some frames should be analyzed by an image analysis system to derive information about color, texture, and contours. For browsing, the computation of indices, or other analysis functions, two different kinds of images can be used: significant key frames or generated mosaic images. The key frame method is in principle suitable for arbitrary shots, but to
The ImageMiner system: overview
The ImageMiner system is a system for the automatic annotation of still images. Key frames and mosaic images obtained by the techniques described in Section 3.1 and Section 3.2 can be analyzed by the ImageMiner system. The ImageMiner system consists of two main modules: the image analysis module, which extracts the content information, and the image retrieval module. The functionality of the retrieval module is discussed in Section 4.3. The next paragraphs give an overview of the image analysis
Examples
This section shows an example of the whole process, which was described in the previous sections. The first step of the annotation process (see Section 2) is the shot detection. Table 3 gives an overview of the performance of our shot detection approach. The accuracy of the shot detection is given in percent.
For this special example we have tested the complete analysis—video analysis and still image analysis with ImageMiner—with a short MPEG-1 videostream containing several scenes from a
Summary and conclusions
We have shown a successful approach to analyze MPEG videos by dividing them into shots (Section 2). Then a representative image is constructed for each shot: if it contains camera motion the mosaicing technique (Section 3.2) is used, otherwise a significant key frame is extracted (Section 3.1). The still image—key frame or mosaic image—representing a shot is then analyzed with image processing methods as they are implemented in the ImageMiner system (Section 4). The novel feature of an
References (31)
- Texture analysis using gray level run lengths. Computer Graphics and Image Processing (1975).
- et al., Statistical feature matrix for texture analysis. CVGIP: Graphical Models and Image Processing (1992).
- et al., Textural features corresponding to textural properties. IEEE Transactions on Systems, Man and Cybernetics (1989).
- et al., Image processing on encoded video sequences. Multimedia Systems (1994).
- Asendorf, G. and Hermes, Th., On Textures: an Approach for a New Abstract Description Language. In Proceedings of IS &...
- A computational approach to edge detection. IEEE Trans Pattern Analysis and Machine Intelligence (1986).
- Dammeyer, A., Jürgensen, W., Krüwel, C., Poliak, E., Ruttkowski, S., Schäfer, T., Sirava, M. and Hermes T.,...
- Foley, J. D., van Dam, A., Feiner, S. K. and Hughes J. F., Computer Graphics, Principles and Practice. Addison-Wesley,...
- Fröhlich, M. and Werner, M., Demonstration of the interactive Graph Visualization System daVinci. In Proceedings of...
- Goeser, S., A Logic-based Approach to Thesaurus Modelling. In Proceedings of the International Conference on...
- Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics.
1 ImageMiner is a trademark of IBM Corp.