Elsevier

Pattern Recognition

Volume 39, Issue 11, November 2006, Pages 2092-2100

A fuzzy logic approach for detection of video shot boundaries

https://doi.org/10.1016/j.patcog.2006.04.044

Abstract

Video temporal segmentation is normally the first, and an important, step for content-based video applications. Many features, including pixel difference, colour histogram, motion, and edge information, have been widely used and reported in the literature to detect shot cuts inside videos. Although existing research on shot cut detection is active and extensive, it remains a challenge to achieve accurate detection of all types of shot boundaries with a single algorithm. In this paper, we propose a fuzzy logic approach to integrate hybrid features for detecting shot boundaries inside general videos. The fuzzy logic approach contains two processing modes: one is dedicated to the detection of abrupt shot cuts, including short dissolved shots, and the other to the detection of gradual shot cuts. These two modes are unified by a mode-selector, which decides which mode the scheme should work in to achieve the best possible detection performance. Using the publicly available test data set from Carleton University, extensive experiments were carried out, and the test results illustrate that the proposed algorithm outperforms representative existing algorithms in terms of precision and recall rates.

Introduction

Multimedia applications have expanded enormously over the past decade, and a hierarchical structure is needed for managing video content. With this demand for a hierarchical model of videos, many researchers are moving into this area and developing algorithms for visual content interpretation, analysis, and management, where content-based video retrieval and content-based video coding [1] stand out as the most representative examples of this trend.

To build a system that manages the structure of videos, it is widely recognised that the first step is the automatic detection of shot cuts or their boundaries, which divides the video sequence into manageable sections where the visual content remains consistent in terms of camera operations and other visual events. On the media production side, meaningful stories and scenes are generated via sequences of video editing, which results in a range of different shot boundaries, including abrupt shot boundaries, dissolved shot boundaries, fade-ins, and fade-outs. These shot boundaries can be grouped into two categories according to the duration of the change. The first is the abrupt shot cut, in which an instantaneous change happens from one shot to another. This type of shot boundary is primarily used by editors to cut scenes into consistent sections. The second is the gradual shot boundary, which includes fade-in, fade-out, and dissolve, because the shot changes gradually. A fade is defined as a shot appearing or disappearing over several frames. A dissolve is made when one shot fades in whilst another shot fades out. Samples of these shot boundaries are illustrated in Fig. 1.
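To make the distinction between fades and dissolves concrete, the sketch below models a dissolve as a linear cross-fade between the last frame of one shot and the first frame of the next, and a fade-out as a dissolve into a black frame. The linear blending weights and the function name `dissolve` are illustrative assumptions; the text above only defines the transitions qualitatively.

```python
from typing import List

Frame = List[float]  # a frame flattened to a list of grey-level values


def dissolve(shot_a: Frame, shot_b: Frame, n_frames: int) -> List[Frame]:
    """Return the intermediate frames of a linear cross-fade from shot_a to shot_b.

    Assumption: the blend weight rises linearly, which is a common model
    of a dissolve but is not taken from the source text.
    """
    frames = []
    for k in range(1, n_frames + 1):
        alpha = k / (n_frames + 1)  # weight of the incoming shot
        frames.append([(1 - alpha) * a + alpha * b for a, b in zip(shot_a, shot_b)])
    return frames


# Dissolve: one shot fades out whilst the other fades in.
print(dissolve([0.0, 100.0], [100.0, 0.0], 3))
# → [[25.0, 75.0], [50.0, 50.0], [75.0, 25.0]]

# Fade-out: the incoming "shot" is a black frame.
print(dissolve([80.0, 80.0], [0.0, 0.0], 3))
# → [[60.0, 60.0], [40.0, 40.0], [20.0, 20.0]]
```

Under this model the change between any two consecutive frames is only a fraction of the total change, which is why gradual boundaries are harder to detect with a single frame-to-frame threshold than abrupt cuts.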

Corresponding to the shot boundaries illustrated in Fig. 1, many algorithms have been proposed and reported in the published literature [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Pixel-based difference analysis remains one of the most widely adopted approaches to find the dissimilarity between shot boundaries. Exploitation of temporal information, such as techniques based on motion estimation and compensation [13], represents another major direction for shot cut detection, in which the large difference value caused by global motion can be easily detected and extracted by a simple block matching procedure. In Ref. [2], shot cut boundaries are detected by calculating the normalised correlation among blocks and locating the maximum correlation coefficient in the frequency domain. Alongside motion information, spatial information such as edges is often regarded as another important feature for shot cut detection. The edge tracking algorithm reported in [8] stands out as a representative example of this direction of research, where the proposed techniques are mainly based on the principle that, when most of the edge information is lost in consecutive frames, a shot cut can be declared. In Ref. [4], the feature of edge pixel count is proposed for shot cut detection, where the Sobel edge detector is used.

The most widely used feature, however, is the colour histogram, examples of which include histogram intersection, “twin-comparison”, and local histograms [3], [4], [5], [6]. These histogram features have a drawback: when the global motion is fast or the contrast of the frames is low, colour histogram based techniques miss many shot boundaries, and thus their recall performance becomes poor. In Ref. [7], a visual rhythm-based technique is reported, which produced good experimental results in shot cut detection. Due to the complexity of the algorithm design, however, this technique can only be used for off-line shot cut detection.

In contrast to most existing approaches, we propose a fuzzy logic approach in this paper to exploit the feature-based techniques and detect shot boundaries. The advantages of our contribution can be highlighted as: (i) a range of features can be integrated by fuzzy logic operations to exploit their individual strengths collectively; and (ii) while directly thresholding features remains sensitive to noise, selecting the threshold in the fuzzy domain provides a buffered operation and thus makes the detection more reliable.

The rest of this paper is organised as follows. Section 2 describes the detailed design of the proposed algorithm. Section 3 reports experimental results in comparison with three existing algorithms, and Section 4 provides conclusions.
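As a rough illustration of the idea of integrating features in the fuzzy domain, the sketch below computes two of the features mentioned above (a histogram-intersection dissimilarity and a mean pixel difference), maps each to a membership degree in a fuzzy set "large change", and combines them with a fuzzy AND (minimum). The membership breakpoints, the choice of the minimum operator, and all function names are assumptions made for illustration; this preview does not give the paper's actual membership functions, rules, or mode selector.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, levels: int = 256) -> List[float]:
    """Normalised grey-level histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // levels] += 1
    return [c / len(frame) for c in counts]


def histogram_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """1 minus the histogram intersection: 0 for identical, 1 for disjoint histograms."""
    return 1.0 - sum(min(x, y) for x, y in zip(histogram(a), histogram(b)))


def pixel_difference(a: Sequence[int], b: Sequence[int], levels: int = 256) -> float:
    """Mean absolute pixel difference, scaled to the range [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (len(a) * (levels - 1))


def ramp(x: float, low: float, high: float) -> float:
    """Piecewise-linear membership in 'large change' (breakpoints are hypothetical)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def cut_degree(a: Sequence[int], b: Sequence[int]) -> float:
    """Degree to which frames a and b form a cut: fuzzy AND of both memberships."""
    mu_hist = ramp(histogram_difference(a, b), 0.2, 0.6)  # assumed breakpoints
    mu_pix = ramp(pixel_difference(a, b), 0.1, 0.4)       # assumed breakpoints
    return min(mu_hist, mu_pix)


dark = [20] * 64     # toy 8x8 frame, uniformly dark
bright = [220] * 64  # toy 8x8 frame, uniformly bright
print(cut_degree(dark, dark))    # → 0.0
print(cut_degree(dark, bright))  # → 1.0
```

Because each feature is mapped to a graded degree rather than a hard 0/1 decision, a small amount of noise near a breakpoint shifts the combined degree only slightly, which is one way to read the "buffered operation" described above.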

Section snippets

Feature extraction and mode selector design

To detect relatively abrupt scene changes between boundaries of shots, we propose to combine a range of representative features to construct a hybrid feature for shot cut detection. By representative, we mean those features that: (i) are widely used in existing shot cut detection algorithms; (ii) are mature enough that substantial evaluations have been reported and supportive results obtained in the literature; and (iii) whose implementation does not incur intensive computing cost and their

Experimental result

To evaluate the proposed temporal segmentation algorithms, we need to address two important issues in preparing the experimental design. The first issue is the test data set and the second is the selection of a benchmark out of existing work in relevant areas. In order to ensure that the proposed algorithm can be evaluated by any researcher in comparison with any other algorithms to be developed in the future, we select two open test data sets, one is publicly available via Internet download

Conclusions

In this paper, we proposed a fuzzy logic approach for temporal segmentation of videos, where a number of features are integrated to improve shot cut detection performance. These features include colour histogram intersection, motion compensation, texture change, and edge variances. Experimental results indicate that the proposed algorithm is effective in video segmentation when benchmarked against three existing algorithms and measured by precision and recall rates.
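For readers unfamiliar with the two evaluation measures, the sketch below computes precision and recall from a list of detected boundary positions and a list of ground-truth positions, using the standard definitions (correct detections divided by all detections, and correct detections divided by all true boundaries). Matching boundaries by exact frame index is a simplifying assumption; the matching rule used in the experiments is not stated in this preview, and the frame numbers are made up.

```python
from typing import Iterable, Tuple


def precision_recall(detected: Iterable[int], truth: Iterable[int]) -> Tuple[float, float]:
    """Precision and recall of detected shot boundaries against ground truth.

    Assumption: a detection counts as correct only if its frame index
    exactly matches a ground-truth boundary.
    """
    detected_set, truth_set = set(detected), set(truth)
    correct = len(detected_set & truth_set)
    precision = correct / len(detected_set) if detected_set else 0.0
    recall = correct / len(truth_set) if truth_set else 0.0
    return precision, recall


# Hypothetical example: 4 detections, 3 true boundaries, 2 of them found.
p, r = precision_recall(detected=[10, 50, 90, 130], truth=[10, 50, 100])
print(p)  # → 0.5
print(r)  # 2 of 3 true boundaries recovered
```

A detector that misses boundaries lowers recall, while one that raises false alarms lowers precision, so both are needed to compare algorithms fairly.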

Acknowledgment

The authors wish to acknowledge the financial support from the EU IST Framework-6 programme under the IP Project: Live staging of media events (IST-04-027312).

