A fuzzy logic approach for detection of video shot boundaries

doi:10.1016/j.patcog.2006.04.044

Pattern Recognition

Volume 39, Issue 11, November 2006, Pages 2092-2100

https://doi.org/10.1016/j.patcog.2006.04.044

Abstract

Video temporal segmentation is normally the first and an important step for content-based video applications. Many features, including pixel differences, colour histograms, motion and edge information, have been widely used and reported in the literature to detect shot cuts inside videos. Although research on shot cut detection is active and extensive, it remains a challenge to detect all types of shot boundary accurately with a single algorithm. In this paper, we propose a fuzzy logic approach that integrates hybrid features to detect shot boundaries inside general videos. The approach has two processing modes: one is dedicated to detecting abrupt shot cuts, including short dissolves, and the other to detecting gradual shot cuts. The two modes are unified by a mode selector, which decides the mode in which the scheme should operate in order to achieve the best possible detection performance. Extensive experiments were carried out on the publicly available test data set from Carleton University, and the results show that the proposed algorithm outperforms representative existing algorithms in terms of precision and recall rates.

Introduction

Multimedia applications have expanded enormously over the past decade, and a hierarchical structure is needed for managing video content. With this demand for hierarchical models of video, many researchers are moving into the area and developing algorithms for visual content interpretation, analysis and management, with content-based video retrieval and content-based video coding [1] standing out as the most representative examples of this trend.

To build a system that manages the structure of videos, it is widely recognised that the first step is the automatic detection of shot cuts, or shot boundaries, to divide the video sequence into manageable sections within which the visual content remains consistent in terms of camera operations and other visual events. On the media production side, meaningful stories and scenes are generated through sequences of video editing, which result in a range of different shot boundaries, including abrupt cuts, dissolves, fade-ins and fade-outs. These shot boundaries can be grouped into two categories according to the duration of the change. An abrupt shot cut is an instantaneous change from one shot to the next; editors use this type of boundary primarily to cut scenes into consistent sections. A gradual shot boundary, which includes fade-ins, fade-outs and dissolves, is one in which the shot changes over a number of frames. In a fade, a shot gradually appears or disappears over several frames; a dissolve occurs when one shot fades in while another fades out. Samples of these shot boundaries are illustrated in Fig. 1.
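To make the difference between abrupt and gradual boundaries concrete, the sketch below models fades and dissolves as a pixel-wise linear blend of two grey-level frames. The linear blending schedule, the frame representation (lists of rows) and all function names are assumptions chosen for illustration; the paper does not specify how transitions are modelled. The point it shows is that a dissolve spreads one large change across many small frame-to-frame changes, which is why gradual boundaries are harder to detect than cuts.

```python
from typing import List

Frame = List[List[float]]  # grey levels, rows of equal length (assumed representation)


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Pixel-wise convex combination (1 - alpha) * a + alpha * b."""
    return [[(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def dissolve(a: Frame, b: Frame, length: int) -> List[Frame]:
    """Shot A fades out while shot B fades in over `length` frames (linear schedule)."""
    if length < 2:
        raise ValueError("a gradual transition needs at least two frames")
    return [blend(a, b, k / (length - 1)) for k in range(length)]


def black_like(frame: Frame) -> Frame:
    """An all-black frame with the same shape as `frame`."""
    return [[0.0] * len(row) for row in frame]


def fade_out(frame: Frame, length: int) -> List[Frame]:
    """A fade-out is modelled as a dissolve into a black frame."""
    return dissolve(frame, black_like(frame), length)


def fade_in(frame: Frame, length: int) -> List[Frame]:
    """A fade-in is modelled as a dissolve out of a black frame."""
    return dissolve(black_like(frame), frame, length)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two frames."""
    n = sum(len(row) for row in a)
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n
```

For example, with `a = [[200, 200], [200, 200]]` and `b = [[0, 100], [100, 0]]`, an abrupt cut from `a` to `b` gives `mean_abs_diff(a, b) == 150.0`, whereas in `dissolve(a, b, 5)` each consecutive pair of frames differs by only `37.5`, a quarter of the total change.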

Many algorithms have been proposed and reported in the literature to detect the types of shot boundary illustrated in Fig. 1 [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Pixel-based difference analysis remains one of the most widely adopted approaches for measuring the dissimilarity between frames across shot boundaries. Exploiting temporal information, for example through motion estimation and compensation techniques [13], is another major direction for shot cut detection, in which the large difference values caused by global motion can easily be detected and extracted by a simple block matching procedure. In Ref. [2], shot cut boundaries are detected by calculating the normalised correlation between blocks and locating the maximum correlation coefficient in the frequency domain. Alongside motion information, spatial information such as edges is often regarded as another important feature for shot cut detection. The edge tracking algorithm reported in [8] is a representative example of this direction of research; it rests mainly on the principle that a shot cut can be declared when most of the edge information is lost between consecutive frames. In Ref. [4], the edge pixel count, obtained with a Sobel edge detector, is proposed as a feature for shot cut detection. The most widely used feature, however, is the colour histogram, with variants including histogram intersection, “twin-comparison” and local histograms [3], [4], [5], [6]. These histogram features have a drawback: when global motion is fast or frame contrast is low, they miss many shot boundaries, so their recall becomes poor. In Ref. [7], a visual rhythm-based technique is reported that produced good experimental results in shot cut detection; owing to the complexity of its algorithm design, however, it can only be used for off-line shot cut detection.

In contrast to most existing approaches, we propose in this paper a fuzzy logic approach that exploits feature-based techniques to detect shot boundaries. The advantages of our contribution can be highlighted as follows: (i) a range of features can be integrated by fuzzy logic operations so that their individual strengths are exploited collectively; and (ii) whereas directly thresholding features is sensitive to noise, selecting thresholds in the fuzzy domain provides a buffered operation and thus makes detection more reliable. The rest of this paper is organised as follows. Section 2 describes the detailed design of the proposed algorithm. Section 3 reports experimental results in comparison with three existing algorithms, and Section 4 provides conclusions.
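The two advantages claimed above, integrating several features with fuzzy operations and thresholding in the fuzzy domain rather than on raw feature values, can be illustrated with a minimal sketch. Everything specific here is hypothetical: the feature names, the ramp-shaped membership functions and their bounds, the two rules and the decision level `alpha` are assumptions for illustration, not the membership functions or rule base of the paper. Rules are combined with the standard min (AND) and max (OR) fuzzy operators.

```python
from typing import Dict, Tuple

# Hypothetical (low, high) ramp bounds for normalised dissimilarities in [0, 1];
# the paper's actual membership functions are not given in this excerpt.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "hist": (0.2, 0.6),    # histogram-intersection dissimilarity
    "edge": (0.2, 0.6),    # change in edge content
    "motion": (0.3, 0.8),  # motion-compensated difference
}


def ramp(x: float, low: float, high: float) -> float:
    """Membership of x in the fuzzy set 'large change':
    0 at or below `low`, 1 at or above `high`, linear in between."""
    if high <= low:
        raise ValueError("high must exceed low")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def cut_degree(features: Dict[str, float]) -> float:
    """Fire two illustrative rules and OR them together (max):
    R1: histogram change is large AND edge change is large (min)
    R2: motion-compensated difference is large."""
    mu = {name: ramp(features[name], *BOUNDS[name]) for name in BOUNDS}
    rule1 = min(mu["hist"], mu["edge"])
    rule2 = mu["motion"]
    return max(rule1, rule2)


def is_cut(features: Dict[str, float], alpha: float = 0.5) -> bool:
    """Apply the threshold to the aggregated fuzzy degree, not to any raw feature."""
    return cut_degree(features) >= alpha
```

With these assumed settings, `{"hist": 0.5, "edge": 0.45, "motion": 0.2}` yields a degree of about 0.625 and is declared a cut, while `{"hist": 0.7, "edge": 0.1, "motion": 0.2}` yields 0.0: a large histogram change that the edge feature does not corroborate fails to fire rule R1. Because each membership varies gradually between its bounds, a small perturbation of one feature near its bound shifts the aggregated degree only slightly, which is one way to read the "buffered operation" described above.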

Section snippets

Feature extraction and mode selector design

To detect relatively abrupt scene changes at shot boundaries, we propose to combine a range of representative features into a hybrid feature for shot cut detection. By representative, we mean features that: (i) are widely used in existing shot cut detection algorithms; (ii) are relatively mature, in that substantial evaluations have been reported and supportive results obtained in the literature; and (iii) do not incur intensive computing cost in implementation, and their…
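The conclusions name colour histogram intersection, motion compensation, texture change and edge variance as the integrated features. As a rough illustration of how per-frame-pair dissimilarities can be computed and collected into one hybrid feature vector, the sketch below uses a grey-level histogram intersection (standing in for the colour histogram, for brevity), the relative change in Sobel edge-pixel count in the spirit of Ref. [4], and a normalised pixel difference. The bin count, the edge threshold, the `|gx| + |gy|` gradient approximation and all normalisations are assumptions; this is not the authors' feature definition.

```python
from typing import Dict, List

Frame = List[List[int]]  # grey levels 0..255, rows of equal length (assumed)


def histogram(frame: Frame, bins: int = 16) -> List[int]:
    """Grey-level histogram with `bins` equal-width bins over 0..255."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // 256] += 1
    return counts


def histogram_dissimilarity(a: Frame, b: Frame, bins: int = 16) -> float:
    """1 minus the normalised histogram intersection: 0 = identical histograms."""
    ha, hb = histogram(a, bins), histogram(b, bins)
    intersection = sum(min(x, y) for x, y in zip(ha, hb)) / sum(ha)
    return 1.0 - intersection


def sobel_edge_count(f: Frame, threshold: int = 100) -> int:
    """Count interior pixels whose Sobel response |gx| + |gy| exceeds `threshold`."""
    h, w = len(f), len(f[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (f[y - 1][x + 1] + 2 * f[y][x + 1] + f[y + 1][x + 1]
                  - f[y - 1][x - 1] - 2 * f[y][x - 1] - f[y + 1][x - 1])
            gy = (f[y + 1][x - 1] + 2 * f[y + 1][x] + f[y + 1][x + 1]
                  - f[y - 1][x - 1] - 2 * f[y - 1][x] - f[y - 1][x + 1])
            if abs(gx) + abs(gy) > threshold:
                count += 1
    return count


def edge_count_change(a: Frame, b: Frame, threshold: int = 100) -> float:
    """Relative change in edge-pixel count, in [0, 1]."""
    ea, eb = sobel_edge_count(a, threshold), sobel_edge_count(b, threshold)
    return abs(ea - eb) / max(ea, eb, 1)


def mean_pixel_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference, normalised to [0, 1]."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]) * 255)


def hybrid_features(a: Frame, b: Frame) -> Dict[str, float]:
    """Collect the individual dissimilarities into one hybrid feature vector."""
    return {
        "hist": histogram_dissimilarity(a, b),
        "edge": edge_count_change(a, b),
        "pixel": mean_pixel_difference(a, b),
    }
```

For a flat 6×6 frame of value 50 compared with a frame whose right half is 200, this gives a histogram dissimilarity of 0.5, an edge-count change of 1.0 (no edges become eight) and a pixel difference of about 0.29.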

Experimental result

To evaluate the proposed temporal segmentation algorithm, we need to address two important issues in designing the experiments. The first is the test data set, and the second is the selection of benchmarks from existing work in relevant areas. To ensure that the proposed algorithm can be evaluated by any researcher in comparison with other algorithms developed in the future, we select two open test data sets, one of which is publicly available via Internet download…
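The abstract and conclusions report detection performance as precision and recall rates. Precision is the fraction of detected boundaries that are correct, and recall is the fraction of true boundaries that are detected. The minimal sketch below computes both; representing boundaries as frame indices, matching each detection greedily to at most one unmatched reference boundary within `tolerance` frames, and returning 0.0 when a denominator is zero are all assumptions, not details taken from the paper's evaluation protocol.

```python
from typing import Iterable, Tuple


def precision_recall(detected: Iterable[int],
                     ground_truth: Iterable[int],
                     tolerance: int = 0) -> Tuple[float, float]:
    """Precision = hits / detections, recall = hits / true boundaries.

    A detection counts as a hit if it lies within `tolerance` frames of a
    reference boundary that has not already been matched (greedy matching).
    """
    dets = sorted(detected)
    truth = sorted(ground_truth)
    matched = [False] * len(truth)
    hits = 0
    for d in dets:
        for i, t in enumerate(truth):
            if not matched[i] and abs(d - t) <= tolerance:
                matched[i] = True
                hits += 1
                break
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(dets) if dets else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

For instance, detections at frames `[10, 52, 90, 130]` against reference boundaries `[10, 50, 130]` with `tolerance=2` give a precision of 0.75 (one false alarm at frame 90) and a recall of 1.0. A non-zero tolerance is a common choice when gradual transitions span several frames.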

Conclusions

In this paper, we proposed a fuzzy logic approach for the temporal segmentation of videos, in which a number of features are integrated to improve shot cut detection performance. These features include colour histogram intersection, motion compensation, texture change and edge variance. Experimental results, measured by precision and recall rates and benchmarked against three existing algorithms, show that the proposed algorithm is effective for video segmentation.
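Motion compensation appears among the integrated features, and the introduction notes that block matching can expose the large differences caused by global motion. The sketch below is a generic motion-compensated frame difference using full-search block matching with the sum of absolute differences (SAD); it is not the authors' implementation, and the block size, search radius, normalisation and the `stripes` test pattern are assumptions for illustration. A camera pan yields a large plain pixel difference but a small motion-compensated one, whereas a cut remains large under both measures.

```python
from typing import List

Frame = List[List[int]]  # grey levels 0..255, rows of equal length (assumed)


def mean_abs_difference(a: Frame, b: Frame) -> float:
    """Plain pixel difference, normalised to [0, 1]."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]) * 255)


def motion_compensated_difference(prev: Frame, cur: Frame,
                                  size: int = 4, radius: int = 2) -> float:
    """For each size x size block of `cur`, find the displacement (within
    +/- radius) into `prev` with the smallest SAD, then average the residual
    SAD over all matched pixels, normalised to [0, 1]."""
    h, w = len(cur), len(cur[0])
    residual, pixels = 0, 0
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    top, left = by + dy, bx + dx
                    if top < 0 or left < 0 or top + size > h or left + size > w:
                        continue  # candidate block would fall outside `prev`
                    sad = sum(abs(cur[by + y][bx + x] - prev[top + y][left + x])
                              for y in range(size) for x in range(size))
                    if best is None or sad < best:
                        best = sad
            residual += best  # dy = dx = 0 is always valid, so `best` is set
            pixels += size * size
    return residual / (pixels * 255)


def stripes(width: int, height: int, offset: int) -> Frame:
    """Toy frame of vertical stripes; `offset` shifts it horizontally by one pixel."""
    return [[200 if (x + offset) % 2 else 0 for x in range(width)]
            for _ in range(height)]
```

With `prev = stripes(8, 8, 0)` and a one-pixel pan `stripes(8, 8, 1)`, the plain difference is about 0.78 while the motion-compensated difference is 0.0; replacing the second frame with a uniform white frame (a cut) leaves the motion-compensated difference at about 0.61. The striped pattern is periodic, so it exaggerates how cleanly the pan is absorbed; real footage leaves some residual at the frame borders.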

Acknowledgment

The authors wish to acknowledge the financial support from the EU IST Framework-6 programme under the IP Project: Live staging of media events (IST-04-027312).

References (19)

C. Lo et al., Video segmentation using a histogram-based fuzzy c-means clustering algorithm, Comput. Standards & Interfaces (2001).
R.S. Jadon et al., A fuzzy theoretic approach for video segmentation using syntactic features, Pattern Recognition Lett. (2001).
S.J.F. Guimaraes et al., Video segmentation based on 2D image analysis, Pattern Recognition Lett. (2003).
T.Y. Liu et al., A new cut detection algorithm with constant false-alarm ratio for video segmentation, J. Vis. Commun. Image R. (2004).
J. Jiang et al., Video extraction for fast content access to MPEG compressed videos, IEEE Trans. Circuits Syst. Video Technol. (2004).
S. Porter et al., Temporal video segmentation and classification of edit effects, Image Vision Comput. (2003).
A. Hanjalic, Shot-boundary detection: unraveled and resolved?, IEEE Trans. Circuits Syst. Video Technol. (2002).
C. Huang et al., A robust scene-change detection method for video segmentation, IEEE Trans. Circuits Syst. Video Technol. (2001).
A. Whitehead, P. Bose, R. Laganiere, Feature based cut detection with automatic threshold selection, Proceedings,...

There are more references available in the full text version of this article.

