A real-time shot cut detector: Hardware implementation

doi:10.1016/j.csi.2006.05.009

Computer Standards & Interfaces

Volume 29, Issue 3, March 2007, Pages 335-342

Abstract

With the enormous growth of digital audiovisual (AV) information in our lives, there is an important need for tools that describe AV content. In this context, the MPEG-7 standard was developed to provide a set of standardized description tools that generate metadata about AV content. However, before any content-based manipulation, the hierarchical structure of the video must be determined. This process is known as shot boundary detection or, alternatively, scene change detection. In this paper, a well-established and reliable method based on local histograms has been used to implement a shot cut detector for real-time applications. Since a software implementation on a PC is not suitable for this algorithm because of the processor's sequential processing, we have used an FPGA-based platform.

Introduction

Video data is becoming very important in many application domains, such as digital broadcasting, interactive TV, video-on-demand, computer-based training, and multimedia processing tools. Furthermore, the development of hardware technology and communications infrastructure has made the automatic analysis of video content very challenging.

In this work, which is part of a thesis project, we present the different steps of the hardware implementation of a shot cut detector based on the local histogram algorithm. This well-established and reliable approach is a descriptor in the MPEG-7 standard. The objective of the MPEG-7 ("Multimedia Content Description Interface") standard is to specify a standard set of descriptors and description schemes for describing the content of AV information. In particular, it standardizes a number of description tools that describe AV content, ranging from low-level features to high-level semantic information. In other words, it provides a set of standardized description tools that generate metadata (data about data) about AV content by extracting information of interest from it, to facilitate a variety of applications including image and scene retrieval [1]. In this context, the local histogram approach constitutes a low-level feature used for video segmentation and for image and scene retrieval.

Before any content-based manipulation of video information can be developed, its hierarchical structure must be determined. To this end, a standard hierarchical video model has been defined, as shown in Fig. 1. This model is composed of elementary units: scenes, shots, and frames. In this structure, a shot is defined as an unbroken sequence of frames from a single camera, whereas a scene is a set of shots sharing a semantic link and a unity of location and action [2].
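The scene/shot/frame hierarchy can be sketched as a small data structure. This is a minimal illustration, not part of the paper: the class names, the `shots_from_cuts` helper, and the scene grouping in the usage lines are all hypothetical. It shows how the output of a shot cut detector (a list of cut positions) maps onto the shot level of the model, while grouping shots into scenes is left to higher-level semantic analysis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """Unbroken run of frames from a single camera: frames start..end inclusive."""
    start: int
    end: int


@dataclass
class Scene:
    """Hypothetical container for shots sharing a semantic link (grouping done elsewhere)."""
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Video:
    n_frames: int
    scenes: List[Scene] = field(default_factory=list)


def shots_from_cuts(n_frames: int, cuts: List[int]) -> List[Shot]:
    """Split frames 0..n_frames-1 into shots; each cut index is the first frame of a new shot."""
    inner = sorted({c for c in cuts if 0 < c < n_frames})  # drop duplicates and out-of-range cuts
    bounds = [0] + inner + [n_frames]
    return [Shot(s, e - 1) for s, e in zip(bounds, bounds[1:])]


# Usage: cuts detected before frames 3 and 7 of a 10-frame clip.
shots = shots_from_cuts(10, [3, 7])
# Hypothetical scene grouping, supplied by hand here.
video = Video(n_frames=10, scenes=[Scene(shots[:2]), Scene(shots[2:])])
print([(s.start, s.end) for s in shots])  # [(0, 2), (3, 6), (7, 9)]
```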

In produced video such as television programs or movies, shots are separated by different types of transitions, or boundaries. Although well-known video editing programs such as Adobe Premiere or Ulead Media Studio provide more than 100 different types of edits, transition effects can generally be classified into two categories [3]. The simplest transition is a cut, an abrupt shot change that occurs between two consecutive frames. Gradual transitions, such as fades and dissolves, are more complex. A fade occurs when the frames of a shot gradually change from or to black, and a dissolve occurs when the frames of the first shot gradually blend into the frames of the second [4]. Fig. 2 shows an example of transition effects.
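The two categories can be written down with an idealized model commonly used in the shot boundary literature. The notation and the linear weighting $\alpha_t$ are assumptions for illustration, not taken from [4]: $S_1$ and $S_2$ are the frames of the outgoing and incoming shots, $f_t$ is the observed frame at time $t$, and $T$ is the transition length.

```latex
% Cut at frame t_0: an abrupt switch between two consecutive frames
f_t =
\begin{cases}
  S_1(t), & t < t_0 \\
  S_2(t), & t \ge t_0
\end{cases}

% Dissolve of length T starting at t_0 (linear weighting assumed)
f_t = (1 - \alpha_t)\, S_1(t) + \alpha_t\, S_2(t),
\qquad \alpha_t = \frac{t - t_0}{T}, \quad t_0 \le t \le t_0 + T

% Fade-out: the dissolve above with S_2 \equiv 0 (black)
% Fade-in:  the dissolve above with S_1 \equiv 0 (black)
```

For static shot content, a cut concentrates the whole change into a single frame-to-frame difference at $t_0$, whereas a linear dissolve spreads it over $T$ steps, each carrying only $(S_2 - S_1)/T$; this is one reason gradual transitions are harder to catch with a single threshold on adjacent-frame differences.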

Most existing video segmentation methods must cope with the difficulty of finding shot boundaries in the presence of camera or object motion and illumination variations, which can lead to false detections. In other cases, frames that have different structures but similar color distributions can lead to missed detections [5]. A study of the state of the art shows that several methods for shot boundary detection (SBD) have been proposed. These methods can operate in different domains, such as the temporal, frequency, uncompressed, and compressed domains.
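The missed-detection case can be reproduced with two toy frames that contain the same gray levels in the same proportions but arranged differently. The sketch below is illustrative only: the 8x8 frames, the four-level quantization, the 2x2 grid, and the sum-of-absolute-bin-differences metric are assumptions, and the function names are hypothetical. A whole-frame histogram sees no change, while comparing histograms block by block exposes the difference.

```python
import numpy as np


def quantized_hist(region: np.ndarray, levels: int = 4) -> np.ndarray:
    """Histogram of an 8-bit gray region after uniform quantization to `levels` bins."""
    q = (region.astype(np.int32) * levels) // 256
    return np.bincount(q.ravel(), minlength=levels)


def hist_diff(a: np.ndarray, b: np.ndarray, levels: int = 4) -> int:
    """Sum of absolute bin differences between the histograms of two regions."""
    return int(np.abs(quantized_hist(a, levels) - quantized_hist(b, levels)).sum())


def local_hist_diff(a: np.ndarray, b: np.ndarray, grid: int = 2, levels: int = 4) -> int:
    """Sum of hist_diff over a grid x grid partition of the two frames."""
    h, w = a.shape
    bh, bw = h // grid, w // grid
    return sum(
        hist_diff(a[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw],
                  b[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw], levels)
        for i in range(grid) for j in range(grid)
    )


# Two toy 8x8 frames: same color content, different spatial layout.
frame_a = np.zeros((8, 8), dtype=np.uint8)
frame_a[:, 4:] = 255   # left half black, right half white
frame_b = np.zeros((8, 8), dtype=np.uint8)
frame_b[4:, :] = 255   # top half black, bottom half white

print(hist_diff(frame_a, frame_b))        # 0: a global histogram would miss this change
print(local_hist_diff(frame_a, frame_b))  # 64: block histograms detect it
```

Local histograms reduce, but do not eliminate, this failure mode: a rearrangement that happens to preserve each block's color content would still go unnoticed.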

On the other hand, Dailianas and Lefèvre [6], [7] have distinguished two classes of methods: those that run off-line and have high complexity, and those dedicated to real-time applications, for which additional constraints must be taken into account. In this paper we use a well-established and reliable method based on local histograms, proposed by Nagasaka and Tanaka [8]. They divided each frame into 16 blocks and computed a local histogram for each block before evaluating a difference metric. Histogram-based methods have shown good performance for shot cut detection.
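A minimal sketch of a 16-block local histogram detector in the spirit of the method described above. The exact difference metric of [8] is not given in this excerpt, so the per-block sum of absolute bin differences, the four-level gray quantization, the threshold value, and all function names are assumptions made for illustration.

```python
import numpy as np

GRID = 4    # 4 x 4 = 16 blocks per frame
LEVELS = 4  # assumption: four gray levels per block histogram


def block_histograms(frame: np.ndarray) -> np.ndarray:
    """Return a (GRID*GRID, LEVELS) array of gray-level histograms, one row per block.

    Pixels beyond the last full block (when the frame size is not a multiple
    of GRID) are ignored in this sketch.
    """
    h, w = frame.shape
    bh, bw = h // GRID, w // GRID
    q = (frame.astype(np.int32) * LEVELS) // 256  # uniform quantization of 8-bit gray
    hists = []
    for i in range(GRID):
        for j in range(GRID):
            block = q[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hists.append(np.bincount(block.ravel(), minlength=LEVELS))
    return np.array(hists)


def local_hist_difference(prev: np.ndarray, cur: np.ndarray) -> int:
    """Assumed metric: sum over the 16 blocks of absolute histogram-bin differences."""
    return int(np.abs(block_histograms(prev) - block_histograms(cur)).sum())


def detect_cuts(frames, threshold: int):
    """Indices i such that a cut is declared between frames[i-1] and frames[i]."""
    return [i for i in range(1, len(frames))
            if local_hist_difference(frames[i - 1], frames[i]) > threshold]


# Usage with two synthetic 16x16 "shots" of constant gray level.
shot_a = np.full((16, 16), 50, dtype=np.uint8)
shot_b = np.full((16, 16), 200, dtype=np.uint8)
frames = [shot_a, shot_a, shot_b, shot_b]
cuts = detect_cuts(frames, threshold=256)  # hypothetical threshold
print(cuts)  # [2]
```

The threshold is a free parameter here; in practice it has to be tuned on real sequences, and choosing it is outside the scope of this sketch.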

To operate in real time, the computation of the difference metric must not exceed the blanking time, which is about 2 ms. Since a software implementation on a PC is not suitable for the local histogram algorithm because of the serial architecture of the microprocessor, we designed our system on a hardware platform based on a Virtex XCV800 FPGA.
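A back-of-the-envelope check of this timing constraint, sketched under stated assumptions: it combines the ~2 ms blanking time above with the 50 MHz system clock reported in the conclusion, and assumes the 16-block, four-level gray configuration (64 histogram bins per frame). It only bounds the time available for the difference metric; the names are illustrative.

```python
CLOCK_HZ = 50_000_000  # 50 MHz system clock, as reported in the conclusion
BLANKING_US = 2_000    # blanking time of about 2 ms, in microseconds
N_BLOCKS = 16          # 16 blocks per frame
N_LEVELS = 4           # assumption: four gray levels per block histogram


def cycle_budget(clock_hz: int, window_us: int) -> int:
    """Whole clock cycles available in a window of `window_us` microseconds (integer math)."""
    return clock_hz * window_us // 1_000_000


budget = cycle_budget(CLOCK_HZ, BLANKING_US)  # 100_000 cycles
n_bins = N_BLOCKS * N_LEVELS                  # 64 histogram bins per frame
cycles_per_bin = budget // n_bins             # 1_562 cycles available per bin
print(budget, n_bins, cycles_per_bin)         # 100000 64 1562
```

The figure only bounds the metric computation; the cost of accumulating the histograms themselves depends on how pixels are processed during the active part of the frame and is not modelled here.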

This paper is organized as follows. In Section 2 we present the different methods proposed for the detection of abrupt shot changes. Section 3 describes the specifications of the local histogram method, which we tested on a set of video sequences using different color spaces, quantization levels, and sub-sampling formats. The concept of the hardware design and the interpretation of the hardware implementation results are presented in Section 4. Finally, Section 5 presents the conclusions and future work.

Related work

A wide variety of shot boundary detection algorithms has been proposed in the last decade. A study of the current state of the art shows that these algorithms can be classified into three generations. The first generation comprises methods that measure the similarity distance between adjacent frames using elementary features such as pixel differences, global and local histogram differences, motion-compensated pixel differences, and DCT coefficient differences [9], [10], [11], [12].
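Two of these first-generation measures can be sketched on 8-bit grayscale frames as follows. The function names are hypothetical, and using a mean absolute difference for the pixel-wise measure and 256-bin histograms for the global measure are illustrative choices, not the specific formulations of [9] to [12].

```python
import numpy as np


def pixel_difference(prev: np.ndarray, cur: np.ndarray) -> float:
    """Mean absolute difference of co-located gray values (pixel-wise comparison)."""
    return float(np.mean(np.abs(prev.astype(np.int32) - cur.astype(np.int32))))


def global_histogram_difference(prev: np.ndarray, cur: np.ndarray, bins: int = 256) -> int:
    """Sum of absolute differences between whole-frame gray-level histograms."""
    hp = np.bincount(prev.ravel(), minlength=bins)
    hc = np.bincount(cur.ravel(), minlength=bins)
    return int(np.abs(hp - hc).sum())


# A horizontal gray ramp, and the same ramp rolled one pixel to the right
# (a crude stand-in for a small camera pan, with wrap-around at the border).
frame = np.tile(np.arange(0, 256, 16, dtype=np.uint8), (16, 1))
panned = np.roll(frame, 1, axis=1)

print(pixel_difference(frame, panned))             # 30.0: pixel comparison reacts to motion
print(global_histogram_difference(frame, panned))  # 0: the histogram ignores where pixels are
```

The example shows the trade-off behind the first-generation measures: pixel differences are sensitive to camera motion, while global histograms are insensitive to it but, for the same reason, blind to spatial changes.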

Local histogram specifications

The color histogram for an image is constructed by counting the number of pixels of each color. More formally, the color histogram is defined as the probability mass function of the image intensities.
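Read as a probability mass function, the histogram is simply the bin counts divided by the number of pixels. The sketch below combines this definition with a four-level gray quantization, the configuration the conclusion reports as giving reliable results; the uniform truncating quantizer and the name `gray_pmf` are assumptions for illustration.

```python
import numpy as np


def gray_pmf(frame: np.ndarray, levels: int = 4) -> np.ndarray:
    """Normalized histogram (PMF) of an 8-bit gray frame quantized to `levels` bins."""
    q = (frame.astype(np.int32) * levels) // 256  # maps 0..255 onto 0..levels-1
    counts = np.bincount(q.ravel(), minlength=levels)
    return counts / counts.sum()                  # probabilities summing to 1


# Two pixels fall into each of the four quantization intervals.
frame = np.array([[0, 63, 64, 127],
                  [128, 191, 192, 255]], dtype=np.uint8)
pmf = gray_pmf(frame)
print(pmf)  # [0.25 0.25 0.25 0.25]
```

Normalizing makes histograms of frames with different sizes or sub-sampling formats directly comparable; with a fixed frame size, comparing raw counts is equivalent up to a constant factor.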

To improve the quality of shot change detection, block-based methods have been proposed [8], [19], [20], [21]. The main advantage of these methods is their relative insensitivity to noise and to camera or object motion.

In this work we have used the approach proposed by Nagasaka [8] who divided each frame

Conclusion

In this study, we have tested and evaluated the local histogram approach on several types of video, using different color spaces, quantization levels, and sub-sampling formats.

The experiments have shown that the gray-level space quantized to four levels gives reliable results with a relatively low computation time.

Moreover, the hardware implementation on a Virtex FPGA-based board uses almost 1% of the logic resources plus two Block RAMs. Using a 50 MHz system clock, the computation

L. Boussaid received his Master's degree in NTSID (Nouvelles Technologies des Systèmes Informatiques Dédiés, i.e. New Technologies of Dedicated Computer Systems) from the National School of Engineering of Sfax, Tunisia (2003) and his Diploma in Electrical Engineering from the National School of Engineering of Monastir, Tunisia (1989).

Since 1990, he has been a computer systems engineer at ENIM. He is currently a PhD student. His research interests include the hardware design of multimedia video content descriptors for real-time applications.

References

S. Lefèvre, A review of real-time segmentation of uncompressed video sequences for content-based search and retrieval, Real Time Imaging (2003).
R.S. Jadon, A Fuzzy Theoretic Approach for Video Segmentation using Syntactic Features.
P. Browne, Evaluating and Combining Digital Video Shot Boundary Detection Algorithms.
R. Lienhart, A systematic method to compare and retrieve video sequences.
R. Lienhart, Comparison of automatic shot boundary detection algorithms.
S.V. Porter, Detection and classification of shot transitions.
J. Mas, Video shot boundary detection using color histogram.
A. Dailianas, Comparison of Automatic Video Segmentation Algorithms.
A. Nagasaka et al., Automatic Video Indexing and Full-Video Search for Object Appearances.
E. Ardizzone et al., Automatic video database indexing and retrieval, Multimedia Tools and Applications (1997).
M. Abdel-Mottaleb, Content-based image and video access system.
H.-H. Yu et al., A Visual Search System for video and Image Databases.

A. Mtibaa received his PhD degree in Electrical Engineering from the National School of Engineering of Tunis. Since 1990 he has been an assistant professor in Micro-Electronics and Hardware Design in the Electrical Engineering Department of the National School of Engineering of Monastir.

His research interests include high level synthesis, rapid prototyping and reconfigurable architecture for real-time multimedia applications.

M. Abid is a professor of electrical engineering at Sfax University in Tunisia. He received a Diploma in electrical engineering from the University of Sfax, Tunisia, in 1986 and a PhD degree in Computer Science from the University of Toulouse, France, in 1989.

His current research interests include Hardware–Software codesign, design space exploration and prototyping strategies for real-time systems.

M. Paindavoine received his PhD in electronics and signal processing from Montpellier University, France, in 1982. He then worked for two years at Fairchild CCD Company as an engineer specializing in CCD sensors.

He joined Burgundy University in 1985 as "Maître de Conférences" and is currently a full professor and member of LE2I, the Laboratory of Electronic, Computing and Imaging Sciences, Burgundy University, France. His main research interests are image acquisition and real-time image processing. He is also a member of ISIS, a research group in signal and image processing of the French National Scientific Research Committee.
