A real-time shot cut detector: Hardware implementation

https://doi.org/10.1016/j.csi.2006.05.009

Abstract

With the enormous growth of digital audiovisual (AV) information in everyday life, there is an important need for tools that describe AV content. In this context, the MPEG-7 standard was developed to provide a set of standardized description tools which generate metadata about AV content. However, before any content-based manipulation, the hierarchical structure of the video must be determined. This process is known as shot boundary detection or, in some cases, scene change detection. In this paper, a long-established and reliable method based on local histograms is used to implement a shot cut detector for real-time applications. Since a software implementation on a PC is not suitable for this algorithm because of the sequential processing of the processor, we have used an FPGA-based platform.

Introduction

Video data is becoming very important in many application domains such as digital broadcasting, interactive TV, video-on-demand, computer-based training, and multimedia processing tools. Furthermore, the development of hardware technology and communications infrastructure has made the automatic analysis of video content very challenging.

In this work, which is part of a thesis project, we present the different steps of the hardware implementation of a shot cut detector based on the local histogram algorithm. This long-established and reliable approach is a descriptor in the MPEG-7 standard. The objective of the MPEG-7 (“Multimedia Content Description Interface”) standard is to specify a standard set of descriptors and description schemes for describing the content of AV information. Specifically, it standardizes a number of description tools that describe AV content, ranging from low-level features to high-level semantic information. In other words, it provides a set of standardized description tools which generate metadata (data about data) about AV content by extracting information of interest from it, to facilitate a variety of applications including image and scene retrieval [1]. In this context, the local histogram approach constitutes a low-level feature which is used for video segmentation and for image and scene retrieval. To develop any content-based manipulation of video information, the hierarchical structure must be determined. To this end, a standard hierarchical video model was defined, as shown in Fig. 1. This model is composed of elementary units: scenes, shots, and frames. In this structure, a shot is defined as an unbroken sequence of frames from a single camera, whereas a scene is a set of shots sharing a semantic link, unity of location, and unity of action [2].

In produced video such as television or movies, shots are separated by different types of transitions, or boundaries. Although well-known video editing programs such as Adobe Premiere or Ulead Media Studio provide more than 100 different types of edits, transition effects are generally classified into two categories [3]. The simplest transition is a cut, an abrupt shot change that occurs between two consecutive frames. Gradual transitions such as fades and dissolves are more complex. A shot boundary is a fade when the frames of the shot gradually change from or to black, and a dissolve when the frames of the first shot are gradually morphed into the frames of the second [4]. Fig. 2 shows an example of transition effects.

Most existing video segmentation methods must contend with the difficulty of finding shot boundaries in the presence of camera or object motion and illumination variations, which can lead to false detections. In other cases, frames that have different structures but similar color distributions can cause a missed detection [5]. A survey of the state of the art shows that several methods for shot boundary detection (SBD) have been proposed. These methods can operate in different domains, such as the temporal, frequency, uncompressed, and compressed domains.

On the other hand, Dailianas and Lefèvre [6], [7] distinguished two classes of methods: those which can be run off-line and have high complexity, and those which are dedicated to real-time applications. In the latter case, some constraints have to be taken into account. In this paper we use a long-established and reliable method based on local histograms, proposed by Nagasaka and Tanaka [8]. They divided each frame into 16 blocks and computed a local histogram for each block before evaluating a difference metric. Histogram-based methods have shown good performance for shot cut detection.
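The block-based idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' design: the 4 x 4 grid matches the 16 blocks mentioned in the text, but the number of bins, the sum-of-absolute-differences metric, the threshold, and all function names are assumptions made here for illustration, since this excerpt does not specify the metric used in [8].

```python
def block_histograms(frame, grid=4, levels=4, max_value=255):
    """Return one gray-level histogram per block of a 2-D list of pixel values.

    grid=4 gives the 16 blocks mentioned in the text; levels and max_value
    are illustrative assumptions.
    """
    height, width = len(frame), len(frame[0])
    hists = [[0] * levels for _ in range(grid * grid)]
    for y in range(height):
        by = min(y * grid // height, grid - 1)      # block row of this pixel
        for x in range(width):
            bx = min(x * grid // width, grid - 1)   # block column of this pixel
            level = frame[y][x] * levels // (max_value + 1)  # uniform quantization
            hists[by * grid + bx][level] += 1
    return hists


def local_histogram_difference(frame_a, frame_b, grid=4, levels=4):
    """Sum of absolute bin differences over all blocks (assumed metric)."""
    hists_a = block_histograms(frame_a, grid, levels)
    hists_b = block_histograms(frame_b, grid, levels)
    return sum(
        abs(a - b)
        for block_a, block_b in zip(hists_a, hists_b)
        for a, b in zip(block_a, block_b)
    )


def is_cut(frame_a, frame_b, threshold):
    """Declare a cut when the difference exceeds a hypothetical threshold."""
    return local_histogram_difference(frame_a, frame_b) > threshold
```

For two consecutive frames, calling `is_cut(previous, current, threshold)` flags an abrupt change; the threshold would have to be tuned on real sequences.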

To operate in real time, the computation time of the difference metric must not exceed the blanking time, which is about 2 ms. Since a software implementation on a PC is not suitable for the local histogram algorithm because of the serial architecture of the microprocessor, we have designed our system on a hardware platform based on a Virtex xcv800 FPGA.
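To make the real-time constraint concrete, the blanking time can be converted into a clock-cycle budget. The short sketch below assumes the 50 MHz system clock quoted in the Conclusion and treats "about 2 ms" as exactly 2 ms; the function names are hypothetical.

```python
BLANKING_TIME_S = 2e-3    # blanking time quoted in the text (about 2 ms)
CLOCK_HZ = 50_000_000     # system clock quoted in the Conclusion (50 MHz)


def cycle_budget(blanking_time_s=BLANKING_TIME_S, clock_hz=CLOCK_HZ):
    """Number of clock cycles available during one blanking interval."""
    return round(blanking_time_s * clock_hz)


def fits_in_blanking(cycles_needed, blanking_time_s=BLANKING_TIME_S,
                     clock_hz=CLOCK_HZ):
    """True if a computation of cycles_needed cycles meets the constraint."""
    return cycles_needed <= cycle_budget(blanking_time_s, clock_hz)
```

Under these assumptions the difference metric has roughly 100,000 clock cycles per frame in which to complete.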

This paper is organized as follows. In Section 2 we present the different methods proposed for the detection of abrupt shot changes. Section 3 describes the specifications of the local histogram method, which we tested on a set of video sequences in different color spaces, with different types of quantization and different sub-sampling formats. The concept of the hardware design and the interpretation of the hardware implementation results are presented in Section 4. Finally, Section 5 presents the conclusions and future work.

Related work

A wide variety of shot boundary detection algorithms has been proposed in the last decade. A study of the current state of the art shows that these algorithms can be classified into three generations. The first generation comprises methods which measure a similarity distance between adjacent frames using elementary feature extraction, such as pixel differences, global and local histogram differences, motion-compensated pixel differences, and DCT coefficient differences [9], [10], [11], [12].

Local histogram specifications

The color histogram of an image is constructed by counting the number of pixels of each color. More formally, the normalized color histogram is the probability mass function of the image intensities.
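As a minimal sketch of this definition, the snippet below counts pixels per color and then normalizes the counts so that they sum to one, which gives the probability mass function. The representation of an image as a flat list of color values and the function names are assumptions made here for illustration.

```python
from collections import Counter


def color_histogram(pixels):
    """Count how many pixels take each color value."""
    return Counter(pixels)


def normalized_histogram(pixels):
    """Histogram divided by the pixel count: an empirical probability mass function."""
    counts = color_histogram(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    return {color: n / total for color, n in counts.items()}
```

Colors can be any hashable value, for example `(r, g, b)` tuples or single gray levels.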

To increase the quality of shot change detection, block-based methods were proposed [8], [19], [20], [21]. The main advantage of these methods is their relative insensitivity to noise and to camera or object motion.

In this work we have used the approach proposed by Nagasaka [8] who divided each frame

Conclusion

In this study, we have tested and evaluated the local histogram approach on several types of video, in different color spaces and with different types of quantization and sub-sampling.

The experiments have shown that the gray-level space quantized to four levels gives reliable results with a relatively low computation time.
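To illustrate what a four-level gray quantization can look like, the sketch below maps 8-bit gray values onto four bins. Uniform quantization and 8-bit input are assumptions, since this excerpt does not state the quantization scheme; the function names are hypothetical.

```python
def quantize_gray(value, levels=4, bits=8):
    """Map a gray value in [0, 2**bits - 1] to one of `levels` uniform bins."""
    return value * levels // (1 << bits)


def four_level_histogram(gray_pixels):
    """Histogram of 8-bit gray pixels over four uniformly spaced levels."""
    hist = [0, 0, 0, 0]
    for value in gray_pixels:
        hist[quantize_gray(value)] += 1
    return hist
```

With 8-bit input and four levels, the bin index equals the two most significant bits of the pixel (`value >> 6`), so this kind of quantization needs no arithmetic in hardware.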

On the other hand, the hardware implementation on a Virtex FPGA-based board uses almost 1% of the logic resources plus two Block RAMs. With a system clock of 50 MHz, the computation

References (24)

  • M. Abdel-Mottaleb

    Content-based image and video access system

  • H.-H. Yu et al.

    A Visual Search System for video and Image Databases

  • Cited by (10)

    • Real-time video segmentation using a vague adaptive threshold

      2020, Hybrid Computational Intelligence: Challenges and Applications
    • FPGA-based SOC for hardware implementation of a local histogram-based video shot detector

      2017, Turkish Journal of Electrical Engineering and Computer Sciences
    • Multi-video processing applications on FPGA

      2015, International Journal of Advanced Media and Communication
    • Partitioning and scheduling technique for run time reconfigured systems

      2011, International Journal of Computer Aided Engineering and Technology
    • Video shot boundary detection using RBFNN minimizing the L-GEM

      2010, 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010

    L. Boussaid received his Master in NTSID (Nouvelles Technologies des Systèmes Informatiques Dédiés) from the National School of Engineering of Sfax, Tunisia (2003) and his Diploma in Electrical Engineering from the National School of Engineering of Monastir, Tunisia (1989).

    Since 1990, he has been a computer systems engineer at the ENIM School. Currently, he is a PhD student. His research interests include hardware design of multimedia video content descriptors for real-time applications.

    A. Mtibaa received his PhD degree in Electrical Engineering from the National School of Engineering of Tunis. Since 1990 he has been an assistant professor in Micro-Electronics and Hardware Design in the Electrical Department of the National School of Engineering of Monastir.

    His research interests include high level synthesis, rapid prototyping and reconfigurable architecture for real-time multimedia applications.

    M. Abid is a professor of electrical engineering at Sfax University in Tunisia. He received a Diploma in electrical engineering from the University of Sfax in 1986 and his PhD degree in Computer Science from the University of Toulouse, France, in 1989.

    His current research interests include Hardware–Software codesign, design space exploration and prototyping strategies for real-time systems.

    M. Paindavoine received his PhD in electronics and signal processing from Montpellier University, France, in 1982. He was with Fairchild CCD Company for two years as an engineer specializing in CCD sensors.

    He joined Burgundy University in 1985 as “Maitre de Conferences” and is currently a full professor and member of LE2I, the laboratory of Electronic, Computing and Imaging Sciences, Burgundy University, France. His main research interests are image acquisition and real-time image processing. He is also a member of ISIS (a research group in signal and image processing of the French National Scientific Research Committee).
