Motion detection with pyramid structure of background model for intelligent surveillance systems

https://doi.org/10.1016/j.engappai.2012.02.002

Abstract

This paper proposes a pyramidal background matching structure for motion detection. The proposed method uses spectral, spatial, and temporal features to generate a pyramidal structure of the background model. Background subtraction based on this model then allows moving targets to be detected accurately in each frame of the video sequence. To further improve detection accuracy, the method includes a noise filter based on the Bezier curve to smooth noise pixels, after which the binary motion mask is computed by the proposed threshold function. Experimental results demonstrate that the proposed method substantially outperforms existing methods in perceptual evaluation.

Introduction

Video surveillance systems are essential for safety and security. Automatic surveillance has progressed rapidly owing to its wide applicability in public institutions, private firms, and homes. Smart surveillance is an active research area, with topics such as human action recognition (Park and Aggarwal, 2004), traffic flow visualization (Shastry and Schowengerdt, 2005), association analysis of home videos (Pan and Ngo, 2007), home activity recognition (Naeem and Ham, 2009), explorative visualization and analysis (Buter et al., 2011), telecommunication applications (Rahman and Pathan, 2011), human–machine interaction (Halim et al., 2011), and many others. In developing video surveillance systems, several significant functionalities must be taken into consideration, including but not limited to motion detection (Wren et al., 1997, Manzanera and Richefeu, 2004, Manzanera and Richefeu, 2007, Shoushtarian and Ghasem-aghaee, 2003, Wang et al., 2008), tracking (McFarlane and Schofield, 1995, Stauffer and Grimson, 2000), identification (Wang et al., 2003, Hayfron-acquah et al., 2003, Chong and Tanaka, 2010), data hiding (Wang et al., 2010), and feature recognition (Panganiban et al., 2011). This paper focuses on the design of motion detection, since this first function significantly affects the performance of the whole surveillance system.

In general, motion detection methods fall into three categories: temporal difference, optical flow, and background subtraction (Hu et al., 2004). The temporal difference method adapts effectively to environmental changes, but the shapes of the moving objects it extracts are often incomplete. The optical flow method generally gives a good approximation of the projected motion on the image plane, but its computational complexity is often too high for practical implementation. The background subtraction method detects moving objects as deviations from a maintained, up-to-date background model. It is the most popular method for motion detection because it requires less computation and provides high-quality motion information. The existing background subtraction method computes the absolute difference between each pixel of the incoming video frame and the background model. A threshold is then applied to obtain the binary motion detection mask (Pai et al., 2010). Although this method is easy to implement, threshold selection remains a critical operation for noise immunity.
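The basic background subtraction step described above can be sketched as follows. This is a minimal illustration of the generic technique, not the method proposed in this paper: the function names, the fixed `threshold`, and the running-average update with blending factor `alpha` are assumptions chosen for the example, and frames are represented as plain nested lists of gray levels.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of gray-level values


def subtract_background(frame: Frame, background: Frame, threshold: float) -> List[List[int]]:
    """Return a binary mask: 1 where |frame - background| exceeds the threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Keep the model up to date by blending in the new frame (running average).

    alpha is a hypothetical learning rate, not a value taken from the paper.
    """
    return [
        [(1 - alpha) * b + alpha * p for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 200, 11],
         [9, 180, 10]]

mask = subtract_background(frame, background, threshold=30)
print(mask)  # [[0, 1, 0], [0, 1, 0]]
```

With a fixed threshold, small fluctuations (2 or 1 gray levels here) are ignored and only the large deviations in the middle column are marked as motion. Choosing that threshold by hand is exactly the sensitivity the paragraph above points out.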

To distinguish background from foreground components, spectral, spatial, and temporal features are usually extracted from video sequences. Spectral features represent the gray-level values of an image frame. Spatial features are associated with regional variations of the local structure within the same frame. Temporal features represent changes in pixels over the video stream. Existing background models are broadly classified into pixel-based methods and block-based methods. Pixel-based methods use the gray-level changes of each pixel in the video sequence to represent the spectral and temporal features. Block-based methods use variations within spatial regions across different frames to represent the spatial and temporal features.
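To make the distinction concrete, the sketch below computes one simple pixel-based feature (per-pixel gray-level change between two frames) and one simple block-based feature (change in the mean of each non-overlapping block). Both features and all function names are illustrative assumptions; the excerpt does not specify which features the pixel-based and block-based methods actually use.

```python
from typing import List

Frame = List[List[float]]


def pixel_temporal_change(prev: Frame, curr: Frame) -> Frame:
    """Pixel-based: absolute gray-level change of each pixel between two frames."""
    return [[abs(c - p) for p, c in zip(pr, cr)] for pr, cr in zip(prev, curr)]


def block_means(frame: Frame, block: int) -> Frame:
    """Mean gray level of each non-overlapping block x block tile."""
    rows, cols = len(frame), len(frame[0])
    means = []
    for r in range(0, rows, block):
        row_means = []
        for c in range(0, cols, block):
            tile = [frame[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            row_means.append(sum(tile) / len(tile))
        means.append(row_means)
    return means


def block_temporal_change(prev: Frame, curr: Frame, block: int) -> Frame:
    """Block-based: change of each block's mean between two frames."""
    prev_means, curr_means = block_means(prev, block), block_means(curr, block)
    return [[abs(c - p) for p, c in zip(pr, cr)]
            for pr, cr in zip(prev_means, curr_means)]


prev = [[10, 10, 50, 50],
        [10, 10, 50, 50]]
curr = [[10, 10, 90, 90],
        [10, 10, 90, 90]]

pixel_change = pixel_temporal_change(prev, curr)
block_change = block_temporal_change(prev, curr, block=2)
print(pixel_change)  # [[0, 0, 40, 40], [0, 0, 40, 40]]
print(block_change)  # [[0.0, 40.0]]
```

The pixel-based result keeps full resolution, while the block-based result summarizes each 2x2 region in a single value, which makes it less sensitive to isolated noisy pixels at the cost of spatial detail.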

This paper presents a new background model that incorporates both pixel-based and block-based methods to represent the spectral, spatial, and temporal features in a video stream. The proposed method comprises three components:

  • (1) An intelligent matching process that uses both pixel-based and block-based techniques to produce an updated background model.

  • (2) A Bezier curve smoothing method that suppresses possible noise pixels.

  • (3) Complete motion detection through automatic threshold selection using the Probability Mass Function (PMF) and Cumulative Distribution Function (CDF), applied after a good-quality background model has been derived.
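The ideas behind components (2) and (3) can be illustrated with generic building blocks. The sketch below evaluates a Bezier curve with the De Casteljau algorithm, treating a short sequence of values as control points, and picks a threshold as the first gray level at which the CDF of the difference values reaches a chosen fraction. How the paper actually applies the Bezier curve to noise pixels and how it defines its threshold function are not given in this excerpt, so the function names, the `samples` count, and the `fraction` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence


def de_casteljau(control: Sequence[float], t: float) -> float:
    """Evaluate the Bezier curve defined by `control` at parameter t in [0, 1]."""
    points = list(control)
    while len(points) > 1:
        # Repeated linear interpolation between neighbouring points.
        points = [(1 - t) * a + t * b for a, b in zip(points, points[1:])]
    return points[0]


def bezier_smooth(values: Sequence[float], samples: int) -> List[float]:
    """Sample the Bezier curve through `samples` evenly spaced parameters (samples >= 2)."""
    return [de_casteljau(values, i / (samples - 1)) for i in range(samples)]


def pmf(values: Sequence[int], levels: int = 256) -> List[float]:
    """Probability of each gray level among the given integer values."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    return [c / len(values) for c in counts]


def cdf_threshold(values: Sequence[int], fraction: float, levels: int = 256) -> int:
    """Smallest level whose cumulative probability reaches `fraction` (hypothetical rule)."""
    cumulative = 0.0
    for level, p in enumerate(pmf(values, levels)):
        cumulative += p
        if cumulative >= fraction:
            return level
    return levels - 1


smoothed = bezier_smooth([0, 10, 0], samples=3)
print(smoothed)  # [0.0, 5.0, 0.0]

differences = [0, 1, 1, 2, 2, 2, 3, 3, 200, 220]
threshold = cdf_threshold(differences, fraction=0.75)
print(threshold)  # 3
print([1 if d > threshold else 0 for d in differences])
```

In the first example the isolated spike of 10 is pulled down to 5, since a Bezier curve only approximates its interior control points. In the second, most difference values are small, so the CDF reaches 0.75 at level 3 and only the two large differences are labelled as motion.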

The qualitative and quantitative results of the performance study, which covers a wide range of natural video sequences, indicate that our method is more effective than the other state-of-the-art methods compared. The rest of this paper is organized as follows: Section 2 gives a compact overview of the background subtraction approaches used for comparison. Section 3 describes the proposed method in detail. Section 4 reports and discusses the experimental results. Section 5 contains our concluding remarks.

Section snippets

Related work

Typically, reliable background models can offer significantly improved motion detection masks that are able to detect moving objects completely. Some state-of-the-art methods of background subtraction include Simple Background Subtraction (SBS), Running Average (RA) (Wren et al., 1997), Σ-Δ Estimation (SDE) (Manzanera and Richefeu, 2004), Multiple Σ-Δ Estimation (MSDE) (Manzanera and Richefeu, 2007), Simple Statistical Difference (SSD), Temporal Median Filter (TMF) (Shoushtarian and

Proposed method

This section presents the flow of a novel background subtraction approach. The proposed method uses spectral, spatial, and temporal features extracted from the video sequence to achieve complete, high-performance detection of moving objects.

Initially, the proposed background model adopts both block-based and pixel-based matching methods for selecting the background candidates. After each background candidate is matched using the proposed intelligent matching

Experimental results

Experimental results for several natural video sequences were produced using the proposed method and several other state-of-the-art methods, and were analyzed both qualitatively and quantitatively in our performance study. Table 1 lists 12 video sequences representing typical situations that are critical for video surveillance systems. Both indoor and outdoor environments are presented with overall image size, noise level, object class, object size, and

Conclusion

We presented a novel background model that is updated with the best background candidates selected by the background matching framework. After a high-quality background model is generated, the Bezier curve smoothing method reduces possible noise pixels to refine the subtraction results for motion detection. Finally, an intelligent threshold function based on the PMF and CDF was proposed to detect object pixels for video surveillance systems. Compared with other state-of-the-art methods, the efficacy of our

Acknowledgment

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 100-2628-E-027-012-MY3.

References (31)

  • J.B. Hayfron-Acquah et al.

    Automatic gait recognition by symmetry analysis

    Pattern Recognition Lett.

    (2003)
  • A. Manzanera et al.

    A new motion detection algorithm based on Σ-Δ background estimation

    Pattern Recognition Lett.

    (2007)
  • Y.-T. Pai et al.

    Adaptive thresholding algorithm: efficient computation technique based on intelligent block detection for degraded document images

    Pattern Recognition

    (2010)
  • M. Albanese et al.

    A constrained probabilistic Petri net framework for human activity detection in video

    IEEE Trans. Multimedia

    (2008)
  • D. Avitzour

    Novel scene calibration procedure for video surveillance systems

    IEEE Trans. Aerosp. Electron. Syst.

    (2004)
  • F. Bartolini et al.

    Image authentication techniques for surveillance applications

    Proc. IEEE

    (2001)
  • B. Buter et al.

    Explorative visualization and analysis of a social network for arts: the case of deviantART

    J. Convergence

    (2011)
  • C.-Y. Chen et al.

    A visible/infrared fusion algorithm for distributed smart cameras

    IEEE J. Sel. Top. Signal Process.

    (2008)
  • M.-J. Chen et al.

    Spatial and temporal error concealment algorithms of shape information for MPEG-4 video

    IEEE Trans. Circuits Syst. Video Technol.

    (2005)
  • R.M. Chong et al.

    Motion blur identification using maxima locations for blind colour image restoration

    J. Convergence

    (2010)
  • P. Crouch et al.

    The De Casteljau algorithm on Lie groups and spheres

    J. Dyn. Control Syst.

    (1999)
  • G. Gualdi et al.

    Video streaming for mobile video surveillance

    IEEE Trans. Multimedia

    (2008)
  • Z. Halim et al.

    Measuring entertainment and automatic generation of entertaining games

    Int. J. Inf. Technol. Commun. Convergence

    (2011)
  • W. Hu et al.

    A survey on visual surveillance of object motion and behaviors

    IEEE Trans. Syst. Man Cybern. Part C-Appl. Rev.

    (2004)
  • C.-K. Liang et al.

    Analysis and compensation of rolling shutter effect

    IEEE Trans. Image Process.

    (2008)

Cited by (31)

    • Motion detection based on the combining of the background subtraction and the structure-texture decomposition

      2015, Optik
      Citation Excerpt :

      The spatio-temporal processing is used to eliminate the “non significant” pixels from the detection, enhance the segmentation of the moving object, and reduce the “ghost effect” that produces false detection. Another method provides a pyramidal structure for motion detection (MDPS) [9]. The proposed method uses the spatial and temporal characteristics to generate a pyramidal structure of the background model.

    • Saliency-directed prioritization of visual data in wireless surveillance networks

      2015, Information Fusion
      Citation Excerpt :

      They modeled each pixel as a group of adaptive SIFT flow descriptors, which are computed over a rectangular region around the pixel, and dynamically updated the background model. Huang and Cheng [32] presented a pyramidal background matching structure for motion detection in surveillance systems. This method utilizes spectral, spatial, and temporal features to generate a pyramid structure of the background model.

    • Online adaptive motion model-based target tracking using local search algorithm

      2015, Engineering Applications of Artificial Intelligence
      Citation Excerpt :

      Visual tracking is an important topic in computer vision which has many applications, such as surveillance (Benfold and Reid, 2011; Huang and Fu, 2011; Huang and Cheng, 2012; Rho et al., 2012) activity recognition (Choi and Savarese, 2012; Huang et al., 2014), robotics (Mekonnen et al., 2013; Richa et al., 2010) video summarization (Tavassolipour et al., 2013), and human–computer interaction (HCI) (Lupu et al., 2013; Xu and Lu, 2014).

    • Generation of human computational models with knowledge engineering

      2014, Engineering Applications of Artificial Intelligence
      Citation Excerpt :

      Within the modelling and simulation community, the definition of validation is the process of determining to what extent a computerized model or simulation and their associated data are accurate representations of the real world from the perspective of the intended use (Schlesinger et al., 1979). CMHBs are used for testing AmI services and applications (e.g. an indoor location service, an in-home monitoring service for elders, motion detection services, Huang and Cheng, 2012, energy efficiency measures, Silva et al., 2012, and etcetera) before their real deployment. CMHBs represent users of the service or application.

    • Genetic programming based blind image deconvolution for surveillance systems

      2013, Engineering Applications of Artificial Intelligence
      Citation Excerpt :

      In intelligent surveillance systems, high-level interpretation of events within the scene requires low-level vision computing of the image and the moving objects. In real applications, these systems are developed for motion detection (Delgado et al., 2010; Huang and Cheng, in press), fault detection (Jakubek and Strasser, 2004; Tan et al., 2007; Byttner et al., 2011), road detection (Chen and Tai, 2010), heterogeneous network (Leu et al., in press), smart home environment (Kang et al., in press), scenario-based video-surveillance (Şaykol et al., 2010) and real time skin color detection based surveillance system (Chen et al., in press). The first category of linear restoration approaches requires the complete knowledge of blur function and noise statistics.
