Motion detection with pyramid structure of background model for intelligent surveillance systems

doi:10.1016/j.engappai.2012.02.002

Engineering Applications of Artificial Intelligence

Volume 25, Issue 7, October 2012, Pages 1338-1348

https://doi.org/10.1016/j.engappai.2012.02.002

Abstract

This paper proposes a pyramidal background matching structure for motion detection. The proposed method uses spectral, spatial, and temporal features to generate a pyramidal structure of the background model. After background subtraction based on this model, moving targets can be accurately detected in each frame of the video sequence. To further improve detection accuracy, the method also applies a noise filter based on the Bezier curve to smooth noise pixels, after which the binary motion mask is computed by the proposed threshold function. Experimental results demonstrate that the proposed method substantially outperforms existing methods in perceptual evaluation.

Introduction

Video surveillance systems are essential for safety and security. Automatic surveillance has progressed tremendously owing to its wide applicability in public institutions, private firms, and homes. Smart surveillance is a subject of extensive research, covering topics such as human action recognition (Park and Aggarwal, 2004), traffic flow visualization (Shastry and Schowengerdt, 2005), association analysis of home videos (Pan and Ngo, 2007), home activity recognition (Naeem and Ham, 2009), explorative visualization and analysis (Buter et al., 2011), telecommunication applications (Rahman and Pathan, 2011), human–machine interaction (Halim et al., 2011), and many others. When developing a video surveillance system, several key functions must be considered, including but not limited to motion detection (Wren et al., 1997, Manzanera and Richefeu, 2004, Manzanera and Richefeu, 2007, Shoushtarian and Ghasem-aghaee, 2003, Wang et al., 2008), tracking (McFarlane and Schofield, 1995, Stauffer and Grimson, 2000), identification (Wang et al., 2003, Hayfron-Acquah et al., 2003, Chong and Tanaka, 2010), data hiding (Wang et al., 2010), and feature recognition (Panganiban et al., 2011). This paper focuses on motion detection, because this first stage strongly affects the performance of the whole surveillance system.

Motion detection methods generally fall into three categories: temporal difference, optical flow, and background subtraction (Hu et al., 2004). Temporal difference methods adapt well to environmental changes, but the shapes of the detected moving objects are often incomplete. Optical flow methods, on the other hand, generally give a good approximation of the projected motion on the image plane; their common limitation is a computational complexity that is often too high for practical implementation. Background subtraction detects moving objects as regions that deviate from a background model that is kept up to date. It is the most popular approach to motion detection because it requires little computation and provides high-quality motion information, which makes it the most effective way to solve motion detection problems. Conventional background subtraction computes the absolute difference between each pixel of the incoming video frame and the corresponding pixel of the background model, and then applies a threshold to obtain the binary motion detection mask (Pai et al., 2010). Although this scheme is easy to implement, threshold selection remains critical for noise immunity.
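As a minimal sketch of the conventional scheme just described (not the proposed method), the Python below thresholds the absolute frame/background difference. Frames are assumed to be gray-level images stored as lists of rows; the function name subtract_background and the threshold values are illustrative choices.

```python
def subtract_background(frame, background, threshold):
    """Binary motion mask: 1 where |frame - background| > threshold, else 0."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 90, 10],
         [11, 95, 40]]
print(subtract_background(frame, background, threshold=20))  # [[0, 1, 0], [0, 1, 1]]
print(subtract_background(frame, background, threshold=35))  # [[0, 1, 0], [0, 1, 0]]
```

With threshold=20 the 30-level change in the bottom-right pixel counts as motion; raising the threshold to 35 drops that pixel from the mask, which is the sensitivity to threshold choice noted above.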

To distinguish background from foreground components, spectral, spatial, and temporal features are usually extracted from the video sequence. Spectral features represent the gray levels of an image frame. Spatial features capture regional variations of the local structure within a frame. Temporal features represent changes in pixel values over the video stream. Existing background models are broadly classified into pixel-based and block-based methods. Pixel-based methods use the gray-level change of each pixel across the video sequence to represent spectral and temporal features. Block-based methods use the variations of local regions (blocks) between frames to represent spatial and temporal features.
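To make the pixel-based versus block-based distinction concrete, here is a small sketch on gray-level frames stored as lists of rows. The per-pixel absolute difference and the mean of non-overlapping n x n blocks are assumed, simple stand-ins for the two kinds of features; they are not the paper's definitions, and the block size n is illustrative.

```python
def pixel_difference(curr, prev):
    """Pixel-based change: absolute gray-level difference of every pixel."""
    return [[abs(c - p) for c, p in zip(crow, prow)] for crow, prow in zip(curr, prev)]


def block_means(frame, n):
    """Mean gray level of each non-overlapping n x n block (partial edge blocks cropped)."""
    h = len(frame) - len(frame) % n
    w = len(frame[0]) - len(frame[0]) % n
    return [
        [sum(frame[y + dy][x + dx] for dy in range(n) for dx in range(n)) / (n * n)
         for x in range(0, w, n)]
        for y in range(0, h, n)
    ]


def block_difference(curr, prev, n):
    """Block-based change: absolute difference of each block's mean between frames."""
    return [
        [abs(c - p) for c, p in zip(crow, prow)]
        for crow, prow in zip(block_means(curr, n), block_means(prev, n))
    ]


prev = [[10] * 4 for _ in range(4)]
curr = [[250, 10, 10, 10],   # isolated noise spike at (0, 0)
        [10, 10, 10, 10],
        [10, 10, 60, 60],    # coherent 2 x 2 changed region
        [10, 10, 60, 60]]
print(pixel_difference(curr, prev)[0][0])  # 240
print(block_difference(curr, prev, 2))     # [[60.0, 0.0], [0.0, 50.0]]
```

In this toy pair the isolated spike (a change of 240 at one pixel) shrinks to a block change of 60.0 once averaged over its 2 x 2 block, while the coherent region keeps its full change of 50.0: block-based measures trade spatial resolution for robustness to isolated pixel noise.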

This paper presents a new background model that combines pixel-based and block-based methods to represent the spectral, spatial, and temporal features of a video stream. The main components of the proposed method are as follows:

(1) An intelligent matching process that uses both pixel-based and block-based techniques to produce an updated background model.
(2) A Bezier curve smoothing method that suppresses possible noise pixels.
(3) An automatic threshold selection based on the probability mass function (PMF) and cumulative distribution function (CDF), which achieves complete motion detection once a good-quality background model has been derived.
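Items (2) and (3) can be sketched as follows. This excerpt does not give the paper's formulas, so both functions are hypothetical stand-ins: bezier_smooth treats each (2·radius + 1)-sample neighbourhood of a row of difference values as the control points of a Bezier curve and evaluates that curve at t = 0.5 with De Casteljau's algorithm (cf. Crouch et al. in the reference list), and pmf_cdf_threshold assumes the threshold is the smallest difference level at which the CDF of the difference histogram reaches a chosen fraction.

```python
def de_casteljau(control, t):
    """Evaluate a 1-D Bezier curve with the given control values at parameter t."""
    pts = list(control)
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring control values.
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def bezier_smooth(values, radius=1):
    """Hypothetical smoother: each sample becomes the Bezier curve of its
    (2*radius + 1)-sample neighbourhood, evaluated at t = 0.5 (edges clamped)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        window = [values[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        smoothed.append(de_casteljau(window, 0.5))
    return smoothed


def pmf_cdf_threshold(diffs, fraction=0.9, levels=256):
    """Hypothetical rule: smallest difference level whose CDF reaches `fraction`.

    diffs must be integer difference values in [0, levels)."""
    total = len(diffs)
    counts = [0] * levels             # histogram; PMF(d) = counts[d] / total
    for d in diffs:
        counts[d] += 1
    cumulative = 0
    for level in range(levels):
        cumulative += counts[level]   # CDF(level) = cumulative / total
        if cumulative >= fraction * total:
            return level
    return levels - 1


print(bezier_smooth([0, 0, 100, 0, 0]))  # [0.0, 25.0, 50.0, 25.0, 0.0]
diffs = [0, 1, 2, 1, 0, 3, 2, 1, 80, 95]
threshold = pmf_cdf_threshold(diffs, fraction=0.8)
print(threshold, [1 if d > threshold else 0 for d in diffs])  # 3 [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Evaluating a Bezier curve at t = 0.5 weights its control points by binomial coefficients (1/4, 1/2, 1/4 for three points), so an isolated spike of 100 is spread to 25, 50, 25. In the threshold example, 8 of the 10 difference values are at most 3, so the 0.8 level of the CDF gives a threshold of 3 and only the two large differences are marked as motion.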

Qualitative and quantitative results of the performance study, which covers a wide range of natural video sequences, indicate that our method is the most effective among the compared state-of-the-art methods. The rest of this paper is organized as follows: Section 2 briefly reviews some of the background subtraction approaches used for comparison. Section 3 describes the proposed method in detail. Section 4 reports and discusses the experimental results. Section 5 contains our concluding remarks.

Section snippets

Related work

Reliable background models can substantially improve motion detection masks, enabling moving objects to be detected completely. State-of-the-art background subtraction methods include Simple Background Subtraction (SBS), Running Average (RA) (Wren et al., 1997), Σ-Δ Estimation (SDE) (Manzanera and Richefeu, 2004), Multiple Σ-Δ Estimation (MSDE) (Manzanera and Richefeu, 2007), Simple Statistical Difference (SSD), Temporal Median Filter (TMF) (Shoushtarian and
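Two of the listed baselines have well-known update rules that fit in a few lines. The running average blends each new frame into the background with a learning rate alpha, and the basic Σ-Δ estimator of Manzanera and Richefeu nudges each background value by one gray level toward the current frame. The sketch below works on flat lists of pixel values for brevity; alpha and the toy data are illustrative, and the variance step of the full Σ-Δ detector is omitted.

```python
def running_average(background, frame, alpha=0.05):
    """RA update: B_t = (1 - alpha) * B_{t-1} + alpha * I_t, per pixel."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def sigma_delta(background, frame):
    """Basic Sigma-Delta update: move each background value one gray level toward the frame."""
    return [b + (f > b) - (f < b) for b, f in zip(background, frame)]


ra, sd = [100, 100], [100, 100]  # initial background for two pixels
for _ in range(3):               # the scene changes to [130, 90] and stays there
    ra = running_average(ra, [130, 90], alpha=0.5)
    sd = sigma_delta(sd, [130, 90])
print(ra, sd)                    # [126.25, 91.25] [103, 97]
```

After three frames the running average has moved most of the way to the new values, while Σ-Δ has moved exactly three gray levels per pixel; this bounded, increment-only update is what makes Σ-Δ so cheap to compute.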

Proposed method

This section presents the flowchart of a novel background subtraction approach. The proposed method uses spectral, spatial, and temporal features extracted from the video sequence to achieve complete, high-performance detection of moving objects.

First, the proposed background model applies both block-based and pixel-based matching to select background candidates. After each background candidate is matched using the proposed intelligent matching
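The excerpt breaks off before the matching rule is given. Purely as an illustration of how pixel-based and block-based checks can be combined when selecting background candidates, the sketch below accepts a new pixel value into the background only when both checks pass. The block size n, the tolerances pixel_tol and block_tol, and the replace-on-match update are assumptions; this is not the paper's pyramidal matching structure.

```python
def block_mean(frame, y, x, n):
    """Mean gray level of the n x n block (clipped at image borders) containing (y, x)."""
    y0, x0 = (y // n) * n, (x // n) * n
    vals = [frame[r][c]
            for r in range(y0, min(y0 + n, len(frame)))
            for c in range(x0, min(x0 + n, len(frame[0])))]
    return sum(vals) / len(vals)


def update_background(background, frame, n=2, pixel_tol=10, block_tol=5):
    """Hypothetical hybrid matching: a pixel of the new frame replaces the
    background value only if it matches at both the pixel and the block level."""
    updated = [row[:] for row in background]
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            pixel_match = abs(value - background[y][x]) <= pixel_tol
            block_match = abs(block_mean(frame, y, x, n)
                              - block_mean(background, y, x, n)) <= block_tol
            if pixel_match and block_match:
                updated[y][x] = value
    return updated


background = [[50, 50, 50, 50],
              [50, 50, 50, 50]]
frame = [[52, 53, 55, 95],
         [51, 49, 92, 91]]
print(update_background(background, frame))  # [[52, 53, 50, 50], [51, 49, 50, 50]]
```

The pixel at (0, 2) differs from the model by only 5 gray levels and would pass a pixel-only check, but its block mean has changed by 33.25, so it is not absorbed into the background.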

Experimental results

Experimental results were produced on several natural video sequences for the proposed method and several other state-of-the-art methods, and were analyzed both qualitatively and quantitatively in our performance study. Table 1 lists 12 video sequences that represent typical situations critical for video surveillance systems. Both indoor and outdoor environments are included, each described by its overall image size, noise level, object class, object size, and
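The excerpt does not state which quantitative measures the performance study uses. As one common way to score a binary motion mask against a ground-truth mask, the sketch below computes precision, recall, and F1; the choice of these metrics and the toy masks are assumptions.

```python
def mask_scores(pred, truth):
    """Precision, recall and F1 of a binary motion mask against a ground-truth mask."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1   # motion correctly detected
            elif p:
                fp += 1   # background marked as motion
            elif t:
                fn += 1   # motion missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pred = [[1, 1, 0], [1, 1, 0]]
truth = [[1, 1, 0], [1, 0, 1]]
print(mask_scores(pred, truth))  # (0.75, 0.75, 0.75)
```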

Conclusion

We presented a novel background model that is updated with the best background candidates selected by the background matching framework. After a high-quality background model is generated, the Bezier curve smoothing method reduces possible noise pixels to refine the subtraction results for motion detection. Finally, an intelligent threshold function based on the PMF and CDF was proposed to detect object pixels for video surveillance systems. Compared with other state-of-the-art methods, the efficacy of our

Acknowledgment

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 100-2628-E-027-012-MY3.

References (31)

J.B. Hayfron-Acquah et al.
Automatic gait recognition by symmetry analysis
Pattern Recognition Lett.
(2003)
A. Manzanera et al.
A new motion detection algorithm based on Σ-Δ background estimation
Pattern Recognition Lett.
(2007)
Y.-T. Pai et al.
Adaptive thresholding algorithm: efficient computation technique based on intelligent block detection for degraded document images
Pattern Recognition
(2010)
M. Albanese et al.
A constrained probabilistic Petri net framework for human activity detection in video
IEEE Trans. Multimedia
(2008)
D. Avitzour
Novel scene calibration procedure for video surveillance systems
IEEE Trans. Aerosp. Electron. Syst.
(2004)
F. Bartolini et al.
Image authentication techniques for surveillance applications
Proc. IEEE
(2001)
B. Buter et al.
Explorative visualization and analysis of a social network for arts: the case of deviantART
J. Convergence
(2011)
C.-Y. Chen et al.
A visible/infrared fusion algorithm for distributed smart cameras
IEEE J. Sel. Top. Signal Process.
(2008)
M.-J. Chen et al.
Spatial and temporal error concealment algorithms of shape information for MPEG-4 video
IEEE Trans. Circuits Syst. Video Technol.
(2005)
R.M. Chong et al.
Motion blur identification using maxima locations for blind colour image restoration
J. Convergence
(2010)
P. Crouch et al.
The De Casteljau algorithm on Lie groups and spheres
J. Dyn. Control Syst.
(1999)
G. Gualdi et al.
Video streaming for mobile video surveillance
IEEE Trans. Multimedia
(2008)
Z. Halim et al.
Measuring entertainment and automatic generation of entertaining games
Int. J. Inf. Technol. Commun. Convergence
(2011)
W. Hu et al.
A survey on visual surveillance of object motion and behaviors
IEEE Trans. Syst. Man Cybern. Part C-Appl. Rev.
(2004)
C.-K. Liang et al.
Analysis and compensation of rolling shutter effect
IEEE Trans. Image Process.
(2008)

Cited by (31)

Motion detection based on the combining of the background subtraction and the structure-texture decomposition
2015, Optik
Citation excerpt:
The spatio-temporal processing is used to eliminate the “non significant” pixels from the detection, enhance the segmentation of the moving object, and reduce the “ghost effect” that produces false detection. Another method provides a pyramidal structure for motion detection (MDPS) [9]. The proposed method uses the spatial and temporal characteristics to generate a pyramidal structure of the background model.
To detect the moving objects in a video sequence based on background subtraction approaches, a background model should be estimated at the first time before subtract it from each image of the sequence and then segmenting the moving objects. In this paper, we present a new approach based on the combining of the background subtraction and the structure–texture decomposition (BS–STD). First, each gray-level image of the sequence will be decomposed on two components, structure and texture/noise by applying the Osher and Vese algorithm. The structure component of each image of the sequence will be taken to generate the background model using the median filter. The absolute difference used to subtracting the background before compute the binary image of the moving objects using the threshold generated by Otsu's method. The structure–texture decomposition (STD) is also used to ameliorate the results of methods, the background estimation algorithm with Σ − Δ (SDE), Simple Statistical Difference (SSD) and Motion detection with pyramid structure of background model (MDPS). The experimental results demonstrate that our approach is effective and accurate for moving objects detection and the structure–texture decomposition adds improvements to the results of state-of-the-art methods.
Saliency-directed prioritization of visual data in wireless surveillance networks
2015, Information Fusion
Citation excerpt:
They modeled each pixel as a group of adaptive SIFT flow descriptors, which are computed over a rectangular region around the pixel, and dynamically updated the background model. Huang and Cheng [32] presented a pyramidal background matching structure for motion detection in surveillance systems. This method utilizes spectral, spatial, and temporal features to generate a pyramid structure of the background model.
In wireless visual sensor networks (WVSNs), streaming all imaging data is impractical due to resource constraints. Moreover, the sheer volume of surveillance videos inhibits the ability of analysts to extract actionable intelligence. In this work, an energy-efficient image prioritization framework is presented to cope with the fragility of traditional WVSNs. The proposed framework selects semantically relevant information before it is transmitted to a sink node. This is based on salient motion detection, which works on the principle of human cognitive processes. Each camera node estimates the background by a bootstrapping procedure, thus increasing the efficiency of salient motion detection. Based on the salient motion, each sensor node is classified as being high or low priority. This classification is dynamic, such that camera nodes toggle between high-priority and low-priority status depending on the coverage of the region of interest. High-priority camera nodes are allowed to access reliable radio channels to ensure the timely and reliable transmission of data. We compare the performance of this framework with other state-of-the-art methods for both single and multi-camera monitoring. The results demonstrate the usefulness of the proposed method in terms of salient event coverage and reduced computational and transmission costs, as well as in helping analysts find semantically relevant visual information.
Online adaptive motion model-based target tracking using local search algorithm
2015, Engineering Applications of Artificial Intelligence
Citation excerpt:
Visual tracking is an important topic in computer vision which has many applications, such as surveillance (Benfold and Reid, 2011; Huang and Fu, 2011; Huang and Cheng, 2012; Rho et al., 2012) activity recognition (Choi and Savarese, 2012; Huang et al., 2014), robotics (Mekonnen et al., 2013; Richa et al., 2010) video summarization (Tavassolipour et al., 2013), and human–computer interaction (HCI) (Lupu et al., 2013; Xu and Lu, 2014).
An adaptive tracker to address the problem of tracking objects which undergo abrupt and significant motion changes is introduced. Abrupt motion of objects is an issue which makes tracking a challenging task. To address this problem, a new adaptive motion model is proposed. The model is integrated into the sequential importance resampling particle filter (SIR PF), which is the most popular probabilistic tracking framework. In this model, in each time step, if necessary, the particles’ configurations are updated by using feedback information from the observation likelihood. In order to overcome the local-trap problem, local search algorithm with best improvement strategy is used to update particles’ configurations. Then, the motion model is updated online with respect to the configurations of the best particle in the current and previous time steps. By using this adaptive model, a more robust tracking is achieved to abrupt significant motion changes. The tracker is experimentally compared to other state-of-the-art trackers on BoBoT dataset. The experimental results confirm that the tracker outperforms the related trackers in many cases by having better PASCAL score. Furthermore, this tracker improves the accuracy of the conventional SIR PF approximately 15%.
Generation of human computational models with knowledge engineering
2014, Engineering Applications of Artificial Intelligence
Citation excerpt:
Within the modelling and simulation community, the definition of validation is the process of determining to what extent a computerized model or simulation and their associated data are accurate representations of the real world from the perspective of the intended use (Schlesinger et al., 1979). CMHBs are used for testing AmI services and applications (e.g. an indoor location service, an in-home monitoring service for elders, motion detection services, Huang and Cheng, 2012, energy efficiency measures, Silva et al., 2012, and etcetera) before their real deployment. CMHBs represent users of the service or application.
The Ambient Intelligence (AmI) paradigm envisions systems whose central entity is the user. AmI integrates technologies such as Artificial Intelligence, implicit Human Computer Interaction, and Ubiquitous Services. Each capability of AmI systems is oriented towards assistance of humans at work, in the classroom, or even at home. In consequence, the AmI development process usually incorporates the final user since the first stages. However, having users available during all this long process is not always possible. Agent-based social simulations where the users' role is played by simulated entities can be used to make the AmI development process faster and more effective. In this scenario, the modelling of CMHBs (Computational Models of Human Behaviour) is a major challenge. To address this issue, this paper proposes a methodology whose main contributions are: (1) the use of domain experts' knowledge to create CMHBs; (2) a common methodological framework to develop CMHBs by combining information obtained from sensors' perceptions and experts' experiences; and, (3) open source tools to support this development paradigm. The paper also presents a full case of study in a hospital which illustrates: the number of recommendations made by the methodology; the techniques proposed (mainly the use of ontologies and temporal reasoning); and, the potential of the methodology to model the personnel in a hospital.
Genetic programming based blind image deconvolution for surveillance systems
2013, Engineering Applications of Artificial Intelligence
Citation excerpt:
In intelligent surveillance systems, high-level interpretation of events within the scene requires low-level vision computing of the image and the moving objects. In real applications, these systems are developed for motion detection (Delgado et al., 2010; Huang and Cheng, in press), fault detection (Jakubek and Strasser, 2004; Tan et al., 2007; Byttner et al., 2011), road detection (Chen and Tai, 2010), heterogeneous network (Leu et al., in press), smart home environment (Kang et al., in press), scenario-based video-surveillance (Şaykol et al., 2010) and real time skin color detection based surveillance system (Chen et al., in press). The first category of linear restoration approaches requires the complete knowledge of blur function and noise statistics.
Image acquisition, segmentation, object detection and tracking are essential parts of surveillance systems. Usually, image filtering approaches are employed as preprocessing step to reduce the effect of motion or out-of-focus blur problem. In this paper, we propose genetic programming (GP) based blind-image deconvolution filter. A GP based numerical expression is developed for image restoration which optimally combines and exploits dependencies among features of the blurred image. In order to develop such function, first, a set of feature vectors is formed by considering a small neighborhood around each pixel. At second stage, the estimator is trained and developed through GP process that automatically selects and combines the useful feature information under a fitness criterion. The developed function is then applied to estimate the image pixel intensity of the degraded images. The performance of filter function is estimated using various degraded image sequences. Our comparative analysis highlight the effectiveness of GP based proposed filter.
Automatic Object Detection and Direction Prediction of Unmanned Vessels Based on Multiple Convolutional Neural Network Technology
2022, International Journal of Pattern Recognition and Artificial Intelligence
