Spatio-contextual Gaussian mixture model for local change detection in underwater video

https://doi.org/10.1016/j.eswa.2017.12.009

Highlights

  • MoG integrated with Wronskian framework for underwater local change detection.

  • Linear dependency of a pixel with the background model is tested using the Wronskian.

  • Objects can be detected efficiently in blurred and dynamic environments.

  • Adaptive weight updating of the background model.

Abstract

In this article, a local change detection technique for underwater video sequences is proposed to detect the positions of moving objects. The proposed change detection scheme integrates the Mixture of Gaussians (MoG) process in a Wronskian framework. It uses spatiotemporal modes (an integration of spatio-contextual and temporal modes) arising over the underwater video sequences to detect the local changes. The Wronskian framework takes care of the spatio-contextual modes, whereas the MoG models the temporal modes arising due to the inter-dependency of a pixel over time in a video. The proposed scheme follows two steps: background construction and background subtraction. It uses a few initial frames to construct a background model and thereafter detects the moving objects in the subsequent frames. During the background construction stage, the linear dependency between the region of support (local image patch) in the target image frame and the reference background model is tested using the Wronskian change detection model. Pixel values that are linearly dependent are assumed to be generated from an MoG process and are modeled accordingly. Once the background is constructed, the background subtraction and update process starts from the next frame. The efficiency of the proposed scheme is validated by testing it on two benchmark underwater video databases, fish4knowledge and underwaterchangedetection, and one large-scale outdoor video database, changedetection.net. The effectiveness of the proposed scheme is demonstrated by comparing it with eighteen state-of-the-art local change detection algorithms. The performance of the proposed scheme is evaluated using one subjective and three quantitative evaluation measures.

Introduction

Underwater visual surveillance has emerged as a potential domain of research in the current decade. This area has received so much attention because of its numerous applications, including submarine navigation and surveillance (Vaganay et al., 2006; Weidemann et al., 2005), conservation of underwater animals (Bickford et al., 2007), deep sea exploration (Dover, 2011), underwater wreck detection (Singh, Adams, Mindell, & Foley, 2000), etc. Surveillance means detection of the moving objects followed by tracking. Detection of moving objects is the primary task of any surveillance system. Among all detection algorithms, background subtraction (BGS) (Piccardi, 2004; Sobral & Vacavant, 2014) is a popular and powerful one. BGS finds the deviation of each pixel in a frame from the actual background. If the background is not known a priori, the situation becomes complicated. Thus, to resolve this, the background has to be estimated/constructed and needs to be updated regularly. Many different BGS techniques are available in the literature (Benezeth et al., 2008; McIvor, 2000; Piccardi, 2004). BGS is basically a pixel labeling problem, where each pixel in a frame is labeled either as foreground or background.

Stauffer and Grimson (2000) have proposed an adaptive background mixture model for real-time object detection in conventional video scenes. Each pixel is modeled as a mixture of Gaussians, and an on-line approximation is used to update the background model. The algorithm yields good results in non-stationary background scenarios. A spatio-contextual temporal mode based change detection algorithm has been proposed by Subudhi, Ghosh, and Ghosh (2013), where the background is modeled in a Wronskian framework. Each pixel is assumed to be generated from a Gaussian distribution. The method yielded good results for dynamic backgrounds. In order to model human interactions, a Bayesian computer vision system has been proposed by Oliver, Rosario, and Pentland (2000). An adaptive eigenspace is built to model the background by taking a few initial frames. The range of appearance that has been observed is described by the constructed eigenspace, which represents only the static part of the scene. Principal Component Analysis (PCA) is used to reduce the dimension of the eigenspace. Manzanera and Richefeu (2004) have developed a ΣΔ background estimation based motion detector, which is claimed to be very robust and computationally efficient. In order to estimate the temporal statistics of pixels in a frame, the ΣΔ estimator is used, which is a simple recursive and non-linear estimator. Spatial correlation in the frame under consideration is exploited using the classical Markov model. A spatiotemporal framework based local binary pattern (LBP) subtraction algorithm has been proposed by Kumar, Rout, Kumar, Verma, and Kumar (2015) for object detection. Two separate LBP maps are constructed: the first in a spatiotemporal framework, the second in the spatial plane using an 8-neighborhood. The difference between the two LBP maps is then compared with a threshold, and pixel locations for which the difference is higher than the threshold are considered to be foreground pixels. It works well for nominal background variations, but it fails under camera movement and cannot handle low-illumination cases.
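As a point of reference, the ΣΔ estimator mentioned above can be summarized in a few lines. The sketch below is a minimal NumPy rendering of the commonly described update rules; the amplification factor N = 4 and the clipping range are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

def sigma_delta_step(frame, M, V, N=4):
    """One update of a Sigma-Delta background estimator in the spirit of
    Manzanera and Richefeu (2004).

    frame, M, V: 2-D integer arrays of the same shape, where M is the running
    background estimate and V the running dispersion (variance-like) estimate.
    Returns the updated (M, V) and a boolean foreground mask.
    """
    frame = frame.astype(np.int32)
    M = M.astype(np.int32)
    V = V.astype(np.int32)
    # Elementary +/-1 increment of the background estimate towards the frame.
    M = M + np.sign(frame - M)
    # Absolute difference between the frame and the background estimate.
    delta = np.abs(frame - M)
    # Track N times the non-zero differences with the same +/-1 update rule.
    V = np.where(delta != 0, V + np.sign(N * delta - V), V)
    V = np.clip(V, 1, 255)
    # A pixel is declared foreground when its difference exceeds the dispersion.
    foreground = delta > V
    return M, V, foreground
```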

A self-balanced sensitive segmenter has been proposed by St-Charles, Bilodeau, and Bergevin (2015). The authors have introduced a strategy to overcome the challenges by stabilizing the inner workings of a non-parametric model. The picture elements are modeled with the help of a spatiotemporal feature descriptor for higher sensitivity. They came up with a dynamically adjustable variable, which is self-tuned based on the moving average of the minimal L1 distance between the background model and the pixel of the current frame. They have used the strategy followed in Hofmann, Tiefenbacher, and Rigoll (2012), except that the minimum distance function is updated irrespective of whether a new pixel belongs to the object or the background. Goyat, Chateau, Malaterre, and Trassoudaine (2006) have developed a probabilistic model called the Vu Meter technique, where each pixel is modeled using a probability density function characterized by Kronecker delta functions. Pixels for which the probability density function value is higher than a threshold are classified as foreground and the rest as background. An adaptive background learning method has been proposed by Zhang, Chen, Shyu, and Peeta (2003) for the detection of objects. An unsupervised segmentation algorithm, known as Simultaneous Partition and Class Parameter Estimation (SPCPE), is used to detect the object in the scene. Elgammal, Duraiswami, Harwood, and Davis (2002) have proposed a non-parametric kernel density estimation method for BGS. The background model estimates the probability of a gray level value based on a sample of gray values of each pixel. However, none of the above-discussed approaches has been evaluated for underwater applications.
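For concreteness, the non-parametric kernel density estimate used by Elgammal et al. (2002) for a single pixel can be sketched as follows; the kernel bandwidth sigma and the decision threshold below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def kde_foreground(pixel_value, samples, sigma=15.0, threshold=1e-3):
    """Minimal sketch of non-parametric KDE background classification.

    samples: 1-D array of recent gray values observed at this pixel location.
    Returns True when the pixel is classified as foreground (low background
    probability), False otherwise.
    """
    diffs = pixel_value - np.asarray(samples, dtype=np.float64)
    # Gaussian-kernel density estimate of the background pdf at pixel_value.
    density = np.mean(np.exp(-0.5 * (diffs / sigma) ** 2)
                      / (sigma * np.sqrt(2.0 * np.pi)))
    return density < threshold
```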

Moving object detection from underwater video sequences faces several challenges: underwater haze, decolorization of the object, poor contrast, noise in the image stacks, etc. The poor quality of underwater video stems from the fact that the refractive index of the water medium is not uniform everywhere, owing to variations in temperature (Quan & Fry, 1995) and different in-water components such as salinity (Jerlov, 1976; Quan & Fry, 1995; Temple, 2007), suspended pollutants (Twardowski et al., 2001), etc. Thus, light reflected from the object deviates from the path to the camera, resulting in a reduction of object visibility. Some of the light reflected from the object may be absorbed or scattered by the medium, which increases the surrounding light. Thus, the difference between the object intensity and the surrounding intensity reduces, resulting in poor-contrast video (Jaffe, 2015; Serikawa & Lu, 2014). The situation gets worse under challenging illumination conditions (Schechner & Karpel, 2004). Another major challenge in underwater video is the non-static background. The non-static nature of the scene is due to camera jitter, underwater vegetation, underwater currents, fluctuation of the water surface, etc.

Many researchers have contributed to resolving these issues of object detection. Spampinato, Chen-Burger, Nadarajan, and Fisher (2008) proposed an integration of a moving average algorithm and an adaptive Gaussian mixture model for object detection. Lee, Kim, Kim, Myung, and Choi (2012) developed an efficient object detection technique, which uses a color restoration mechanism to enhance the visibility of the scene before initiating the detection process. A feature-based image matching algorithm is then used to detect the object in the scene.

Zhou, Llewellyn, Wei, Creighton, and Nahavandi (2015) used a Gaussian mixture model for background construction and for the detection of jellyfish and sea snakes in an underwater environment. An algorithm based on salient color uniformity and sharp contours has been developed by Rizzini, Kallasi, Oleari, and Caselli (2014) to detect man-made artifacts lying on the seabed. Recently, Zhang, Kopanas, Desai, Chai, and Piacentino (2016) proposed an unsupervised convolutional neural network algorithm to detect fish from underwater sequences. A modified non-maximum suppression algorithm is also used to suppress false detections. Spampinato et al. (2016) investigated the fine-grained fish classification problem in low quality video. They used Local Ternary Patterns (LTP) for the detection of fish and an SVM for the classification of fish species. A mechanism for underwater object detection using compressed sensing is developed by Qi, Sun, Sun, Lin, and Yao (2016). A fusion model is proposed by Zhu, Chang, Dai, Zheng, and Zheng (2016) to determine the position of the underwater object using a saliency-based region merging scheme.

A scheme for detecting underwater animals is developed by Walther, Edgington, and Koch (2004), in which detection of the object is carried out by a saliency-based bottom-up attention scheme applied to different color planes separately. It works very well under poor visibility conditions, and the noise handling capability of the algorithm is also satisfactory. Fei, Xueli, and Dongsheng (2009) used a Gaussian mixture model for BGS to detect drowning swimmers in a swimming pool. A Gaussian averaging scheme over the first few frames is used by Edgington, Cline, Davis, Kerkez, and Mariette (2006) to construct the background, and detection of moving objects is carried out using a graph cut technique by comparing the constructed background with the current frame. Chen, Shen, Fan, Sun, and Xu (2015) proposed a mechanism for detecting underwater object motion in which the distance of the object from the imaging device is estimated and injected into the shape information of the object; the shape information is assumed to be available a priori. Dark channel prior (DCP) is used to dehaze the scene of view before the detection process. DCP alone may reduce the overall intensity of the enhanced scene, thereby increasing the chance of failure. An algorithm for underwater fish detection from a moving camera, based on deformable multiple kernels, has been developed by Chuang, Hwang, Ye, Huang, and Williams (2017). It performs well on low-illumination underwater videos; however, it fails on poor-contrast and dense-haze sequences.

It may be summarized from the above survey that underwater object detection is a challenging task. To the authors' knowledge, there is no work reported in the literature that uses MoG in a Wronskian framework for object detection in underwater video sequences; the incorporation of the Wronskian with MoG is a first-of-its-kind attempt to model the background and detect underwater objects. In the proposed scheme, the MoG is integrated with a Wronskian framework for underwater local change detection. Here, the spatio-contextual modes arising from the Wronskian scheme are integrated with the temporal modes arising through the MoG to detect the local changes. Hence, it is expected that the spatiotemporal modes may resolve the problem of underwater dynamism for object detection. The proposed spatio-contextual temporal mode-finding based algorithm takes care of the spatial inter-dependency of a pixel with its N-neighborhood as well as the temporal dependency of the pixel value on its previous observations. Each time a new pixel arrives, its linear dependency is tested against the background model, which is characterized by the MoG process. If the new pixel is linearly dependent on any of the mixture components of the MoG, that component's mean, variance, and weight parameters are updated. If the new pixel does not belong to any of the mixture components, a new mixture component is introduced in the MoG. The new Gaussian component is assigned a low initial weight and a new mean and variance, which are calculated from a window of size 3 × 3. This process continues for each incoming pixel. It is assumed that not all mixture components contribute significantly to the background model. Thus, the most significant mixture components, i.e., those with a higher weight-to-standard-deviation ratio, are used to represent the background model. Once the background construction process is over, the detection process starts. In order to detect the object, the Wronskian change detection model is used. If the Wronskian function evaluated at a pixel location (x, y) is lower than an estimated threshold value, the pixel is labeled as background and the background model parameters are updated; otherwise, the pixel is considered an object pixel.
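To make the above description concrete, the sketch below shows one possible per-pixel realisation of the MoG-in-Wronskian idea outlined in this paragraph. It is only an illustration under stated assumptions: the learning rate, initial weight, dependency threshold, maximum number of components, and the simplification of each component's background reference to a scalar mean are choices made for the sketch and are not taken from the paper.

```python
import numpy as np

class WronskianMoGPixel:
    """Illustrative per-pixel model: a Wronskian-style dependency test decides
    which Gaussian component an incoming 3x3 region of support belongs to."""

    def __init__(self, alpha=0.01, w_init=0.05, max_components=5):
        self.alpha = alpha                  # learning rate for mean/variance/weight
        self.w_init = w_init                # weight assigned to a new component
        self.max_components = max_components
        self.means, self.vars, self.weights = [], [], []

    def _dependent(self, patch, mean, threshold=0.5):
        # Wronskian-style linear dependency test between the region of support
        # and a component mean, using the same functional form as the detector
        # sketched in Section 2: W = (1/n) * sum_i (r_i^2 - r_i), r_i = x_i / y.
        r = patch.astype(np.float64) / max(mean, 1e-6)
        return abs(np.mean(r * r - r)) < threshold

    def update(self, patch):
        """patch: 3x3 region of support centred on the pixel."""
        x = float(np.mean(patch))
        for i, mu in enumerate(self.means):
            if self._dependent(patch, mu):
                # Matched component: move its parameters towards the new evidence.
                self.means[i] += self.alpha * (x - mu)
                self.vars[i] += self.alpha * ((x - mu) ** 2 - self.vars[i])
                self.weights[i] += self.alpha * (1.0 - self.weights[i])
                break
        else:
            # No dependent component: introduce a new Gaussian with a low weight
            # and mean/variance taken from the 3x3 window, as described above.
            if len(self.means) >= self.max_components:
                worst = int(np.argmin(np.array(self.weights) / np.sqrt(self.vars)))
                del self.means[worst], self.vars[worst], self.weights[worst]
            self.means.append(x)
            self.vars.append(float(np.var(patch)) + 1e-6)
            self.weights.append(self.w_init)
        # Renormalise so that the weights sum to unity.
        s = sum(self.weights)
        self.weights = [w / s for w in self.weights]

    def is_background(self, patch, n_bg=2):
        # Test the patch only against the most significant components
        # (highest weight-to-standard-deviation ratio).
        order = np.argsort([-w / np.sqrt(v) for w, v in zip(self.weights, self.vars)])
        return any(self._dependent(patch, self.means[i]) for i in order[:n_bg])
```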

In order to validate the effectiveness of the proposed scheme, we have tested it on underwater videos available at fish4knowledge and underwaterchangedetection, and on a large-scale outdoor video dataset, changedetection.net. We have compared the detection results of the proposed scheme with various state-of-the-art BGS algorithms: Adaptive Background Learning (Zhang et al., 2003), Eigen Background (Oliver et al., 2000), Kernel Density Estimation (Elgammal et al., 2002), ΣΔ (Manzanera & Richefeu, 2004), the Gaussian modeled Wronskian change detection scheme (Subudhi et al., 2013), Gaussian Mixture Model (Stauffer & Grimson, 2000), Vu Meter (Goyat et al., 2006), and SuBSENSE (St-Charles et al., 2015). We have carried out subjective as well as quantitative analyses to validate the detection efficiency of the proposed scheme. Subjective analysis is carried out by having seven independent experts review the results of the different schemes. For the quantitative analysis, all the above-stated BGS techniques are compared with the proposed scheme using three performance measures: average precision, average recall, and average F-measure.
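For reference, the three quantitative measures can be computed per frame as in the straightforward pixel-wise sketch below; averaging over frames and sequences then gives the reported average values.

```python
import numpy as np

def precision_recall_f1(detected_mask, ground_truth_mask):
    """Pixel-wise precision, recall, and F-measure between a binary detection
    mask and a binary ground-truth mask (boolean numpy arrays of equal shape)."""
    tp = np.logical_and(detected_mask, ground_truth_mask).sum()
    fp = np.logical_and(detected_mask, ~ground_truth_mask).sum()
    fn = np.logical_and(~detected_mask, ground_truth_mask).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```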

The rest of the paper is organized as follows. Section 2 describes the use of a Wronskian framework for BGS; this section discusses the Wronskian change detection model and the Gaussian-modeled Wronskian scheme used in the literature for object detection. An overview of the proposed background construction and object detection stages, along with a block diagram representation, is given in Section 3. Section 4 discusses the proposed Gaussian Mixture Model integrated with the Wronskian framework for local change detection, together with the considered videos and potential applications. Detailed results and discussions, with performance evaluation of the proposed scheme, are presented in Section 5. Section 6 draws the conclusions.

Section snippets

Wronskian framework for local change detection

In the case of a video, there exists a strong inter-dependency between a pixel and its past values. Many algebraic models exist in the literature that can check the linear dependency of a pixel with its past values; one such linear dependency check uses the determinant of a Wronskian matrix (Durucan & Ebrahimi, 2001). The test of linear dependency between a pixel and its past values can be done by computing the determinant of the Wronskian matrix at a local image patch taken
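A minimal sketch of such a Wronskian dependency test over a region of support is given below, assuming the dimensional form W = (1/n) Σ_i (x_i²/y_i² − x_i/y_i) commonly reported for this detector; the threshold value and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def wronskian_change(current_patch, background_patch, threshold=0.5, eps=1e-6):
    """Sketch of a Wronskian change test in the spirit of Durucan and
    Ebrahimi (2001), evaluated over a local region of support.

    Returns True when the patch is declared changed (linearly independent of
    the background), False when it is linearly dependent (background)."""
    x = current_patch.astype(np.float64).ravel()
    y = background_patch.astype(np.float64).ravel() + eps   # avoid division by zero
    ratio = x / y
    w = np.mean(ratio * ratio - ratio)
    return abs(w) > threshold
```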

Overview of the proposed methodology

This article proposes a BGS algorithm for the underwater environment, which models each pixel as a mixture of Gaussians in order to take care of surface water fluctuation, underwater vegetation, illumination variation, etc. In this work, a Mixture of Gaussians (MoG) process is integrated with the Wronskian framework to achieve this objective. The work is demonstrated for underwater applications. Fig. 2 depicts the overview of the proposed algorithm. There are two stages of this
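The two-stage flow (background construction followed by background subtraction and update) could be driven per pixel as in the sketch below. It reuses the illustrative WronskianMoGPixel class sketched in the Introduction, and the number of initialization frames n_init is an assumed value, not one taken from the paper.

```python
import numpy as np

def detect_sequence(frames, n_init=50):
    """Illustrative two-stage driver over grayscale frames (2-D numpy arrays):
    the first n_init frames build the background model; every later frame is
    classified and, where labeled background, used to update the model."""
    h, w = frames[0].shape
    models = [[WronskianMoGPixel() for _ in range(w)] for _ in range(h)]
    masks = []
    for t, frame in enumerate(frames):
        padded = np.pad(frame, 1, mode='edge')
        mask = np.zeros((h, w), dtype=bool)
        for y in range(h):
            for x in range(w):
                patch = padded[y:y + 3, x:x + 3]   # 3x3 region of support
                model = models[y][x]
                if t < n_init:
                    model.update(patch)            # stage 1: background construction
                elif model.is_background(patch):
                    model.update(patch)            # stage 2: background pixels keep updating the model
                else:
                    mask[y, x] = True              # object (foreground) pixel
        if t >= n_init:
            masks.append(mask)
    return masks
```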

Proposed MoG integrated in a Wronskian framework

A mixture of Gaussians (MoG) model is established for each pixel, which models the linear dependency of a pixel with $M$ Gaussian components of weight $w_{i,t}$. The weight $w_{i,t}$ represents the belongingness of a pixel to the $i$th Gaussian component at time $t$. This belongingness is characterized here as the measure of linear dependence of a pixel with the $i$th component of the MoG model. The values of $w_{i,t}$ are assigned in such a way that the sum of the weights equals unity, $\sum_{i=1}^{M} w_{i,t} = 1$.
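For reference, the normalisation constraint above, together with the conventional recursive weight update used in MoG background modeling (Stauffer & Grimson, 2000), can be written as follows; whether the proposed scheme uses exactly this update rule is detailed in the full text, so the second expression is included only as the standard form.

```latex
\sum_{i=1}^{M} w_{i,t} = 1, \qquad
w_{i,t} = (1-\alpha)\, w_{i,t-1} + \alpha\, M_{i,t}, \quad
M_{i,t} =
\begin{cases}
1 & \text{if the pixel is linearly dependent on component } i,\\
0 & \text{otherwise,}
\end{cases}
```

followed by renormalisation of the weights so that they again sum to unity.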

Results and discussions

The effectiveness of the proposed scheme is corroborated by comparing it against the Gaussian modeled Wronskian scheme (GW) (Subudhi et al., 2013), Adaptive Background Learning (ABL) (Zhang et al., 2003), Eigen Background (EB) (Oliver et al., 2000), ΣΔ (SD) (Manzanera & Richefeu, 2004), Kernel Density Estimation (KDE) (Elgammal et al., 2002), Gaussian Mixture Model (GMM) (Stauffer & Grimson, 2000), SuBSENSE (St-Charles et al., 2015), Vu Meter (Goyat et al., 2006), Adaptive Gaussian Mixture Model

Conclusions

Local change detection for moving object detection in underwater video sequences has been addressed in this article. A spatio-contextual-temporal mode-finding based object detection scheme has been proposed to exploit the spatial as well as temporal inter-dependency of the image frames. Linear dependency between pixels in the temporal direction has been tested using the Wronskian. Those pixels that exhibit interdependence are assumed to have come from a mixture of Gaussians (MoG)

References (53)

  • C.L.V. Dover

    Tighten regulations on deep-sea mining

    Nature

    (2011)
  • E. Durucan et al.

    Change detection and background extraction by linear algebra

    Proceedings of the IEEE

    (2001)
  • D.R. Edgington et al.

    Detecting, tracking and classifying animals in underwater video

    Proceedings of the IEEE oceans

    (2006)
  • A. Elgammal et al.

    Background and foreground modeling using non-parametric kernel density estimation for visual surveillance

    Proceedings of the IEEE

    (2002)
  • R.H. Evangelio et al.

    Splitting Gaussians in mixture models

    Proceedings of the IEEE ninth international conference on advanced video and signal-based surveillance

    (2012)
  • R.H. Evangelio et al.

    Complementary background models for the detection of static and moving objects in crowded environments

    Proceedings of the eight IEEE international conference on advanced video and signal-based surveillance

    (2011)
  • L. Fei et al.

    Drowning detection based on background subtraction

    Proceedings of the IEEE computer society conference on embedded software and systems

    (2009)
  • Y. Goyat et al.

    Vehicle trajectories evaluation by static video sensors

    Proceedings of the IEEE intelligent transportation systems conference

    (2006)
  • T.S. Haines et al.

    Background subtraction with Dirichlet processes

    Proceedings of the European conference on computer vision

    (2012)
  • F.J. Hernandez-Lopez et al.

    Change detection by probabilistic segmentation from monocular view

    Machine Vision and Applications

    (2014)
  • M. Hofmann et al.

    Background segmentation with feedback: The pixel-based adaptive segmenter

    Proceedings of the IEEE computer vision and pattern recognition workshops

    (2012)
  • J.S. Jaffe

    Underwater optical imaging: The past, the present, and the prospects

    IEEE Journal of Oceanic Engineering

    (2015)
  • N.G. Jerlov

    Marine optics

    (1976)
  • P. Kumar et al.

    Detection of video objects in dynamic scene using local binary pattern subtraction method

    Proceedings of the intelligent computing, communication and devices

    (2015)
  • L. Maddalena et al.

    A self-organizing approach to background subtraction for visual surveillance applications

    IEEE Transactions on Image Processing

    (2008)
  • A. Manzanera et al.

    A robust and computationally efficient motion detection algorithm based on ΣΔ background estimation

    Proceedings of the Indian conference on computer vision, graphics and image processing

    (2004)