Elsevier

Applied Soft Computing

Volume 36, November 2015, Pages 570-577

Fuzzy-neural self-adapting background modeling with automatic motion analysis for dynamic object detection

https://doi.org/10.1016/j.asoc.2015.08.007

Highlights

  • We propose a novel fuzzy-neural background modeling robust to scene changes.

  • The model involves self-adapting threshold and learning rates mechanism.

  • The system involves scene analysis to automatically update the model parameters.

  • An automatic optical flow-matting process improves dynamic object segmentation.

  • The model shows competitive performance compared with state-of-the-art models.

Abstract

In this paper we propose a system that involves a Background Subtraction (BS) model implemented in a neural Self-Organizing Map with a Fuzzy Automatic Threshold Update that is robust to illumination changes and slight shadow problems. The system incorporates a scene analysis scheme that automatically updates the Learning Rate values of the BS model by considering three possible scene situations. To improve the identification of dynamic objects, an Optical Flow algorithm analyzes the dynamic regions detected by the BS model, whose identification may be incomplete because of camouflage issues, and defines the complete object based on similar velocities and direction probabilities. These regions are then used as the input to a Matte algorithm that improves the definition of the dynamic object by minimizing a cost function. Among the original contributions of this work are: an adaptive fuzzy-neural segmentation model whose thresholds and learning rates are adjusted automatically according to the changes in the video sequence, and the automatic improvement of the segmentation results based on the Matte algorithm and Optical Flow analysis. Findings demonstrate that the proposed system achieves competitive performance compared with state-of-the-art models on the BMC and Li databases.

Introduction

Segmentation of dynamic objects for video analysis has turned out to be an indispensable task for different kinds of applications, such as surveillance systems [1], [2], automatic robot navigation [3], [4], and the entertainment industry [5], [6]. Nevertheless, because most of these segmentation algorithms are application oriented, it is difficult to determine which of them produces the most accurate definition of dynamic objects.

As reported in many surveys, the most common approach used to identify dynamic objects in video sequences is based on Background Subtraction (BS) algorithms [7], [8], [9]. The first stage of a BS algorithm is to build a background model, B, from N initial frames of the video sequence; each incoming frame is then subtracted from B, and the result is defined as the foreground, F, which may contain the dynamic objects. A very important step in a BS algorithm is the maintenance of B, which ensures that new video circumstances are incorporated into B to avoid false positive regions in F. These subtraction and maintenance steps continue until the end of the video. Therefore, segmentation models based on BS must define optimal threshold values in the subtraction step and learning factors in the maintenance stage.
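The generic BS pipeline described above can be sketched as follows. This is a minimal running-average illustration, not the fuzzy-neural model proposed in this paper; the threshold and learning-rate values are arbitrary placeholders.

```python
import numpy as np

def build_background(frames):
    """Initialize the background model B as the mean of N initial frames."""
    return np.mean(np.stack(frames, axis=0), axis=0)

def subtract(frame, background, threshold):
    """Foreground mask F: pixels whose difference from B exceeds the threshold."""
    diff = np.abs(frame.astype(np.float32) - background)
    return diff > threshold

def maintain(background, frame, foreground, alpha):
    """Update B only at background pixels so new scene conditions are absorbed."""
    bg = ~foreground
    background[bg] += alpha * (frame[bg].astype(np.float32) - background[bg])
    return background

# Toy grayscale sequence: a static scene, then a bright object appears.
frames = [np.full((4, 4), 100.0) for _ in range(5)]
B = build_background(frames)
new_frame = frames[0].copy()
new_frame[1:3, 1:3] = 200.0                  # a 2x2 "object"
F = subtract(new_frame, B, threshold=30)
B = maintain(B, new_frame, F, alpha=0.05)
print(int(F.sum()))                          # 4 foreground pixels
```

In a full BS model, the `threshold` and `alpha` values are exactly the parameters whose optimal definition the subtraction and maintenance stages require.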

The most common algorithm used in BS models is based on Gaussian Mixture Models (GMM). Yoshinaga presented in [10] a GMM spatio-temporal BS model that performs a region-level statistical analysis with a sensitive selection of 9 parameters, which are continuously updated as the video is analyzed. Yoshinaga's model was validated with the Background Models Challenge (BMC) database [11], achieving the best performance reported in the literature on this database. Chen and Ellis proposed in [12] a model based on GMM with an adaptive parameter update, named SAM. SAM implements special filters to suppress image noise and sudden illumination changes. The model can also handle shadows and reflection highlights, and a final morphological operation was incorporated to fill holes in the final F. The segmentation results reported with SAM were not very accurate; the authors argue that the different segmentation issues that affect F cannot be solved simultaneously by a single BS model because of the different needs associated with the semantic interpretation of F and B. Spampinato et al. proposed in [13] a texton-based kernel density estimation built on color and texture features. Texton was validated with three different databases: I2R (available at http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html), Fish4Knowledge and BMC. Texton reported problems with night videos, scarce illumination, severe dynamic background and rain/snow scenarios. Models based on biological processes have also achieved very accurate definitions of dynamic objects; this is the case of Maddalena and Petrosino, who proposed 3dSOBS in [14]. 3dSOBS is a BS model based on the SOM neural architecture in which each pixel is represented by a map of 3 × 3 × 5 neurons. Similar to Yoshinaga's model, 3dSOBS needs a careful initial definition of parameters (in this case 12 parameters) to produce the F definition.
The DR-SOM, proposed in [15], is a neural-inspired approach based on the mechanisms of the visual cortex. The neural architecture of DR-SOM is mainly SOM. DR-SOM reported very accurate segmentation results on dynamic background and illumination changes. Like SAM, DR-SOM has the capability to auto-adapt its parameters as the video is analyzed. DR-SOM was validated with BMC, achieving the second best performance compared against State-of-the-Art (SoA) models.

From the works mentioned above we can see that, even when some of them auto-adapt their parameters as the analysis is carried out, the models tend to be very sensitive to their initial definition. Besides, most of these models are validated with only one video database, leaving open the question of how their F results would behave on different video scenarios with the same initial parameter definition. In addition, both statistical and neural-inspired algorithms have in common that the model of each pixel is defined by a number of statistical distributions or neurons. However, the parameters in the neural models represent a lower computational burden. For these reasons, a new neural BS algorithm, in which the maintenance of B is highly adaptive, is proposed in this paper. Depending on the difference between the input frame and the background model, the learning parameters are treated differently, considering three possible scenarios: a stable scene, a scene with normal changes, and a scene with drastic changes. The last stage in most BS models involves morphological operations to improve the segmentation results. In our proposed model, an automatic motion analysis and an optimization function are used instead to improve the F results. Therefore, the proposed system is completely automatic and auto-adaptive.
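The three-scenario treatment of the learning parameters can be illustrated with a simple selection rule. The thresholds and rate values below are hypothetical placeholders for illustration only; the paper derives its actual values from the scene analysis scheme.

```python
def select_learning_rate(change_ratio,
                         stable_thr=0.02, drastic_thr=0.30,
                         alpha_stable=0.01, alpha_normal=0.05,
                         alpha_drastic=0.5):
    """Pick a maintenance learning rate from the fraction of pixels that
    differ between the input frame and the background model.

    Three regimes: stable scene, scene with normal changes, and scene
    with drastic changes (e.g. a sudden global illumination shift).
    """
    if change_ratio < stable_thr:
        return alpha_stable      # stable scene: learn slowly
    if change_ratio < drastic_thr:
        return alpha_normal      # normal activity: moderate adaptation
    return alpha_drastic         # drastic change: re-learn quickly

print(select_learning_rate(0.01))   # stable scene
print(select_learning_rate(0.10))   # normal changes
print(select_learning_rate(0.50))   # drastic changes
```

A fast rate under drastic changes lets B absorb a new illumination state quickly, while a slow rate in a stable scene prevents slowly moving objects from being absorbed into the background.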

In order to test the auto-adaptive behavior of our model under different video conditions, we performed the validation on two different databases with the same initial parameter definition: BMC [11] and Li [16]. We compare our proposed model with SoA models, demonstrating high accuracy in the F definition and competitive performance. To our knowledge, this validation has not been performed previously by other authors; therefore, it represents a novel BS model validation.

The rest of this paper proceeds as follows: Section 2 describes the different modules of the dynamic segmentation model proposed in this paper, explaining how they are combined in an automatic way; Section 3 presents qualitative and quantitative results; and Section 4 presents our conclusions.

Section snippets

Background modeling

Our proposed BS model is based on the SOM neural architecture working in combination with a Fuzzy System. Neural Networks (NNs) and Fuzzy Systems are among the soft computing theories most frequently adopted in the computer vision literature [17]. NNs have demonstrated their adaptability to changes in the environment and their capacity to learn and incorporate new representations of the input space, whereas Fuzzy Systems allow handling the imprecision and uncertainty inherent in the background
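As a rough illustration of a per-pixel SOM background model, the sketch below classifies a pixel by its best-matching neuron and applies a SOM-style update. It is a drastically simplified stand-in for the paper's fuzzy-neural model: the map size, noise level, and the fixed threshold (in place of the fuzzy automatic threshold update) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class PixelSOM:
    """Per-pixel model: a small map of neurons (weight vectors in RGB space).
    A pixel is background if its best-matching neuron is close enough; the
    winner (and, in a full map, its neighbors) is pulled toward the input."""

    def __init__(self, init_color, n_neurons=5, noise=5.0):
        # Initialize the neurons around the first observed color.
        self.w = init_color + noise * rng.standard_normal((n_neurons, 3))

    def classify_and_update(self, color, threshold, lr=0.1):
        d = np.linalg.norm(self.w - color, axis=1)
        winner = int(np.argmin(d))
        is_background = bool(d[winner] < threshold)
        if is_background:
            # SOM update: move the winning neuron toward the input color.
            self.w[winner] += lr * (color - self.w[winner])
        return is_background

pix = PixelSOM(init_color=np.array([100.0, 100.0, 100.0]))
print(pix.classify_and_update(np.array([102.0, 101.0, 99.0]), threshold=30.0))  # True
print(pix.classify_and_update(np.array([220.0, 30.0, 30.0]), threshold=30.0))   # False
```

In the proposed model the fixed `threshold` would instead be produced by the fuzzy automatic threshold update, and `lr` by the scene-analysis learning-rate mechanism.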

Results

The results presented in this section correspond to the evaluation of the proposed algorithm on two different databases, and a comparison with SoA methods evaluated on the BMC database. The metrics reported in Table 1 and Table 2 were calculated by comparing the results of the segmentation methods with the Ground Truth information provided in each video database.
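The pixel-wise metrics typically reported against ground truth in BS benchmarks such as BMC (precision, recall, F-measure) can be computed as follows; this is a generic illustration, not the exact evaluation code of the benchmark.

```python
import numpy as np

def segmentation_metrics(mask, ground_truth):
    """Pixel-wise precision, recall and F-measure of a binary foreground
    mask against the ground-truth foreground mask."""
    tp = int(((mask == 1) & (ground_truth == 1)).sum())  # true positives
    fp = int(((mask == 1) & (ground_truth == 0)).sum())  # false positives
    fn = int(((mask == 0) & (ground_truth == 1)).sum())  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

gt   = np.array([[1, 1, 0, 0]])   # ground truth: two foreground pixels
pred = np.array([[1, 0, 0, 0]])   # detector found only one of them
p, r, f = segmentation_metrics(pred, gt)
print(p, r, round(f, 3))          # 1.0 0.5 0.667
```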

Conclusions

Because many factors affect the performance of a video segmentation algorithm, perfect segmentation of dynamic objects is a very challenging task. In order to deal with some of these factors, this paper presented an approach based on the neural SOM architecture that auto-adjusts its parameters as the video is being analyzed. A fuzzy system defines optimal threshold values, and a scene analysis decides the learning factor parameters whose values affect the weight updates of the neurons. Besides,

Acknowledgements

The authors would like to thank FOMIX and TNM for their support under grants CHIH-2012-C03-193760, CHI-IET-2012-105 and CHI-MCIET-2013-230.

Mario I. Chacon-Murguia received the B.Sc. (1982) and the M.Sc. (1985) degrees in EE from the Chihuahua Institute of Technology, ITCH, Mexico, and the Ph.D. (1998) in EE from New Mexico State University, USA. He has developed research projects for several companies. He currently works as a Research Professor at the ITCH. He has more than 140 works in international and national journals and congresses and published 2 books. His current research includes visual perception, computer vision, and image and signal processing using computational intelligence. He is a Senior member of The IEEE, and member of the National Research System in Mexico.

References (28)

  • X. Bai, et al., Geodesic matting: a framework for fast interactive image and video segmentation and matting, Int. J. Comput. Vis. (2009)

  • J. Wang, et al., Interactive video cutout, ACM Trans. Graphics (2005)

  • S. Brutzer, et al., Evaluation of background subtraction techniques for video surveillance, Comput. Vis. Pattern Recogn. (2011)

  • A. Vacavant, et al., A benchmark dataset for foreground/background extraction (2012)


    Graciela Ramirez-Alonso received the B.Sc. (2002) and the M.Sc. (2004) degrees in Electrical Engineering from the Chihuahua Institute of Technology, Chihuahua, Chih, Mexico. Currently, she is a Ph.D. student of the Institute of Technology, Chihuahua, Chih, Mexico. Her research interest includes computer vision, video signal processing and pattern recognition.
