Pattern mining-based video saliency detection: Application to moving object segmentation

https://doi.org/10.1016/j.compeleceng.2017.08.029

Abstract

In this paper, we present a new model for spatiotemporal saliency detection. Unlike previous works, which combine image saliency in the spatial domain with motion cues to build their video saliency models, we propose to apply a pattern mining (PM) algorithm. From initial saliency maps computed in the spatial and temporal domains, discriminative spatiotemporal saliency patterns are recognized and their label information is propagated to obtain the final saliency map. Our model ensures a good compromise between image saliency and motion saliency, and it predicts salient regions more accurately than other video saliency detection methods. Finally, as an application of our algorithm, the spatiotemporal saliency map is combined with appearance models and dynamic location models in an energy minimization framework to segment salient moving objects. Experiments show the good performance of our algorithm for moving object segmentation (MOS) on benchmark datasets.

Introduction

Recently, emulating the human cognitive ability to distinguish the most attractive regions in images and videos has become one of the most challenging and rapidly progressing topics in computer vision, due to its wide range of applications such as object detection/segmentation/recognition, content-based image/video compression, visual tracking and many more [1]. Saliency detection methods can be divided into two categories: bottom-up and top-down approaches. Bottom-up models exploit features extracted from the image, such as intensity, color and orientation, for saliency computation. Top-down approaches require an explicit understanding of the context of the image; supervised learning on a specific class is the most widely adopted principle in this category. A good review of saliency detection models, their classification and their applications can be found in [1].

In this work, we focus on video saliency detection, where the goal is to identify the most salient regions in space and time by integrating motion cues in addition to spatial information. Unlike the majority of existing methods, which combine image saliency with motion features [2], [3], [4], [5], [6], we propose in this paper a new model for video saliency detection using the pattern mining (PM) algorithm. Inspired by the work of [7], which was proposed to improve existing image saliency methods, we extend its principle to the spatiotemporal domain to estimate salient regions in video. First, for each frame of the input video, we compute two saliency maps: a spatial saliency map and a temporal saliency map. Then, from the generated maps, initial background and foreground regions are detected and used as input to the PM algorithm to identify spatiotemporal saliency patterns. Finally, a label propagation process is applied to obtain the spatiotemporal saliency map. To illustrate the efficiency of our algorithm, we integrate the resulting saliency map into an energy minimization framework to segment salient moving objects.
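As a rough, runnable sketch of this pipeline, the fragment below replaces the actual pattern mining of [7] with a much simpler stand-in (quantized-color "patterns" scored for discriminativity between seed regions); the thresholds and the helper itself are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Hypothetical stand-in for the PM-based fusion step described above. Real
# pattern mining over feature transactions (as in [7]) is replaced by a simple
# discriminative test on quantized colors, only to make the data flow concrete.
def pm_fusion(frame, s_spatial, s_temporal, tau_fg=0.7, tau_bg=0.2, bins=8):
    # 1) Seed regions: pixels on which both initial maps confidently agree.
    fg = (s_spatial > tau_fg) & (s_temporal > tau_fg)
    bg = (s_spatial < tau_bg) & (s_temporal < tau_bg)
    # 2) "Patterns" = quantized colors; keep those clearly more frequent
    #    among foreground seeds than among background seeds.
    q = (frame // (256 // bins)).reshape(-1, 3).astype(np.int64)
    codes = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    h_fg = np.bincount(codes[fg.ravel()], minlength=bins ** 3) + 1.0
    h_bg = np.bincount(codes[bg.ravel()], minlength=bins ** 3) + 1.0
    discriminative = (h_fg / h_fg.sum()) > 2.0 * (h_bg / h_bg.sum())
    # 3) Label propagation: pixels carrying a discriminative pattern are
    #    marked salient in the final spatiotemporal map.
    return discriminative[codes].reshape(frame.shape[:2]).astype(np.float32)
```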

This paper is organized as follows: Section 2 presents the related work, including video saliency detection and moving object segmentation. In Section 3 we describe the different steps of our proposed algorithm. Finally, experiments and a comparative study are presented in Section 4.


Video saliency detection

Most video saliency models proposed in the literature combine image saliency in the spatial domain with motion cues. In [2], Gao et al. added a motion channel to the image saliency, based on the center-surround hypothesis, to predict human eye fixations in dynamic scenes. Ejaz et al. [3] proposed a feature-aggregation-based visual attention model combined with motion intensity for video summarization. Recently, a spatiotemporal saliency model based on superpixel-level

Proposed algorithm

Our main contribution in this work consists of using the PM algorithm to build our spatiotemporal saliency map, instead of the simple combination of spatial and motion cues used in previous works. First, for each frame of the input video sequence, we apply the de-texturing method of [20] to preserve only the cartoon component of the image, without unwanted edges and textures. Then the saliency map of the de-textured image in the spatial domain is obtained using [21]. On the other hand, we compute
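A minimal sketch of how these two initial maps might be produced, assuming OpenCV is available: bilateral filtering stands in for the de-texturing of [20], global color distinctness stands in for the saliency method of [21], and Farneback optical flow supplies the temporal cue (an assumption, since the snippet is cut off before describing it):

```python
import cv2
import numpy as np

def initial_maps(prev_frame, curr_frame):
    # Stand-in for the de-texturing of [20]: edge-preserving smoothing that
    # keeps the cartoon component while suppressing fine texture.
    cartoon = cv2.bilateralFilter(curr_frame, 9, 75, 75)
    # Stand-in for [21]: spatial saliency as distance to the mean Lab color.
    lab = cv2.cvtColor(cartoon, cv2.COLOR_BGR2LAB).astype(np.float32)
    s_spatial = np.linalg.norm(lab - lab.reshape(-1, 3).mean(0), axis=2)
    s_spatial /= s_spatial.max() + 1e-8
    # Temporal saliency from the magnitude of dense optical flow between
    # consecutive frames.
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    s_temporal = np.linalg.norm(flow, axis=2)
    s_temporal /= s_temporal.max() + 1e-8
    return s_spatial, s_temporal
```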

Experiments and evaluation

In this section, we present the datasets used in our experiments. Then, we evaluate our video saliency model. Next, we show the performance of our algorithm for MOS by comparing it with different state-of-the-art methods. Finally, we analyse the runtime of the proposed method.
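The snippet does not list the evaluation measures; the sketch below shows two metrics that are standard in the saliency detection literature (mean absolute error, and the F-measure at the common adaptive threshold with the usual β² = 0.3), offered only as an assumption about the protocol:

```python
import numpy as np

def mae(saliency, gt):
    # Mean absolute error between a [0,1] saliency map and a binary mask.
    return float(np.mean(np.abs(saliency.astype(np.float32) - (gt > 0))))

def f_measure(saliency, gt, beta2=0.3):
    # F-measure at the common adaptive threshold (twice the mean saliency).
    gt = gt > 0
    pred = saliency >= min(2.0 * float(saliency.mean()), 1.0)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return (1.0 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```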

Conclusion

While classical video saliency detection methods combine spatial information with motion cues to generate their video saliency maps, a new model based on the PM algorithm is introduced in this work. We exploit an existing saliency method, applied both to the cartoon component of each frame and to the image flow, to generate the initial spatial and motion saliency maps. From these maps, we can identify discriminative spatiotemporal saliency patterns using the PM algorithm to select background and
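The conclusion is cut off at this point; as the abstract notes, the resulting saliency map is combined with appearance models and dynamic location models in an energy minimization framework for MOS. The exact terms are not visible in this snippet, so the following generic binary-labeling energy is only an assumed form of such a framework:

```latex
% Generic binary-labeling energy for salient moving object segmentation.
% The three unary terms (saliency, appearance, location) and the weights
% \lambda are assumptions; the paper's exact formulation is not shown here.
E(\mathbf{x}) = \sum_{i} \Big( \lambda_{s}\, U_{i}^{\mathrm{sal}}(x_{i})
             + \lambda_{a}\, U_{i}^{\mathrm{app}}(x_{i})
             + \lambda_{l}\, U_{i}^{\mathrm{loc}}(x_{i}) \Big)
             + \lambda_{p} \sum_{(i,j) \in \mathcal{N}}
               [x_{i} \neq x_{j}]\; e^{-\|c_{i}-c_{j}\|^{2}/(2\sigma^{2})}
```

Here x_i ∈ {0, 1} labels pixel i as background or salient object, c_i is its color, and N is the set of neighboring pixel pairs; energies of this form are typically minimized exactly with graph cuts.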


References (25)

• W.-T. Li et al., Exploring visual and motion saliency for automatic video object extraction, IEEE Trans Image Process (2013)

• Z. Tu et al., A new spatio-temporal saliency-based video object segmentation, Cognit Comput (2016)

Hiba Ramadan is a Ph.D. student and a member of the LIIAN Laboratory in the Department of Informatics at Sidi Mohamed Ben Abdellah University since November 2013. She received her M.Sc. in Informatics, Infographics and Image Processing from Sidi Mohamed Ben Abdellah University in 2013. Her research interests include image and video analysis and video segmentation.

Hamid Tairi received his Ph.D. degree in 2001 from the University Sidi Mohamed Ben Abdellah (USMBA), Morocco. In 2002 he did a postdoc in the Image Processing Group of the LE2I Laboratory in France. Since 2003, he has been an associate professor at the USMBA. His research interests are visual tracking for robotic control, 3-D reconstruction, and pattern recognition.

Reviews processed and recommended for publication to the Editor-in-Chief by Area Editor Dr. E. Cabal-Yepez.
