Neurocomputing

Volume 86, 1 June 2012, Pages 24-32

Temporal Spectral Residual for fast salient motion detection

https://doi.org/10.1016/j.neucom.2011.12.033

Abstract

Motion saliency detection aims at finding the semantic regions in a video sequence and is an important pre-processing step in many vision applications. In this paper, we propose a new algorithm, Temporal Spectral Residual, for fast motion saliency detection. Unlike conventional motion saliency detection algorithms that rely on complex mathematical models, our goal is to find a good tradeoff between computational efficiency and accuracy. The basic observation for salient motions is that, on the cross-sections of a video sequence along the temporal axis, the regions of moving objects carry distinct signals while the background area contains redundant information. Our focus in this paper is therefore to extract the salient information on these cross-sections by utilizing the off-the-shelf Spectral Residual method, a 2D image saliency detection method. A majority voting strategy is also introduced to generate reliable results. Since the proposed method involves only Fourier spectrum analysis, it is computationally efficient. We validate our algorithm on two applications: background subtraction in outdoor video sequences with dynamic backgrounds, and left ventricle endocardium segmentation in MR sequences. Compared with several state-of-the-art algorithms, our algorithm achieves both good accuracy and fast computation, which satisfies the needs of a pre-processing method.

Introduction

Saliency detection has attracted much attention in recent years. Different from the conventional segmentation problem of separating the whole scene into discrete parts, saliency detection aims at finding semantic regions and filtering out unimportant areas. The idea of saliency detection comes from the human visual system, where the first stage of human vision is a fast but simple pre-attentive process. Saliency detection is an important topic in computer vision, since it provides a fast pre-processing stage for many vision applications.

Saliency detection on both images and videos has been studied in recent years. For image-based saliency detection, we want to find salient regions that differ from the background, e.g. a deer in a forest. Many algorithms have been proposed. Itti and Koch designed a model simulating the human visual search process to detect saliency in static images [1], [2], [3]. It has also been extended to visual recognition tasks [4], [5]. Recently, Hou and Zhang [6] proposed a fast Fourier spectrum residual analysis for image saliency detection.

Video-based saliency detection differs from the image saliency detection problem: it aims at finding salient motions against the background in a video sequence. The salient motion can be a running person on a beach or a beating heart in an MR sequence. Locating salient moving objects accurately and efficiently is a critical pre-processing step for many video understanding applications [7], [8]. It can also be used in video quality assessment [9]. But it is still a challenging problem, since videos or 3D volume data can have various background motions. To solve this problem, many algorithms have been proposed: Gaussian Mixture Models [10], nonparametric kernel density estimation [11], adaptive KDE combined with motion information [12], a Bayesian learning approach [13], linear dynamic models [14], a robust Kalman filter [15], etc.

Though these models have achieved good results, they all rely on sophisticated models or algorithms and are not fast enough to serve as a pre-processing method. In this paper, we propose a fast motion saliency detection method, Temporal Spectral Residual. In contrast to complex background modeling, the proposed method is computationally efficient; it needs no initial labeling and is free of training, which satisfies the needs of a pre-processing method. The main idea of our method comes from the observation that, on the cross-sections of a video sequence along the temporal axis, the moving trajectories of salient objects form distinct regions while the static background forms redundant areas. Thus our focus is to extract the salient information on these cross-sections by utilizing the off-the-shelf Spectral Residual algorithm [6], a method for finding salient information in a 2D image. A majority voting strategy is also introduced to produce robust results. Since our algorithm involves only Fourier spectrum analysis, it is computationally efficient. We validate our assumptions on two different applications: (1) background subtraction under dynamic backgrounds in video sequences; (2) cardiac motion localization in MR sequences. The experiments show that our method works well for diverse applications, handles various dynamic background motions, and is computationally efficient.
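To make the building block concrete, the following is an illustrative NumPy sketch of the 2D Spectral Residual detector of Hou and Zhang [6], which the method above applies to temporal slices. The 3×3 filter size and the smoothing sigma are common choices for this detector, not values taken from this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def spectral_residual(img):
    """2D Spectral Residual saliency map (after Hou & Zhang [6]).

    The spectral residual is the log amplitude spectrum minus its
    local average; recombining it with the original phase and
    inverting the FFT highlights the non-redundant image content.
    """
    f = np.fft.fft2(img.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)           # log amplitude spectrum
    phase = np.angle(f)                           # phase spectrum, kept as-is
    residual = log_amp - uniform_filter(log_amp, size=3)  # 3x3 local average
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(saliency, sigma=2.5)   # smooth before thresholding
```

Applied to a temporal slice, the peaks of this map correspond to the trajectories of moving objects, while the repetitive background is suppressed.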

It is worth mentioning that our algorithm does not aim at finding the motion perfectly; our goal is to find a good tradeoff between accuracy and efficiency. The experiments show that our algorithm finds salient objects with good quality and also runs efficiently, making it a good pre-processing method for other applications. Section 2 gives the details of our algorithm. Section 3 reviews the related literature and presents the details of the two applications; the experiments validate the effectiveness and efficiency of our algorithm.

Section snippets

Methodology

We propose a new and efficient method to find salient motion regions in video sequences. The main idea is to roughly remove the redundant part of a volume of data (the static part of temporal slices) and keep the salient motion regions. The algorithm provides reliable motion regions without requiring initial labeling or any training data. Our method uses the Spectral Residual (SR) algorithm [6] as its building block. SR is a saliency detection algorithm for 2D images based on statistics of the Fourier log amplitude spectrum…
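The slicing-and-voting scheme described here can be outlined as follows. This is an illustrative sketch, not the authors' implementation: `sr2d` stands for any 2D saliency detector (such as Spectral Residual), and the binarization threshold and vote count are assumed values.

```python
import numpy as np

def temporal_spectral_residual(video, sr2d, thresh=0.1):
    """Sketch of TSR on a video volume of shape (T, H, W).

    Runs a 2D saliency detector `sr2d` on every temporal slice
    (X-T slices, one per image row, and Y-T slices, one per image
    column) and keeps pixels voted salient in both directions.
    """
    T, H, W = video.shape
    votes = np.zeros((T, H, W))
    for y in range(H):                      # X-T slices
        sal = sr2d(video[:, y, :])
        votes[:, y, :] += sal > thresh * sal.max()
    for x in range(W):                      # Y-T slices
        sal = sr2d(video[:, :, x])
        votes[:, :, x] += sal > thresh * sal.max()
    return votes >= 2                       # majority over both directions
```

Since each slice is processed independently, the loop parallelizes trivially, which is consistent with the method's use as a fast pre-processing step.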

Experiments

In this section, we apply our method to two applications: (1) background subtraction in video sequences with various types of backgrounds; (2) left ventricle segmentation from 2D MR sequences. We validate the robustness and computational efficiency of our algorithm on these two applications.

Conclusions

In this paper, we presented a fast motion saliency detection method, Temporal Spectral Residual (TSR). Based on the observation that moving objects contain salient information along the temporal domain, we proposed to apply the Spectral Residual algorithm on temporal slices to detect motion saliency. In order to produce reliable results, we also introduced a majority voting strategy to further refine the results. Since our algorithm depends only on the Fourier transform instead of complex models, it is computationally efficient.

References (32)

  • X. Gao, N. Liu, W. Lu, D. Tao, X. Li, Spatio-temporal salience based video quality assessment, in: IEEE International...
  • C. Stauffer, W. Grimson, Adaptive background mixture models for real-time tracking, in: IEEE Conference on Computer...
  • A. Elgammal et al., Background and foreground modeling using nonparametric kernel density for visual surveillance, Proc. IEEE (2002)
  • A. Mittal, N. Paragios, Motion-based background subtraction using adaptive kernel density estimation, in: IEEE...
  • O. Tuzel, F. Porikli, P. Meer, A Bayesian approach to background modeling, in: IEEE Computer Society Conference on...
  • A. Monnet, A. Mittal, N. Paragios, V. Ramesh, Background modeling and subtraction of dynamic scenes, in: IEEE...
Cited by (26)

    • Background–foreground interaction for moving object detection in dynamic scenes

      2019, Information Sciences
      Citation excerpt:

      The oldest sample is discarded and a new sample is added in each step, preventing the loss of diversity among samples. The experiments are composed of two parts: (i) the performance of our approach is evaluated using various sample sizes; (ii) our approach is compared with the existing methods, i.e., ST-MoG (Spatio-Temporal MoG) [49], TSR (Temporal Spectral Residual) [50], BF-KDE (Background-Foreground KDE) [18], Vibe [16] and GFL(Generalized Fused Lasso) [32]. These methods use different strategies, all of which have an ability to handle dynamic backgrounds.

    • Motion saliency detection using a temporal Fourier transform

      2016, Optics and Laser Technology
      Citation excerpt:

      However, the object region is hardly recognizable in Fig. 2(d). By building the time slice, the Spectral Residual, which is initially used in the 2D images, is introduced into the video sequence, making the TSR model [12]. By adaptive threshold, TSR can largely remove the dynamic background, as shown in Fig. 2(e).


    Xinyi Cui is a Ph.D. candidate in the Computer Science Department at Rutgers University. Her advisor is Dr. Dimitris N. Metaxas. Her major research interests are computer vision and machine learning. More specifically, she focuses on motion analysis for video sequences, human action/activity recognition, human behavior analysis, saliency detection, background modeling, and object detection and recognition. She received her M.S. degree from the Computer Science and Engineering Department at Harbin Institute of Technology. She also received her B.E. from the same department.

    Qingshan Liu is a professor in the School of Information and Control Engineering, Nanjing University of Information Science and Technology, China. He received his Ph.D. from the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, in 2003 and his M.S. from the Department of Automatic Control at Southeast University in 2000. Dr. Liu was an assistant research professor in the Department of Computer Science, Computational Biomedicine Imaging & Modeling Center (CBIM), Rutgers, the State University of New Jersey, from 2010 to 2011. Before joining Rutgers University, he worked as an associate professor at the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, and as an associate researcher at the Multimedia Laboratory of the Chinese University of Hong Kong between June 2004 and April 2005. He received the President Scholarship of the Chinese Academy of Sciences in 2003. His research interests are in image and vision analysis, including face image analysis, graph- and hypergraph-based image and video understanding, medical image analysis, and event-based video analysis.

    Shaoting Zhang received his B.E. degree in Software Engineering from Zhejiang University, China, in 2005, his M.S. degree in Computer Software and Theory from Shanghai Jiao Tong University, China, in 2007, and his Ph.D. degree in Computer Science from Rutgers, the State University of New Jersey, in 2011. His advisor is Dr. Dimitris N. Metaxas. His major research interests focus on deformable models, sparse learning methods, and their applications in medical image analysis, computer vision, and computer graphics.

    Fei Yang is a Ph.D. candidate in the Computer Science Department at Rutgers University. He received the B.E. degree from Tsinghua University in 2003, and the M.E. degree from the Chinese Academy of Sciences in 2006. He was a software engineer in Microsoft China from 2006 to 2007. His current research focuses on facial feature localization, face tracking and face animation.

    Dimitris N. Metaxas is a professor in the Computer Science Department at Rutgers University. He is directing the Computational Biomedicine Imaging and Modeling Center (CBIM). He received the B.E. degree from the National Technical University of Athens, Greece, in 1986, M.S. degree from the University of Maryland in 1988, and Ph.D. from the University of Toronto in 1992. He has been conducting research toward the development of formal methods upon which computer vision, computer graphics, and medical imaging can advance synergistically.
