Elsevier

Neurocomputing

Volume 172, 8 January 2016, Pages 225-234

Object co-segmentation via salient and common regions discovery

https://doi.org/10.1016/j.neucom.2014.12.110

Abstract

The goal of this paper is to simultaneously segment the object regions in a set of images of the same object class, a task known as object co-segmentation. Unlike typical methods, which simply assume that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds and require object regions to be not only common but also salient among images. To this end, we propose an adaptive discriminative low rank matrix recovery (ADLRR) algorithm to divide the over-segmented regions (i.e., super-pixels) of a given image set into object and non-object ones. The proposed ADLRR is formulated from two views: a low-rank matrix recovery term for salient region detection and a discriminative learning term for distinguishing object regions from all super-pixels. An additional regularization term is incorporated to jointly measure the disagreement between the predicted saliency and the objectness probability. For the unified learning problem that connects the above three terms, we design an efficient alternating optimization procedure based on block-coordinate descent and the augmented Lagrange multiplier method. Extensive experiments are conducted on three public datasets, i.e., MSRC, iCoseg and Caltech101, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.

Introduction

Object segmentation is a fundamental task in computer vision and multimedia, and it benefits many applications, e.g., object retrieval, object recognition and image editing [1], [2], [3], [4], [5], [6]. Several fully supervised or interactive object extraction approaches [7], [8], [9], [1] have been proposed to address the object segmentation problem. Typically, interactive approaches require the user to manually indicate the location of objects in the image, and appearance models are then derived from the user input. The segmentation process is often cast as the minimization of a binary pairwise energy function, which can be efficiently optimized by the standard minimum cut algorithm [8]. However, interactive approaches cannot be applied to large-scale datasets because of the high cost of human intervention. Fully supervised object detection approaches [10], [11], [12], [13] were therefore proposed to avoid human intervention at test time. Generally, these object detectors follow the sliding-window paradigm: a classifier is first trained on pixel-level labels to distinguish windows containing instances of a given class from all other windows, and the classifier is then used to localize instances of that class. However, such detectors are specialized for one object class and cannot be applied to general classes. Moreover, most supervised methods need large numbers of training images to learn those detectors. Meanwhile, the presence of common object classes in multiple images can make up for the absence of detailed supervisory information. It thus becomes possible to segment multiple images jointly to obtain the common objects, which is known as co-segmentation and has been actively studied in recent years.

To alleviate the problems above, recent studies [14], [15], [16], [17], [18], [19] focus on weakly supervised methods for object co-segmentation, which aim to simultaneously segment the same or similar objects appearing in a set of images without pixel-level supervision. Most previous approaches work on the assumption that the regions common among images are the object regions [14], [15], [16], [17], [18], [19]. However, this assumption does not always hold, since the background regions may also be consistent. For example, as shown in Fig. 1, the ‘grass’ regions may be common within the ‘baseball’ images, but they are not the target objects. How to effectively resist such disturbance and precisely identify the true object regions is the focus of this paper.

To this end, we jointly exploit saliency detection and common region mining over a set of images to perform object co-segmentation. We use the term object co-segmentation to emphasize the fact that we are interested in segmenting “objects” rather than “stuff” [20]. Namely, the object regions are assumed to be not only common among images but also salient in contrast with background regions. This assumption naturally eliminates the disturbance of background regions that are consistent with each other. As shown in Fig. 1, we can easily identify the ‘baseball player’ regions as the true object regions because they are salient and appear in all of these images, while the common but non-salient regions (e.g., baseball field) and the salient but uncommon regions (e.g., baseball referee and billboard) are deemed background.

To discover the salient and common regions, we propose a co-segmentation framework based on adaptive low rank matrix recovery and discriminative learning, named adaptive discriminative low rank matrix recovery (ADLRR), as shown in Fig. 2. Given a set of images of an object class, we first over-segment each image into super-pixels, and then employ the proposed ADLRR to identify the salient and common regions in the class-specific image set. Inspired by the work in [21], we adopt the basic idea of low-rank matrix recovery to detect the salient regions of an image, i.e., decomposing the super-pixel-wise representation of each image into a low-rank matrix and a sparse matrix, and using the l1-norm of each column in the sparse matrix to measure the saliency of the corresponding super-pixel. In addition, a class-specific feature transform matrix is introduced to enhance the salient regions and ensure that the matrix representing the background has a low rank. Furthermore, discriminative learning is incorporated on the image set to learn the best hyperplane separating the salient and common super-pixels from the non-salient regions and the salient but uncommon regions. Extensive experiments on three publicly available benchmarks, i.e., MSRC, iCoseg and Caltech101, show the satisfactory performance of our proposed method. Our main contributions are summarized as follows.

  • To the best of our knowledge, we are the first to consider the disturbance of consistent background regions for object co-segmentation.

  • To overcome the disturbance of consistent background, we propose to perform saliency detection and discriminative learning in a unified framework to identify the common and salient object regions.

  • To enhance the salient regions and make sure that the background lies in a low dimensional space, a class specific feature transform matrix is introduced.

The rest of the paper is organized as follows. Related work on image co-segmentation is reviewed in Section 2. We then elaborate our proposed model for object co-segmentation in Section 3, and its optimization algorithm is presented in Section 4. The experimental evaluation is given in Section 5, followed by the conclusion in Section 6.

Section snippets

Related work

The co-segmentation problem has been actively studied in the past few years. Rother et al. [14] first addressed the co-segmentation of image pairs by histogram matching and incorporating a global constraint into MRFs. Mukherjee et al. [15] modified the energy function in [14] with the l2-norm instead of the l1-norm, since such a model has some interesting properties and allows the use of alternative optimization methods. Different from [14], [15], Batra et al. [22] used a single foreground and

Proposed model

We start with saliency detection via low-rank matrix recovery, which eliminates the disturbance of backgrounds that are consistent with each other. We then introduce the discriminative learning term and obtain the proposed ADLRR by integrating the two processes into a unified framework.
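As a rough illustration of the low-rank recovery idea, the generic robust-PCA form that this family of saliency detectors builds on can be written as below. This is a sketch under stated assumptions, not the paper's own objective: the symbols F (a feature matrix with one column per super-pixel), L, S and the weight \lambda are introduced here for illustration only, and the class-specific feature transform and the discriminative learning term of ADLRR are omitted.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad F = L + S,
\qquad
\mathrm{saliency}(j) = \|S_{:,j}\|_{1}
```

Here the nuclear norm \|L\|_{*} favours a low-rank part L that absorbs the mutually consistent background, the entrywise l1-norm favours a sparse part S, and the l1-norm of the j-th column of S scores the saliency of the j-th super-pixel.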

The notation used in this paper is as follows. Given a set of images τ with the same class label, image over-segmentation is performed on each image i by mean-shift clustering based on extracted features

Model optimization

Considering that the objective function is a difference of convex functions, we propose an alternating optimization procedure and summarize it in Algorithm 1. The feature transform matrix is first learnt by a gradient descent approach, as detailed in Section 3.1. Given the feature transform matrix T, the optimization of the target function in Eq. (8) can be divided into three subprocedures. The first is to perform low rank matrix recovery given the guidance information y.
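The low-rank recovery step can be illustrated with a generic augmented-Lagrange-multiplier solver for the plain low-rank plus sparse decomposition. The sketch below is not Algorithm 1: it ignores the guidance information y, the transform T and the discriminative term. The function names are hypothetical, and the defaults (\lambda = 1/sqrt(max(d, n)), penalty growth factor 1.5) are common choices from the robust-PCA literature rather than values taken from the paper.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values by tau: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_ialm(F, lam=None, max_iter=500, tol=1e-7, rho=1.5):
    """Decompose F into low-rank L plus sparse S (inexact ALM sketch).

    Approximately solves  min ||L||_* + lam * ||S||_1  s.t.  F = L + S.
    """
    F = np.asarray(F, dtype=float)
    d, n = F.shape
    L = np.zeros_like(F)
    S = np.zeros_like(F)
    norm_F = np.linalg.norm(F)          # Frobenius norm
    if norm_F == 0.0:                   # nothing to decompose
        return L, S
    if lam is None:
        lam = 1.0 / np.sqrt(max(d, n))  # assumed default, see lead-in
    spectral = np.linalg.norm(F, 2)     # largest singular value
    Y = F / max(spectral, np.abs(F).max() / lam)  # dual variable
    mu = 1.25 / spectral                # penalty parameter
    for _ in range(max_iter):
        # Block 1: update the low-rank part with S fixed.
        L = singular_value_threshold(F - S + Y / mu, 1.0 / mu)
        # Block 2: update the sparse part with L fixed.
        S = soft_threshold(F - L + Y / mu, lam / mu)
        # Multiplier update on the constraint residual.
        R = F - L - S
        Y = Y + mu * R
        mu = min(mu * rho, 1e10)
        if np.linalg.norm(R) <= tol * norm_F:
            break
    return L, S


def column_saliency(S):
    """l1-norm of each column of S, one score per super-pixel."""
    return np.abs(S).sum(axis=0)
```

A caller would build F with one feature column per super-pixel, run `L, S = rpca_ialm(F)`, and rank super-pixels by `column_saliency(S)`. In the full method these scores would additionally be coupled with the learned classifier, which this sketch leaves out.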

Experiments and results

To validate the effectiveness of the proposed method, we conduct experiments on three publicly available benchmarks, i.e., MSRC-v2, iCoseg [22] and Caltech101 [29]. We compare the proposed ADLRR with its special cases and a number of related state-of-the-art approaches, which are enumerated as follows.

  • LRR: It is to use the saliency detection result as the final object

Conclusions

In this paper, a novel adaptive discriminative low rank matrix recovery (ADLRR) algorithm is proposed to perform object co-segmentation. Our method works on the assumption that object regions should be not only common but also salient among images. To our knowledge, this is the first time this assumption has been used in the task of object co-segmentation. We introduce the low rank matrix recovery term to measure the saliency of super-pixels so as to eliminate the disturbance from consistent backgrounds. While a discriminative

Acknowledgements

This work was supported by the 863 Program (2014AA015104), the National Natural Science Foundation of China (61272329, 61472422 and 61402228) and the Open Projects Program of the National Laboratory of Pattern Recognition.

References (30)

  • J. Liu et al., Image annotation via graph learning, Pattern Recognit. (2009)
  • C. Li et al., Ordinal regularized manifold feature extraction for image ranking, Signal Process. (2013)
  • C. Rother et al., GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. (2004)
  • Y. Liu, J. Liu, Z. Li, J. Tang, H. Lu, Weakly-supervised dual clustering for image semantic segmentation, in: CVPR,...
  • J. Liu, B. Wang, M. Li, Z. Li, W. Ma, H. Lu, S. Ma, Dual cross-media relevance model for image annotation, in: ACM...
  • Z.-J. Zha et al., Visual query suggestion: towards capturing user intent in internet image search, ACM Trans. Multimed. Comput. Commun. Appl. (2010)
  • Z.-J. Zha et al., Interactive video indexing with statistical active learning, IEEE Trans. Multimed. (2012)
  • B. Alexe, T. Deselaers, V. Ferrari, What is an object?, in: CVPR, 2010, pp....
  • Y.Y. Boykov, M.-P. Jolly, Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images,...
  • D. Kuettel, V. Ferrari, Figure-ground segmentation by transferring window masks, in: CVPR, 2012, pp....
  • C. Desai, D. Ramanan, C. Fowlkes, Discriminative models for multi-class object layout, in: ICCV, 2009, pp....
  • H. Harzallah, F. Jurie, C. Schmid, Combining efficient object localization and image classification, in: ICCV, 2009,...
  • P. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • C.-F. Juang, W.-K. Sun, G.-C. Chen, Object detection by color histogram-based fuzzy classifier with support vector...
  • C. Rother, T. Minka, A. Blake, V. Kolmogorov, Cosegmentation of image pairs by histogram matching – incorporating a...

Yong Li received the B.E. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2011. He is currently pursuing the Ph.D. degree at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His current research interests include image content analysis, multimedia, and subspace learning. He received the Microsoft young fellowship in 2010.

Jing Liu received the B.E. and M.E. degrees from Shandong University, Shandong, in 2001 and 2004, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2008. She is an Associate Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. Her current research interests include machine learning, image content analysis and classification, and multimedia. She has published over 30 papers in those areas.

Zechao Li received the B.E. degree from the University of Science and Technology of China (USTC), Anhui, China, in 2008, and the Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2013. He is an Assistant Professor with the School of Computer Science, Nanjing University of Science and Technology. His current research interests include multimedia analysis and understanding, subspace learning, and correlation mining. He received the 2013 President Scholarship of the Chinese Academy of Sciences.

Hanqing Lu received his B.S. and M.S. degrees from the Department of Computer Science and the Department of Electric Engineering at Harbin Institute of Technology in 1982 and 1985, respectively. He received his Ph.D. degree from the Department of Electronic and Information Science at Huazhong University of Science and Technology. He is a Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His current research interests include image similarity measures, video analysis, and multimedia technology and systems. He has published over 300 papers in those areas.

Songde Ma received his B.S. degree in Automatic Control from Tsinghua University in 1968, his Ph.D. degree from the University of Paris in 1983, and the “Doctorat d’État ès Sciences” in France in 1986, in image processing and computer vision. He is a Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision, image understanding and searching, robotics, and computer graphics.
