Object co-segmentation via salient and common regions discovery

doi:10.1016/j.neucom.2014.12.110

Neurocomputing

Volume 172, 8 January 2016, Pages 225-234

https://doi.org/10.1016/j.neucom.2014.12.110

Abstract

The goal of this paper is to simultaneously segment the object regions in a set of images of the same object class, a task known as object co-segmentation. Unlike typical methods, which simply assume that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds and take the object regions to be those that are not only common but also salient among the images. To this end, we propose an adaptive discriminative low rank matrix recovery (ADLRR) algorithm to divide the over-segmented regions (i.e., super-pixels) of a given image set into object and non-object ones. The proposed ADLRR is formulated from two views: a low-rank matrix recovery term for salient region detection and a discriminative learning term that distinguishes object regions from all super-pixels. An additional regularization term is incorporated to jointly measure the disagreement between the predicted saliency and the objectiveness probability. To solve the unified learning problem that connects these three terms, we design an efficient alternating optimization procedure based on block-coordinate descent and the augmented Lagrange multiplier method. Extensive experiments are conducted on three public datasets, i.e., MSRC, iCoseg and Caltech101, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.

Introduction

Object segmentation is a fundamental task in computer vision and multimedia, and it benefits many applications, e.g., object retrieval, object recognition and image editing [1], [2], [3], [4], [5], [6]. Some fully supervised or interactive object extraction approaches [7], [8], [9], [1] have been proposed to address the object segmentation problem. Typically, interactive approaches require the user to manually indicate the location of objects in the image, and appearance models are then derived from the user input. The segmentation process is often cast as the minimization of a binary, pairwise energy function, which can be efficiently optimized by the standard minimum cut algorithm [8]. However, interactive approaches do not scale to large datasets because of the high cost of human intervention. Fully supervised object detection approaches [10], [11], [12], [13] have therefore been proposed to avoid human intervention at test time. Generally, object detectors follow the sliding-window paradigm: a classifier is first trained on pixel-level labels to distinguish windows containing instances of a given class from all other windows, and the classifier is then used to localize instances of that class. However, such detectors are specialized for one object class and cannot be applied to general classes. Moreover, most supervised methods need large numbers of training images to learn those detectors. Meanwhile, the presence of a common object class in multiple images can make up for the absence of detailed supervisory information. It thus becomes possible to segment multiple images jointly to extract the common objects, which is known as co-segmentation and has been actively studied in recent years.

To alleviate the problems above, recent research [14], [15], [16], [17], [18], [19] focuses on weakly supervised methods for object co-segmentation, which aim to simultaneously segment the same or similar objects appearing in a set of images without pixel-level supervision. Most previous approaches assume that the regions common among images are the object regions [14], [15], [16], [17], [18], [19]. However, this assumption does not always hold, since background regions may also be consistent across images. For example, as shown in Fig. 1, the ‘grass’ regions may be common within the ‘baseball’ images, but they are not the target objects. How to effectively resist this kind of disturbance, and thereby precisely identify the true object regions, is the focus of this paper.

To this end, we jointly exploit saliency detection and common region mining over a set of images to perform object co-segmentation. We use the term object co-segmentation to emphasize that we are interested in segmenting “objects” rather than “stuff” [20]. Namely, the object regions are assumed to be not only common among images but also salient in contrast with background regions. This assumption naturally eliminates the disturbance from background regions that are consistent with each other. As shown in Fig. 1, we can easily identify the ‘baseball player’ regions as the true object regions because they are salient and appear in all of these images, while the common but non-salient regions (e.g., baseball field) and the salient but uncommon regions (e.g., baseball referee and billboard) are deemed background.

To discover the salient and common regions, we propose a co-segmentation framework based on adaptive low rank matrix recovery and discriminative learning, named adaptive discriminative low rank matrix recovery (ADLRR), as shown in Fig. 2. Given a set of images of an object class, we first over-segment each image into super-pixels, and then employ the proposed ADLRR to identify the salient and common regions in the class specific image set. Inspired by the work in [21], we adopt the basic idea of low-rank matrix recovery to detect the salient regions of an image, i.e., decomposing the super-pixel-wise representation of each image into a low-rank matrix and a sparse matrix, and using the l₁-norm of each column in the sparse matrix to measure the saliency of the corresponding super-pixel. In addition, a class specific feature transform matrix is introduced to enhance the salient regions and ensure that the matrix representing the background has a low rank. Furthermore, discriminative learning is incorporated on the image set to learn the best hyperplane separating the salient and common super-pixels from the non-salient regions and the salient but uncommon regions. Extensive experiments on three publicly available benchmarks, i.e., MSRC, iCoseg and Caltech101, demonstrate the satisfactory performance of our proposed method. Our main contributions are summarized as follows.

• To the best of our knowledge, we are the first to consider the disturbance of consistent background regions for object co-segmentation.
• To overcome the disturbance of consistent background, we propose to perform saliency detection and discriminative learning in a unified framework to identify the common and salient object regions.
• To enhance the salient regions and make sure that the background lies in a low dimensional space, a class specific feature transform matrix is introduced.

The rest of the paper is organized as follows. Related work on image co-segmentation is reviewed in Section 2. We then elaborate our proposed model for object co-segmentation in Section 3, and present its optimization algorithm in Section 4. The experimental evaluation is given in Section 5, followed by the conclusion in Section 6.

Section snippets

Related work

The co-segmentation problem has been actively studied in the past few years. Rother et al. [14] first addressed the co-segmentation of image pairs by histogram matching and incorporating a global constraint into MRFs. Mukherjee et al. [15] modified the energy function in [14] with the l₂-norm instead of the l₁-norm, since such a model has some interesting properties and allows the use of alternative optimization methods. Different from [14], [15], Batra et al. [22] used a single foreground and

Proposed model

We start with the saliency detection via low rank matrix recovery to eliminate the disturbance of those backgrounds consistent with each other. Then we introduce the discriminative learning term and the proposed ADLRR by integrating the two processes together into a unified framework.
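
As a rough illustration of the first ingredient, the sketch below uses generic robust PCA (principal component pursuit solved with an inexact augmented Lagrange multiplier scheme) to split a feature matrix into a low-rank part and a sparse part, and then scores each column (one column per super-pixel) by the l₁-norm of its sparse component. It is not ADLRR itself: it omits the feature transform matrix and the discriminative term, and the function names, the default weight `lam`, the step-size schedule and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def shrink(X, tau):
    """Soft-thresholding, the proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_shrink(X, tau):
    """Singular-value thresholding, the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(F, lam=None, tol=1e-7, max_iter=500):
    """Split F (features x super-pixels) into low-rank L and sparse S, F ~ L + S."""
    d, n = F.shape
    lam = 1.0 / np.sqrt(max(d, n)) if lam is None else lam  # common default weight
    spec = np.linalg.norm(F, 2)                  # largest singular value of F
    Y = F / max(spec, np.abs(F).max() / lam)     # Lagrange multiplier initialisation
    mu, rho = 1.25 / spec, 1.5                   # penalty weight and its growth rate
    mu_max = mu * 1e7
    S = np.zeros_like(F)
    for _ in range(max_iter):
        L = svd_shrink(F - S + Y / mu, 1.0 / mu)   # update low-rank block
        S = shrink(F - L + Y / mu, lam / mu)       # update sparse block
        R = F - L - S                              # constraint residual
        Y = Y + mu * R                             # multiplier update
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R) <= tol * np.linalg.norm(F):
            break
    return L, S


def column_saliency(S):
    """Saliency of each super-pixel: l1-norm of its column in the sparse part."""
    return np.abs(S).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 20, 30                                  # hypothetical sizes
    # Synthetic "background": a rank-2 matrix shared by all super-pixels.
    F = rng.standard_normal((d, 2)) @ rng.standard_normal((2, n))
    # Synthetic "salient" super-pixels: a few columns with sparse large deviations.
    for j in (4, 11, 23):
        F[rng.choice(d, size=5, replace=False), j] += 5.0
    L, S = robust_pca(F)
    print("most salient columns:", np.argsort(column_saliency(S))[-3:])
```

In this toy setting the columns that deviate from the shared low-dimensional background end up with the largest column norms in `S`, which is the intuition behind using the sparse part as a saliency measure.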

The notation used in this paper is as follows. Given a set of images τ with the same class label, each image i is over-segmented by mean-shift clustering based on extracted features

Model optimization

Since the objective function is a difference of convex functions, we propose an alternating optimization procedure, summarized in Algorithm 1. The feature transform matrix is first learnt by gradient descent, as detailed in Section 3.1. Given the feature transform matrix $T$, the optimization of the objective function in Eq. (8) can be divided into three subprocedures. The first is to perform low rank matrix recovery given the guidance information y.
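
The general pattern of alternating (block-coordinate) optimization can be sketched on a much simpler problem than Eq. (8). The toy example below minimizes ||M − u vᵀ||² over two blocks, u and v, by solving exactly for one block while the other is held fixed. It is not Algorithm 1 and uses none of the paper's terms; the function name `alternating_rank1`, its parameters and the stopping rule are assumptions chosen only to show why the objective cannot increase from one sweep to the next.

```python
import numpy as np


def alternating_rank1(M, n_iter=50, tol=1e-10, seed=0):
    """Block-coordinate descent for min over (u, v) of ||M - u v^T||_F^2."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])      # random start for the second block
    history = []                             # objective value after each sweep
    for _ in range(n_iter):
        u = M @ v / (v @ v)                  # exact minimiser over u with v fixed
        v = M.T @ u / (u @ u)                # exact minimiser over v with u fixed
        history.append(np.linalg.norm(M - np.outer(u, v)) ** 2)
        # Stop once a full sweep no longer decreases the objective noticeably.
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return u, v, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = np.outer(rng.standard_normal(6), rng.standard_normal(8))
    M += 0.1 * rng.standard_normal(M.shape)  # small perturbation of a rank-1 matrix
    u, v, history = alternating_rank1(M)
    print("objective per sweep:", history)
```

Because each block update is an exact minimization with the other block fixed, the recorded objective values are non-increasing, which is the property that makes this kind of scheme attractive when the joint problem is hard to solve directly.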

Experiments and results

To validate the effectiveness of the proposed method, we conduct experiments on three publicly available benchmarks, i.e., MSRC-v2, iCoseg [22] and Caltech101 [29]. We compare the proposed ADLRR with its special cases and with a number of related state-of-the-art approaches, which are enumerated as follows.

• LRR: It is to use the saliency detection result as the final object

Conclusions

In this paper, a novel adaptive discriminative low rank matrix recovery (ADLRR) algorithm is proposed to perform object co-segmentation. Our method works on the assumption that object regions should be not only common but also salient among images. This is the first time this assumption has been used in the task of object co-segmentation. We introduce the low rank matrix recovery term to measure the saliency of super-pixels so as to eliminate the disturbance from consistent backgrounds. While a discriminative

Acknowledgements

This work was supported by the 863 Program (2014AA015104), the National Natural Science Foundation of China (61272329, 61472422 and 61402228) and the Open Projects Program of the National Laboratory of Pattern Recognition.

Yong Li received the B.E. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2011. He is currently pursuing the Ph.D. degree at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His current research interests include image content analysis, multimedia, and subspace learning. He received the Microsoft young fellowship in 2010.

References (30)

J. Liu et al., Image annotation via graph learning, Pattern Recognit. (2009)
C. Li et al., Ordinal regularized manifold feature extraction for image ranking, Signal Process. (2013)
C. Rother et al., Grabcut–interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. (2004)
Y. Liu, J. Liu, Z. Li, J. Tang, H. Lu, Weakly-supervised dual clustering for image semantic segmentation, in: CVPR,...
J. Liu, B. Wang, M. Li, Z. Li, W. Ma, H. Lu, S. Ma, Dual cross-media relevance model for image annotation, in: ACM...
Z.-J. Zha et al., Visual query suggestion: towards capturing user intent in internet image search, ACM Trans. Multimed. Comput. Commun. Appl. (2010)
Z.-J. Zha et al., Interactive video indexing with statistical active learning, IEEE Trans. Multimed. (2012)
B. Alexe, T. Deselaers, V. Ferrari, What is an object?, in: CVPR, 2010, pp....
Y.Y. Boykov, M.P. Jolly, Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images,...
D. Kuettel, V. Ferrari, Figure-ground segmentation by transferring window masks, in: CVPR, 2012, pp....
C. Desai, D. Ramanan, C. Fowlkes, Discriminative models for multi-class object layout, in: ICCV, 2009, pp....
H. Harzallah, F. Jurie, C. Schmid, Combining efficient object localization and image classification, in: ICCV, 2009,...
P. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
C.-F. Juang, W.-K. Sun, G.-C. Chen, Object detection by color histogram-based fuzzy classifier with support vector...
C. Rother, T. Minka, A. Blake, V. Kolmogorov, Cosegmentation of image pairs by histogram matching – incorporating a...


Jing Liu received the B.E. and M.E. degrees from Shandong University, Shandong, in 2001 and 2004, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2008. She is an Associate Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. Her current research interests include machine learning, image content analysis and classification, and multimedia. She has published over 30 papers in those areas.

Zechao Li received the B.E. degree from the University of Science and Technology of China (USTC), Anhui, China, in 2008, and the Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences in 2013. He is an Assistant Professor with the School of Computer Science, Nanjing University of Science and Technology. His current research interests include multimedia analysis and understanding, subspace learning, correlation mining, etc. He received the 2013 President Scholarship of the Chinese Academy of Sciences.

Hanqing Lu received his B.S. and M.S. from the Department of Computer Science and the Department of Electric Engineering at Harbin Institute of Technology in 1982 and 1985. He received his Ph.D. from the Department of Electronic and Information Science at Huazhong University of Science and Technology. He is a Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His current research interests include image similarity measures, video analysis, and multimedia technology and systems. He has published over 300 papers in those areas.

Songde Ma received his B.S. in Automatic Control from Tsinghua University in 1968, the Ph.D. degree from the University of Paris in 1983, and the “Doctorat d’État ès Sciences” in France in 1986 in image processing and computer vision. He is a Professor with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision, image understanding and searching, robotics and computer graphics, etc.
