Noise-aware co-segmentation with local and global priors
Introduction
Image segmentation is a fundamental problem in computer vision, and many approaches [1], [2], [3] have been proposed to address it. Owing to the inherent complexity of both objects and their surrounding context, it remains a challenging problem. One line of work simultaneously segments the common objects from a collection of images, a task well known as co-segmentation.
Early image co-segmentation methods [3], [4], [5], [6], [7] mainly focus on segmenting an image pair given by the user. Many existing methods [8], [9], [10], [11], [12], [13] for multiple-image co-segmentation assume that the input images are clean, e.g., co-segmentation of photos taken over a certain period. In practice, however, co-segmentation is often performed on ordinary image sets, where the common objects vary in shape, pose, color, and viewpoint. Since such collections are gathered only by semantic similarity to the common object, noisy images that lack the region of interest are inevitably included. Fig. 1 shows an example image of an automobile interior. To deal with this problem, we propose to first identify the clean images before co-segmenting the common objects.
Recently, several methods have been proposed to handle co-segmentation with noisy images. Rubinstein et al. [14] established a shape prior based on the saliency of each image. Chen et al. [15] divided a large set into many small subcategories and built a shape template for each subcategory using a latent-SVM detector. These methods do not scale well to large input sets; for example, the method in [14] requires a cluster with 36 CPU cores, which greatly limits its application domain. Moreover, since neither saliency maps nor the bounding boxes detected by latent-SVM approximate the true boundary of the foreground object, these priors may lead to overfitting, and the salient part of an image does not always correspond to the desired object region. Alternatively, Wang et al. [16] directly estimated the common context of the input images by training an auto-context model; as the context of a noisy image set is usually more complex than the foreground, their method involves a time-consuming iterative scheme.
We obtain a clean image collection via a two-step scheme: (1) to ensure that the collection contains a common foreground object, we must first identify that object. To this end, we introduce a measure called "attentiveness" on the object and compute the attentiveness scores of the different categories in the collection to identify the target object; (2) affinity propagation clustering is then employed to further purify the collection by grouping foreground objects with similar shape or viewpoint. This step matters because the difficulty differs greatly between, for instance, co-segmenting images that mix front and side views of planes and co-segmenting images showing only side views. One merit of our approach is that this purification process is agnostic to the co-segmentation scheme, so many previous methods can easily adopt it to perform co-segmentation on ordinary image sets.
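As a toy illustration of step (1), the snippet below filters a collection by attentiveness. The per-image, per-category scores, the threshold of 0.5, and the aggregation by summation are all illustrative assumptions; in the method described here, the scores are derived from semantic proposals.

```python
def select_clean_images(scores, threshold=0.5):
    """Identify the target category and keep images attentive to it.

    scores: one dict per image mapping category -> attentiveness score
            (hypothetical values; the method derives them from proposals).
    """
    # Aggregate attentiveness per category over the whole collection.
    totals = {}
    for img in scores:
        for cat, s in img.items():
            totals[cat] = totals.get(cat, 0.0) + s
    # The category the collection attends to most is the target object.
    target = max(totals, key=totals.get)
    # Keep only images sufficiently attentive to the target category.
    keep = [i for i, img in enumerate(scores)
            if img.get(target, 0.0) >= threshold]
    return target, keep

scores = [{"plane": 0.9, "sky": 0.2},
          {"plane": 0.8},
          {"car": 0.7, "plane": 0.1}]
target, keep = select_clean_images(scores)
print(target, keep)  # plane [0, 1]
```

The third image scores highest for "car" but the collection as a whole attends to "plane", so that image is filtered out as noise.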
Given a clean image set, the prevailing co-segmentation approaches [3], [4], [9] depend on user interaction: they require user scribbles to indicate the foreground and background. Since manually labeling every image is inefficient, this limits the application of these methods to large-scale image sets. Unlike these methods, we propose to use shape priors in place of user scribbles for initialization. In contrast to conventional methods that model the shape prior with soft boundaries [17], [18], transferred maps of nearest neighbours [19], [20], or exemplar-based detectors [15], [21], we simultaneously build both local and global priors to model the common object over the whole input image set.
To this end, we take advantage of the semantic proposals produced by the Simultaneous Detection and Segmentation (SDS) framework [22]. A local shape prior is constructed from the semantic proposals of a single image, while a global one is built on the whole image set; both priors are defined on uniform spatial proposals. Since these priors are relatively coarse representations of the common object, we employ dense correspondence mapping to further refine them. Finally, a Markov Random Field (MRF) based energy function is constructed from the two priors: the global prior approximately estimates the viewpoint of the common object, and the local prior captures the foreground boundary, so combining the information of the two boosts the segmentation results. Co-segmentation is performed by minimizing the energy function with graph cuts [23], [24], [25]. The whole process is fully automatic, without human supervision, and is thus easy to use in practice.
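The way two such priors can enter an MRF data term is sketched below. The convex blend with weight `alpha` and the negative-log mapping are assumptions made for illustration; the exact fusion is defined by the paper's energy function, and the prior maps themselves would come from SDS proposals refined by dense correspondence.

```python
import numpy as np

def fused_unary(p_local, p_global, alpha=0.5, eps=1e-6):
    """Blend local and global foreground priors into MRF unary costs.

    p_local, p_global: per-pixel foreground probabilities in [0, 1]
    (assumed given here). Returns (cost_bg, cost_fg), i.e. the data
    term for labels 0 and 1 at every pixel.
    """
    p = alpha * p_local + (1.0 - alpha) * p_global  # assumed convex blend
    p = np.clip(p, eps, 1.0 - eps)                  # avoid log(0)
    return -np.log(1.0 - p), -np.log(p)

# A pixel both priors agree is foreground should be cheap to label 1.
cost_bg, cost_fg = fused_unary(np.array([0.9]), np.array([0.8]))
print(cost_fg[0] < cost_bg[0])  # True
```

With equal blending, a pixel where both priors give 0.5 costs log 2 for either label, so the smoothness term alone decides it.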
We aim to simultaneously select the clean images from a noisy image set and segment the object regions from them using local and global priors. In summary, the main contributions of this paper are: (1) a new approach to refine an image set by first identifying the target object using a measure of attention, so that noisy images are filtered out by target object category and the remaining images are clustered into similar shapes or viewpoints via affinity propagation; (2) a novel object-level prior that estimates the shape of the object region and is further optimized by dense correspondence mapping, outperforming conventional priors based on saliency or bounding boxes; (3) an energy-based formulation for co-segmenting the target object that exploits both local and global shape prior information.
Section snippets
Related work
Co-segmentation is an interesting problem in computer vision [5]. Early methods [3], [4] mainly focus on finding segmentations with similar foregrounds in image pairs, where user interaction is typically used. Recently, Wang and Shen [9] employed a higher-order energy function for co-segmentation of multiple images, which still relies on user interaction.
In recent years, several methods that do not need user interaction have been proposed. Firstly, researchers focused on the problem
Preprocessing with attentiveness
In this section, we describe in detail the preprocessing stage of our co-segmentation method. We start by introducing the definition of attentiveness. Then, we describe how to obtain clean images using image attentiveness. Finally, we further refine them by clustering their object proposals. The overall framework of the preprocessing pipeline is illustrated in Fig. 2.
Co-segmentation with priors
Once a collection of clean images of a cluster is obtained, we cast the co-segmentation of these images as an energy minimization problem. In this section, we describe in detail our proposed image co-segmentation method using a local and a global prior. Suppose the cleaned collection is represented as I = {I_1, ..., I_N}, and L is the labeling of an image I ∈ I, where l_x is the label of pixel x. In this paper, we restrict l_x to {0, 1}, i.e., pixels are labeled as foreground or background.
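For a binary label set l_x ∈ {0, 1}, this kind of MRF energy can be minimized exactly with the standard s-t min-cut construction used by graph-cut methods. The sketch below uses Edmonds-Karp max-flow on a three-pixel "image" whose unary costs and Potts smoothness weights are made-up numbers, not values from the paper.

```python
from collections import deque

def graph_cut_labels(unary, edges):
    """Exact binary MRF minimization via s-t min cut (Edmonds-Karp).

    unary: per-pixel (cost of label 0, cost of label 1)
    edges: (p, q, w) Potts terms paying w when labels of p, q differ
    Returns (labels, energy of the optimal labeling).
    """
    n = len(unary)
    S, T = n, n + 1
    cap = [dict() for _ in range(n + 2)]
    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # residual edge
    for p, (c0, c1) in enumerate(unary):
        add(S, p, c0)  # this edge is cut when p takes label 0
        add(p, T, c1)  # this edge is cut when p takes label 1
    for p, q, w in edges:
        add(p, q, w)
        add(q, p, w)
    flow = 0.0
    while True:  # augment along BFS (shortest) paths
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck
    # Pixels still reachable from the source take label 1 (foreground).
    reach, queue = {S}, deque([S])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in reach:
                reach.add(v)
                queue.append(v)
    return [1 if p in reach else 0 for p in range(n)], flow

# Three pixels in a chain: the ends are strongly foreground/background,
# the middle one is settled by its unary cost plus smoothness.
labels, energy = graph_cut_labels([(5, 0), (3, 1), (0, 5)],
                                  [(0, 1, 1), (1, 2, 1)])
print(labels, energy)  # [1, 1, 0] 2.0
```

Because the cut capacity equals the labeling energy by construction, the returned flow value is the minimum of the energy; practical implementations use faster max-flow solvers, but the construction is the same.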
Experiment
In this section, we first describe the implementation of our proposed approach and the testbeds and evaluation protocols used in our experiments. Then, we compare our method with state-of-the-art approaches. Finally, we conduct several experiments to evaluate the efficacy of our proposed method.
Conclusion and future work
By simultaneously computing the attentiveness scores and exploiting subcategory clustering on the semantic proposals, we propose an approach to purify a noisy image collection. Co-segmentation is then conducted on the clean images by combining both local and global shape priors. We refine these two priors using dense correspondence mapping and perform co-segmentation by solving an energy minimization problem. Finally, extensive experiments on the Internet image, Graz02, and MSRC datasets show the effectiveness of the proposed approach.
Acknowledgment
The authors would like to thank the reviewers for their constructive and informative comments. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501). Mingli Song is supported by the Fundamental Research Funds for the Central Universities (2017FZA5014).
References (47)
- et al., Stitching contaminated images, Neurocomputing, 2016.
- et al., Object cosegmentation by nonrigid mapping, Neurocomputing, 2014.
- et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 2000.
- et al., Efficient graph-based image segmentation, Int. J. Comput. Vis., 2004.
- et al., Cosegmentation of image pairs by histogram matching – incorporating a global constraint into MRFs, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
- et al., Interactively co-segmentating topically related images with intelligent scribble guidance, Int. J. Comput. Vis., 2011.
- et al., Cosegmentation revisited: models and optimization, Proceedings of the European Conference on Computer Vision, 2010.
- et al., An efficient algorithm for co-segmentation, Proceedings of the International Conference on Computer Vision, 2009.
- et al., Half-integrality based algorithms for cosegmentation of images, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2009.
- et al., Object co-segmentation based on shortest path algorithm and saliency model, IEEE Trans. Multimedia, 2012.
- Higher-order image co-segmentation, IEEE Trans. Multimedia.
- Unsupervised co-segmentation through region matching, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Co-localization in real-world images, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Object cosegmentation, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Scale invariant cosegmentation for image groups, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Unsupervised joint object discovery and segmentation in internet images, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Enriching visual knowledge bases via object discovery and segmentation, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Joint segmentation and recognition of categorized objects from noisy web image collection, IEEE Trans. Image Process.
- Boundary preserving dense local regions, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Key-segments for video object segmentation, Proceedings of the International Conference on Computer Vision.
- Segmentation propagation in ImageNet, Proceedings of the European Conference on Computer Vision.
- Figure-ground segmentation by transferring window masks, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
- Exemplar cut, Proceedings of the International Conference on Computer Vision.
Qingqun Ning is currently a Ph.D. candidate in the Department of Computer Science and Technology, Zhejiang University of China. Before that, he received the B.S. degree from Northwestern Polytechnical University of China in 2011. His research interests include machine learning and computer vision, with a focus on large scale image search and object detection.
Zhao Liu obtained his Ph.D. degree in Computer Science from Zhejiang University in 2017. He received his B.S. degree in Computer Science and Technology from Zhejiang University in 2009. His research interests are mainly in computer vision, including human pose estimation, image segmentation, and example-based image analysis.
Jianke Zhu is an Associate Professor in College of Computer Science at Zhejiang University. He received his Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong. He was a postdoc in BIWI Computer Vision Lab at ETH Zurich. His research interests include computer vision and multimedia information retrieval. He is a senior member of the IEEE.
Mingli Song received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2006. He is currently a Professor with the Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Zhejiang University. His research interests include pattern classification, weakly supervised clustering, color and texture analysis, object recognition, and reconstruction. Dr. Song was a recipient of the Microsoft Research Fellowship Award in 2004. He is an Associate Editor of Information Sciences, Neurocomputing, Neural Processing Letters, and the Journal of Visual Communication and Image Representation.
Jiajun Bu received the B.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a professor in College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.
Chun Chen is a Professor in College of Computer Science, Zhejiang University. He received his Ph.D. degree in College of Computer Science from Zhejiang University. His research interests include Image Processing, Computer Vision, CSCW and Embedded Systems.