Object cosegmentation by nonrigid mapping
Introduction
Image segmentation is an important research topic in image processing and computer vision, and cosegmentation has recently attracted growing attention. Unlike traditional segmentation methods, cosegmentation focuses on jointly segmenting groups of images that share similar foregrounds, which makes it more efficient in practice.
Most existing cosegmentation studies address the single-object setting. However, images containing multiple foreground regions are common in real-world applications, so it is necessary to investigate image cosegmentation with multiple objects. Multiple-foreground cosegmentation is a more difficult problem than the single-object case: the challenges come mainly from the variation among foreground types, occlusions between target regions, and differing object poses across images. Existing cosegmentation methods have made substantial progress in this field in recent years [1], [2]. However, most of them share two problems. First, they skip the step of identifying the different objects up front, which often leads to unclear judgements about the foreground regions. Second, existing methods cannot recognize the same object when it undergoes a nonrigid transformation between images, a situation that is common in real-world applications.
To tackle the above issues, in this paper we propose an optimization framework for the multiple-target cosegmentation problem. We use several deformable part models to detect the different target foreground regions in an image set. We choose the deformable part model because it can both represent different kinds of foreground regions and capture the variation of these regions across images. In addition, since the deformable model has a multi-level structure, we can trace the vote of each level of the foreground objects to improve the object retrieval process. Moreover, we employ a mixed knowledge-transfer process in our framework: besides the rigid mapping used in previous methods, we add an extra term that maps nonrigid objects. With this nonrigid mapping, our method is able to recognize the same object across images despite large variations.
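As a toy sketch of the mixed mapping idea (not the paper's actual formulation), one could score how well a candidate foreground mask matches a transferred exemplar mask by combining a rigid term (best pixel agreement under small translations) with a nonrigid term (a chamfer-style distance between the foreground point sets, which stays small under smooth deformation). The function names and the weight `lam` are illustrative assumptions:

```python
import numpy as np

def rigid_cost(src, dst, max_shift=2):
    # Best-case pixel disagreement over small integer translations (rigid mapping).
    best = np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(src, dy, axis=0), dx, axis=1)
            best = min(best, int(np.sum(shifted != dst)))
    return best

def nonrigid_cost(src, dst):
    # Symmetric chamfer-style distance between the two foreground pixel sets;
    # unlike the rigid term, it tolerates smooth nonrigid shape deformation.
    ps, pd = np.argwhere(src), np.argwhere(dst)
    d = np.linalg.norm(ps[:, None, :] - pd[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def mapping_cost(src, dst, lam=0.5):
    # Mixed cost: rigid term plus a weighted nonrigid term.
    return rigid_cost(src, dst) + lam * nonrigid_cost(src, dst)
```

A purely translated copy of a mask incurs zero rigid cost, while a bent or stretched copy is still scored as similar through the nonrigid term.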
Our contributions can be summarized in three points: (1) a mixed knowledge mapping method for estimating the foreground mask; (2) a detection stage based on deformable part models; (3) several discriminative features for representing the detected windows, which enable our system to retrieve an accurate mask from the many candidate training regions.
We have conducted performance evaluation on the benchmark FlickrMFC and iCoseg datasets. The experimental results show that our proposed method outperforms the state-of-the-art methods by around 10% on average.
Section snippets
Related work
Image segmentation is a long-studied computer vision problem that is useful for extracting the crucial image regions; it is often used as a pre-processing step in tracking and human pose estimation. Earlier segmentation methods, such as [3], [4], require user interaction during the segmentation process. Although they can produce more accurate results, these methods are constrained in real applications because they are complex and inefficient. In recent years, model-based segmentation methods have been
Problem formulation
The segmentation process can be considered as a binary labeling problem over all the pixels in the image set. For each image, let $\mathbf{v} = \{v_p\}$ denote its pixels, and let $\mathcal{M}$ denote the corresponding trained deformable part models; the potential function then has the standard form

$$E(L) = \sum_{p} U(l_p; v_p, \mathcal{M}) + \sum_{(p,q) \in \mathcal{N}} V(l_p, l_q).$$

The first sum represents the probability that a pixel $v_p$ takes a binary label $l_p$ (1 for foreground, 0 for background), and the second sum is the
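A minimal NumPy sketch of evaluating such a binary-labeling energy, with unary data costs plus a Potts smoothness penalty over 4-connected neighbors. In practice this energy is minimized with graph cuts rather than evaluated directly, and the unary term here is a placeholder for the deformable-part-model scores; array names are illustrative:

```python
import numpy as np

def labeling_energy(labels, unary, beta=1.0):
    # labels: H x W array of {0, 1}; unary[y, x, l] = cost of giving pixel (y, x) label l.
    H, W = labels.shape
    data = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    # Potts pairwise term over 4-connected neighbors: penalize adjacent label changes.
    smooth = beta * (np.sum(labels[:, 1:] != labels[:, :-1])
                     + np.sum(labels[1:, :] != labels[:-1, :]))
    return data + smooth
```

The data term selects, for each pixel, the unary cost of its assigned label; the smoothness term counts label discontinuities between horizontal and vertical neighbors, weighted by `beta`.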
Dataset
We perform extensive experiments to evaluate our method on the FlickrMFC [1] and iCoseg datasets. The FlickrMFC database consists of 14 subsets with 17–20 pictures each. It is a challenging database for cosegmentation, since many of its pictures contain multiple objects, and the objects often exhibit pose variations and contact with one another, which makes detection even harder. The iCoseg dataset, on the other hand, consists of 38 sets totaling 643 pictures, each
Conclusion and future work
We have proposed a multi-level framework for solving the cosegmentation problem, based on deformable part model detection and a mixed knowledge mapping process. The presented method handles both single-object and multiple-object segmentation. In future work, we look forward to exploring an automatic mechanism for learning the parameters used to retrieve features. Moreover, we will try to employ superpixels to improve foreground recognition. Furthermore, we will investigate
Acknowledgments
The authors thank the reviewers for their extensive and informative comments, which helped improve this manuscript. This research was supported in part by the National Natural Science Foundation of China under Grants 61103105 and 91120302.
References (29)
- G. Kim, E.P. Xing, On multiple foreground cosegmentation, in: Proceedings of the Conference on Computer Vision and...
- D. Kuettel, V. Ferrari, Figure-ground segmentation by transferring window masks, in: Proceedings of the Conference on...
- C. Rother et al., GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. (2004)
- V. Gulshan, C. Rother, A. Criminisi, A. Blake, A. Zisserman, Geodesic star convexity for interactive image...
- I. Budvytis, V. Badrinarayanan, R. Cipolla, Semi-supervised video segmentation using tree structured graphical models,...
- T. Cour, F. Bénézit, J. Shi, Spectral segmentation with multiscale graph decomposition, in: Proceedings of the...
- J. Kim, K. Grauman, Boundary preserving dense local regions, in: Proceedings of the Conference on Computer Vision and...
- Y.J. Lee, J. Kim, K. Grauman, Key-segments for video object segmentation, in: IEEE International Conference on Computer...
- C. Rother, T. Minka, A. Blake, V. Kolmogorov, Cosegmentation of image pairs by histogram matching—incorporating a...
- A. Gallagher, T. Chen, Clothing cosegmentation for recognizing people, in: Proceedings of Conference on Computer Vision...
Cited by (13)
A novel co-attention computation block for deep learning based image co-segmentation
2020, Image and Vision Computing
Citation Excerpt: We try to find out an optimal common object model according to all the considered images. Such model implies the correlation between the images [3,14,36–41]. For deep learning based image co-segmentation, the correlation computation methods are summarized as follows.
A survey on image and video cosegmentation: Methods, challenges and analyses
2020, Pattern Recognition

Extensible image object co-segmentation with sparse cooperative relations
2020, Information Sciences

A new challenging image dataset with simple background for evaluating and developing co-segmentation algorithms
2020, Signal Processing: Image Communication
Citation Excerpt: Wang and Shen [8] defined an energy function based on the background and foreground seed regions indicated by the user, which was minimized by using the graph cuts. Liu et al. [43] computed the energy of each pixel based on deformable part model learned from the training data and the energy for reflecting the labeling smoothness. The sum of these two energies was minimized by using the graph cuts to get the segmentation result.
A comprehensive overview of relevant methods of image cosegmentation
2020, Expert Systems with Applications
Citation Excerpt: Zhu et al. (2014) and Liu, Zhu, Bu, and Chen (2014) provided 0.58 and 0.65 with supervised information in terms of Jaccard similarity metric, respectively, and they outperformed (Kim & Xing, 2012). Then, the unsupervised method of Jerripothula et al. (2016) can handle the Flickr MFC dataset that contains multiple common objects across the images and outperformed (Liu et al., 2014) in terms of foreground/background segmentation, and not multilabel segmentation (Table 7). For the PASCAL-VOC dataset, which is more challenging and difficult than other datasets due to its extremely large intra-class variability and distracting background clutter, the work of Han et al. (2018) outperforms, as it can be seen from the results in Table 8, other state-of-the-art cosegmentation methods in both terms (average Precision and Jaccard index).
Noise-aware co-segmentation with local and global priors
2018, Neurocomputing
Citation Excerpt: In order to clearly show the improvement of our proposed method, we have compared our method with the results obtained by using the original SDS framework [22] and non-rigid mapping [46]. In contrast to [46] using clean images as the input, our method is able to outperform [22,46]. This is because that the pre-filtering process helps us to find the noisy images.
Zhao Liu is currently a Ph.D. student in the College of Computer Science, Zhejiang University. He received his B.S. degree in Computer Science and Technology from Zhejiang University in 2009. His research interests are mainly in computer vision, including human pose estimation, image segmentation, and the analysis of example-based images.
Jianke Zhu (IEEE member) is currently an assistant professor in College of Computer Science, Zhejiang University. He was a postdoc in BIWI computer vision lab of ETH Zurich under the supervision of Prof. Luc Van Gool. He received his Bachelor degree in Mechatronics and Computer Engineering from Beijing University of Chemical Technology, Beijing, China, his Master degree in Electrical and Electronics Engineering from University of Macau, and Ph.D. degree in Computer Science and Engineering from Chinese University of Hong Kong.
Jiajun Bu (IEEE member) received the B.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a professor in College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.
Chun Chen received the B.S. degree in Mathematics from Xiamen University, China, in 1981, and his M.S. and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1984 and 1990, respectively. He is a professor in College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.