Abstract
This paper proposes a new U-shaped structure that extracts multi-view features from different images and is incorporated into a network that can be trained end-to-end for the image co-segmentation task. Because the multi-view features integrate global correlations between images, the common objects in different images can be segmented directly from these features. To obtain the multi-view features, we first extract deep features from the input images through two weight-sharing streams and then compute pixel-level similarity maps from the deep features through a similarity layer. The whole architecture is a Siamese U-Net hinged by multi-view features, called iMFNet for short. We further introduce a Dice loss and employ both positive and negative examples to train the whole network. In addition, a learnable conditional random field (CRF) layer is added to iMFNet for more accurate results. Using training data from the MSRC and PASCAL VOC 2012 datasets, iMFNet achieves state-of-the-art performance on the Internet dataset and competitive performance on the iCoseg dataset.
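To illustrate two of the components named above, the following minimal Python sketch computes a pixel-level similarity map between two sets of deep features and a soft Dice loss. It is not the iMFNet implementation. The choice of cosine similarity, the smoothing constant `smooth`, and the function names `cosine_similarity_map` and `soft_dice_loss` are assumptions made for illustration only; the abstract does not specify the exact form of the similarity layer or of the loss.

```python
import math


def cosine_similarity_map(feats_a, feats_b):
    """Return S where S[i][j] is the cosine similarity between
    pixel i of image A and pixel j of image B.

    feats_a, feats_b: lists of per-pixel feature vectors, i.e. an
    H x W x C feature map flattened to (H*W) vectors of length C.
    Cosine similarity is an assumed choice, not taken from the paper.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        # Leave all-zero vectors unchanged to avoid dividing by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    return [[sum(x * y for x, y in zip(va, vb)) for vb in b] for va in a]


def soft_dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss over flattened masks.

    pred:   predicted foreground probabilities in [0, 1].
    target: ground-truth labels (0 or 1).
    smooth: assumed smoothing constant; it keeps the loss defined
            when the target mask is empty (a negative example).
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return 1.0 - dice


if __name__ == "__main__":
    # Two "images" with two pixels each and 2-channel features.
    sim = cosine_similarity_map([[1.0, 0.0], [0.0, 2.0]],
                                [[3.0, 0.0], [1.0, 1.0]])
    print(sim)

    # Positive example: the pair shares an object.
    print(soft_dice_loss([0.9, 0.8, 0.1, 0.0], [1, 1, 0, 0]))
    # Negative example: no common object, so the target mask is empty.
    print(soft_dice_loss([0.2, 0.1, 0.0, 0.0], [0, 0, 0, 0]))
```

With the smoothing term, a negative pair whose predicted mask is entirely background yields a loss of zero, while any predicted foreground is penalized. This is one plausible way for negative examples to contribute to training under a Dice-style loss.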
Acknowledgements
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the NVIDIA TITAN XP GPU used for this research.
Cite this article
Li, Y., Liu, X., Gong, X. et al. A Multi-View features hinged siamese U-Net for image Co-segmentation. Multimed Tools Appl 80, 22965–22985 (2021). https://doi.org/10.1007/s11042-020-08794-w
DOI: https://doi.org/10.1007/s11042-020-08794-w