A Multi-View features hinged siamese U-Net for image Co-segmentation

Multimedia Tools and Applications

Abstract

This paper proposes a new U-shaped structure that extracts multi-view features from different images and is incorporated into a network that can be trained end-to-end for the image co-segmentation task. Because the multi-view features integrate global correlations between images, the common objects in different images can be segmented directly from these features. To obtain the multi-view features, we first extract deep features from the input images through two weight-sharing streams, and then compute pixel-level similarity maps from the deep features through a similarity layer. The whole architecture is a Siamese U-Net hinged by multi-view features, called iMFNet for short. We further introduce the Dice loss and employ both positive and negative examples to train the whole network. In addition, a learnable conditional random field (CRF) layer is added to iMFNet for more accurate results. Trained on data from the MSRC and PASCAL VOC 2012 datasets, iMFNet achieves state-of-the-art performance on the Internet dataset and competitive performance on the iCoseg dataset.
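The two components named above, a pixel-level similarity map and the Dice loss, can be illustrated with a small dependency-free sketch. It is not the iMFNet implementation: the abstract does not specify the similarity measure or the exact Dice formulation, so cosine similarity and the plain-sum soft Dice variant are assumptions, and the function names `cosine_similarity_map` and `soft_dice_loss` are hypothetical.

```python
import math


def cosine_similarity_map(feats_a, feats_b, eps=1e-8):
    """Pairwise cosine similarity between per-pixel feature vectors.

    feats_a, feats_b: lists of feature vectors, one per pixel (an H*W x C
    feature map flattened over its spatial dimensions).
    Returns sim, where sim[i][j] compares pixel i of image A with pixel j
    of image B. Cosine similarity is an assumption made for illustration.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) + eps
        return [x / norm for x in vec]

    unit_a = [normalize(v) for v in feats_a]
    unit_b = [normalize(v) for v in feats_b]
    return [[sum(x * y for x, y in zip(va, vb)) for vb in unit_b]
            for va in unit_a]


def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary mask.

    pred: foreground probabilities in [0, 1], flattened over pixels.
    target: ground-truth mask values (0 or 1), flattened the same way.
    The plain-sum denominator is one common variant, assumed here.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


if __name__ == "__main__":
    # Two pixels in image A, one pixel in image B, two feature channels.
    sim = cosine_similarity_map([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]])
    print(sim)

    # Positive example: the prediction matches the mask.
    print(soft_dice_loss([1.0, 0.0], [1, 0]))
    # Negative example: empty mask, but the prediction fires everywhere.
    print(soft_dice_loss([1.0, 1.0], [0, 0]))
```

With the smoothing term `eps`, an all-background mask paired with an all-background prediction gives a loss of zero, while any predicted foreground on such a pair is penalised. This is one way a Dice-style loss can remain well defined on negative pairs that share no common object; whether iMFNet handles negative examples this way is not stated in the abstract.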

Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the NVIDIA TITAN XP GPU used for this research.

Author information

Corresponding author

Correspondence to Yushuo Li.

About this article

Cite this article

Li, Y., Liu, X., Gong, X. et al. A Multi-View features hinged siamese U-Net for image Co-segmentation. Multimed Tools Appl 80, 22965–22985 (2021). https://doi.org/10.1007/s11042-020-08794-w

  • DOI: https://doi.org/10.1007/s11042-020-08794-w
