research-article

Affinity Derivation for Accurate Instance Segmentation

Published: 16 April 2021

Abstract

Affinity, which indicates whether two pixels belong to the same instance, is an equivalent representation of instance segmentation labels. Previous work has not explored affinity explicitly. In this article, we present two instance segmentation schemes based on pixel affinity and show that affinity is effective in both settings. For the proposal-free method, we predict pixel affinity for each image and then propose a simple yet effective graph merge algorithm to cluster pixels into instances. This shows that affinity is powerful instance-relevant information for guiding the clustering procedure in proposal-free instance segmentation. For proposal-based methods, we extend the conventional framework with an affinity head and introduce affinity as additional supervision in the training phase. Without any additional inference cost, this improves the performance of existing proposal-based instance segmentation methods, which shows that affinity can also serve as an auxiliary loss and that training with this extra loss benefits the training process. Experimental results show that our schemes achieve performance comparable to other state-of-the-art instance segmentation methods. With Cityscapes training data, the proposed proposal-free method achieves 28.8 AP and the proposal-based method achieves 27.2 AP, both on the test set.
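
As a rough illustration of the proposal-free idea, the sketch below derives affinity from instance labels and then merges pixels back into instances. It is not the article's graph merge algorithm, which the abstract does not specify: the 4-connected neighborhood, the union-find merging, the 0.5 threshold, the foreground mask, and the function names `derive_affinity` and `merge_by_affinity` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Grid = List[List[int]]


def derive_affinity(labels: Grid) -> Dict[Tuple[Pixel, Pixel], float]:
    """Affinity for 4-connected neighbor pairs (assumed neighborhood).

    Label 0 is treated as background. A pair gets affinity 1.0 only when
    both pixels carry the same non-zero instance id.
    """
    h, w = len(labels), len(labels[0])
    affinity = {}
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    same = labels[y][x] != 0 and labels[y][x] == labels[ny][nx]
                    affinity[((y, x), (ny, nx))] = 1.0 if same else 0.0
    return affinity


def merge_by_affinity(
    affinity: Dict[Tuple[Pixel, Pixel], float],
    foreground: List[List[bool]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the article
) -> Grid:
    """Cluster pixels with union-find: merge every pair above the threshold."""
    h, w = len(foreground), len(foreground[0])
    parent = {(y, x): (y, x) for y in range(h) for x in range(w)}

    def find(p: Pixel) -> Pixel:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), score in affinity.items():
        if score > threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    # Number clusters in raster order; background pixels stay 0.
    ids: Dict[Pixel, int] = {}
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if foreground[y][x]:
                root = find((y, x))
                out[y][x] = ids.setdefault(root, len(ids) + 1)
    return out
```

With ground-truth affinities and a foreground mask, merging recovers the original labels up to renumbering, provided each instance is 4-connected. In practice the affinities would be network predictions between 0 and 1, and an instance split by occlusion would need longer-range pixel pairs than this sketch uses.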


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1
February 2021
392 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3453992

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2021
Accepted: 01 June 2020
Revised: 01 April 2020
Received: 01 October 2019
Published in TOMM Volume 17, Issue 1

Author Tags

  1. Instance segmentation
  2. Semantic segmentation
  3. Graph

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • NSFC
  • Youth Innovation Promotion Association CAS
