YOLO-CORE: Contour Regression for Efficient Instance Segmentation

Liu, Haoliang; Xiong, Wei; Zhang, Yu

doi:10.1007/s11633-022-1379-3

Research Article
Published: 15 September 2023

Volume 20, pages 716–728, (2023)

Machine Intelligence Research

Abstract

Instance segmentation has drawn mounting attention due to its significant utility. However, its high computational cost is widely acknowledged, since the instance mask is generally obtained by pixel-level labeling. In this paper, we present YOLO-CORE, a conceptually efficient contour regression network for instance segmentation based on the you only look once (YOLO) architecture. The instance mask is acquired efficiently by explicit and direct contour regression under our multi-order constraint, which consists of a polar distance loss and a sector loss. YOLO-CORE yields impressive segmentation performance in terms of both accuracy and speed: it achieves 57.9% AP@0.5 at 47 FPS (frames per second) on the semantic boundaries dataset (SBD) and 51.1% AP@0.5 at 46 FPS on the COCO dataset. The superior performance achieved by our method with explicit contour regression suggests a new line of techniques in the YOLO-based image understanding field. Moreover, our instance segmentation design can be flexibly integrated into existing deep detectors at negligible computational cost (from 65.86 BFLOPs (billion floating-point operations) to 66.15 BFLOPs, an increase of about 0.4%, with the YOLOv3 detector).
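
To make the idea of explicit contour regression more concrete, the following is a minimal sketch of how a contour expressed as distances along rays from a center point could be turned into polygon vertices. The abstract names a polar distance loss but does not give the parameterization, so the evenly spaced rays, the ray count of 36, and the helper names decode_polar_contour and polygon_area are all assumptions made for illustration and are not taken from the paper.

```python
import math


def decode_polar_contour(cx, cy, distances):
    """Convert per-ray distances from a center point into polygon vertices.

    Assumption for illustration: ray k points at angle 2*pi*k/n,
    where n = len(distances). The paper's actual layout may differ.
    """
    n = len(distances)
    return [
        (cx + d * math.cos(2 * math.pi * k / n),
         cy + d * math.sin(2 * math.pi * k / n))
        for k, d in enumerate(distances)
    ]


def polygon_area(vertices):
    """Shoelace formula for the area enclosed by an ordered polygon."""
    total = 0.0
    # Pair each vertex with its successor, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical prediction: 36 rays, each of length 10, around center (50, 40).
contour = decode_polar_contour(50.0, 40.0, [10.0] * 36)
area = polygon_area(contour)
```

Under these assumptions a detector would only need to output one center and a short vector of distances per instance, and the mask follows by filling the resulting polygon. This is one plausible reading of why contour regression can be cheaper than per-pixel labeling.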

Acknowledgements

This work was supported by the National Key R&D Program of China (Nos. 2018AAA0100104 and 2018AAA0100100) and the Natural Science Foundation of Jiangsu Province, China (No. BK20211164). We thank the Big Data Computing Center of Southeast University, China, for providing facility support for the numerical calculations in this paper.

Author information

These authors contributed equally to this work.

Authors and Affiliations

School of Computer Science and Engineering and the Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing, 211189, China
Haoliang Liu, Wei Xiong & Yu Zhang

Corresponding author

Correspondence to Yu Zhang.

Ethics declarations

The authors declare that they have no conflicts of interest related to this work.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Haoliang Liu received the M. Sc. degree in computer technology from Southeast University, China in 2022. He is now with Alimama Corporation, China.

His research interest is object detection.

Wei Xiong received the B. Sc. degree in computer science and technology from Soochow University, China in 2021. He is currently a master's student in computer technology at the School of Computer Science and Engineering, Southeast University, China.

His research interests include action quality assessment, computer vision, and deep learning.

Yu Zhang received the B. Sc. and M. Sc. degrees in telecommunications engineering from Xidian University, China in 2008 and 2010, respectively, and the Ph. D. degree in computer engineering from Nanyang Technological University, Singapore in 2015. He was a postdoctoral fellow at the Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore. He is now an associate professor at Southeast University, China.

His research interest is computer vision.

About this article

Cite this article

Liu, H., Xiong, W. & Zhang, Y. YOLO-CORE: Contour Regression for Efficient Instance Segmentation. Mach. Intell. Res. 20, 716–728 (2023). https://doi.org/10.1007/s11633-022-1379-3

Received: 06 April 2022
Accepted: 13 October 2022
Published: 15 September 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11633-022-1379-3
