YOLO-CORE: Contour Regression for Efficient Instance Segmentation

  • Research Article
  • Machine Intelligence Research

Abstract

Instance segmentation has drawn mounting attention due to its significant utility. However, its high computational cost is widely acknowledged, as the instance mask is generally obtained by pixel-level labeling. In this paper, we present YOLO-CORE, a conceptually efficient contour regression network for instance segmentation based on the you only look once (YOLO) architecture. The instance mask is acquired efficiently by explicit and direct contour regression under our multi-order constraint, which consists of a polar distance loss and a sector loss. YOLO-CORE performs well in both accuracy and speed: it achieves 57.9% AP@0.5 at 47 FPS (frames per second) on the semantic boundaries dataset (SBD) and 51.1% AP@0.5 at 46 FPS on the COCO dataset. The performance achieved with explicit contour regression suggests a new technical direction for YOLO-based image understanding. Moreover, our instance segmentation design can be flexibly integrated into existing deep detectors at negligible computational cost: with the YOLOv3 detector, the cost rises from 65.86 BFLOPs (billion floating-point operations) to 66.15 BFLOPs, an increase of about 0.4%.
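As a rough illustration of what regressing a contour explicitly can mean, the sketch below decodes a polar contour into polygon vertices and measures the area it encloses. It does not reproduce the paper's polar distance loss or sector loss, which this abstract does not define. The function names, the uniform angular spacing of the rays, and the use of the shoelace formula are all assumptions made for the example.

```python
import math


def decode_polar_contour(cx, cy, distances):
    """Convert N ray lengths around a centre (cx, cy) into polygon vertices.

    Assumption: rays are spaced uniformly over 360 degrees, starting at angle 0.
    """
    step = 2.0 * math.pi / len(distances)
    return [
        (cx + d * math.cos(i * step), cy + d * math.sin(i * step))
        for i, d in enumerate(distances)
    ]


def polygon_area(vertices):
    """Area enclosed by a closed polygon, computed with the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical usage: 360 rays of length 10 approximate a circle of radius 10.
circle = decode_polar_contour(50.0, 50.0, [10.0] * 360)
circle_area = polygon_area(circle)
```

In this representation a mask is described by a centre and a short vector of distances instead of a per-pixel label map, which is one reason a contour-based approach can be cheaper than pixel-level labeling.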



Acknowledgements

This work was supported by the National Key R&D Program of China (Nos. 2018AAA0100104 and 2018AAA0100100) and the Natural Science Foundation of Jiangsu Province, China (No. BK20211164). We thank the Big Data Computing Center of Southeast University, China, for providing facility support for the numerical calculations in this paper.

Author information

Corresponding author

Correspondence to Yu Zhang.

Ethics declarations

The authors declare that they have no conflicts of interest related to this work.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Haoliang Liu received the M. Sc. degree in computer technology from Southeast University, China in 2022. He is now with Alimama Corporation, China.

His research interest is object detection.

Wei Xiong received the B. Sc. degree in computer science and technology from Soochow University, China in 2021. He is currently a master's student in computer technology at the School of Computer Science and Engineering, Southeast University, China.

His research interests include action quality assessment, computer vision, and deep learning.

Yu Zhang received the B. Sc. and M. Sc. degrees in telecommunications engineering from Xidian University, China in 2008 and 2010, respectively, and the Ph. D. degree in computer engineering from Nanyang Technological University, Singapore in 2015. He has been a postdoctoral fellow at the Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore. He is now an associate professor at Southeast University, China.

His research interest is computer vision.


About this article

Cite this article

Liu, H., Xiong, W. & Zhang, Y. YOLO-CORE: Contour Regression for Efficient Instance Segmentation. Mach. Intell. Res. 20, 716–728 (2023). https://doi.org/10.1007/s11633-022-1379-3


  • DOI: https://doi.org/10.1007/s11633-022-1379-3
