Abstract
Instance segmentation has drawn mounting attention due to its significant utility. However, its high computational cost is widely acknowledged, because the instance mask is generally obtained by pixel-level labeling. In this paper, we present a conceptually efficient contour regression network for instance segmentation, named YOLO-CORE, which is based on the you only look once (YOLO) architecture. The instance mask is acquired efficiently by explicit and direct contour regression using our designed multi-order constraint, which consists of a polar distance loss and a sector loss. The proposed YOLO-CORE performs well in both accuracy and speed: it achieves 57.9% AP@0.5 at 47 FPS (frames per second) on the semantic boundaries dataset (SBD) and 51.1% AP@0.5 at 46 FPS on the COCO dataset. The performance achieved with explicit contour regression suggests a new technical direction for YOLO-based image understanding. Moreover, our instance segmentation design can be flexibly integrated into existing deep detectors at negligible computational cost: with the YOLOv3 detector, the cost rises from 65.86 BFLOPs (billion floating-point operations) to 66.15 BFLOPs.
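To make the idea of explicit contour regression more concrete, the following Python sketch illustrates one way a contour can be described in polar form and compared against a target. It is an illustration under stated assumptions, not the formulation used in the paper: the abstract does not define the polar distance loss or the sector loss, so the mean absolute error on ray lengths, the triangle-area form of the sector term, and all function names here are hypothetical. The last line simply works out the relative overhead implied by the two BFLOPs figures quoted above.

```python
import math

def polar_to_contour(cx, cy, radii):
    """Convert N ray lengths at evenly spaced angles into (x, y) contour vertices."""
    n = len(radii)
    step = 2.0 * math.pi / n
    return [(cx + r * math.cos(i * step), cy + r * math.sin(i * step))
            for i, r in enumerate(radii)]

def polar_distance_loss(pred, target):
    """Assumed first-order term: mean absolute error between ray lengths."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def sector_areas(radii):
    """Area of the triangle spanned by each pair of adjacent rays."""
    n = len(radii)
    step = 2.0 * math.pi / n
    return [0.5 * radii[i] * radii[(i + 1) % n] * math.sin(step)
            for i in range(n)]

def sector_loss(pred, target):
    """Assumed higher-order term: mean absolute difference of sector areas."""
    a_pred, a_tgt = sector_areas(pred), sector_areas(target)
    return sum(abs(p - t) for p, t in zip(a_pred, a_tgt)) / len(a_pred)

# Hypothetical example: a circle of radius 10 sampled with 8 rays,
# and a prediction that is off by +2 and -2 on two of the rays.
target = [10.0] * 8
pred = [10.0, 12.0, 10.0, 8.0, 10.0, 10.0, 10.0, 10.0]

contour = polar_to_contour(50.0, 50.0, pred)
dist = polar_distance_loss(pred, target)
sect = sector_loss(pred, target)

# Relative cost of adding the segmentation design to YOLOv3,
# computed from the two figures quoted in the abstract.
overhead = (66.15 - 65.86) / 65.86
```

In this sketch the ray-length term penalizes each contour point independently, while the sector term couples neighboring rays, which is one plausible reading of a "multi-order" constraint. The overhead evaluates to roughly 0.44%.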
Acknowledgements
This work was supported by the National Key R&D Program of China (Nos. 2018AAA0100104 and 2018AAA0100100) and the Natural Science Foundation of Jiangsu Province, China (No. BK20211164). We thank the Big Data Computing Center of Southeast University, China, for providing facility support for the numerical calculations in this paper.
Ethics declarations
The authors declare that they have no conflicts of interest related to this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Haoliang Liu received the M. Sc. degree in computer technology from Southeast University, China in 2022. He is now with Alimama Corporation, China.
His research interest is object detection.
Wei Xiong received the B. Sc. degree in computer science and technology from Soochow University, China in 2021. He is currently a master's student in computer technology at the School of Computer Science and Engineering, Southeast University, China.
His research interests include action quality assessment, computer vision, and deep learning.
Yu Zhang received the B. Sc. and M. Sc. degrees in telecommunications engineering from Xidian University, China in 2008 and 2010, respectively, and the Ph. D. degree in computer engineering from Nanyang Technological University, Singapore in 2015. He was a postdoctoral fellow at the Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore. He is now an associate professor at Southeast University, China.
His research interest is computer vision.
Cite this article
Liu, H., Xiong, W. & Zhang, Y. YOLO-CORE: Contour Regression for Efficient Instance Segmentation. Mach. Intell. Res. 20, 716–728 (2023). https://doi.org/10.1007/s11633-022-1379-3
DOI: https://doi.org/10.1007/s11633-022-1379-3