research-article

Single-stage Instance Segmentation

Authors:

Yan LuAuthors Info & Claims

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 16, Issue 3

Article No.: 86, Pages 1 - 19

https://doi.org/10.1145/3387926

Published: 05 July 2020 Publication History

Abstract

Albeit the highest accuracy of object detection is generally acquired by multi-stage detectors, like R-CNN and its extension approaches, the single-stage object detectors also achieve remarkable performance with faster execution and higher scalability. Inspired by this, we propose a single-stage framework to tackle the instance segmentation task. Building on a single-stage object detection network in hand, our model outputs the detected bounding box of each instance, the semantic segmentation result, and the pixel affinity simultaneously. After that, we generate the final instance masks via a fast post-processing method with the help of the three outputs above. As far as we know, it is the first attempt to segment instances in a single-stage pipeline on challenging datasets. Extensive experiments demonstrate the efficiency of our post-processing method, and the proposed framework obtains competitive results as a single-stage instance segmentation method. We achieve 32.5 box AP and 26.0 mask AP on the COCO validation set with 500 pixels input scale and 22.9 mask AP on the Cityscapes test set.

References

[1]

Anurag Arnab and Philip H. S. Torr. 2016. Bottom-up instance segmentation using deep higher-order CRFs. arXiv:1609.02583. (2016).

[2]

Anurag Arnab and Philip H. S. Torr. 2017. Pixelwise instance segmentation with a dynamically instantiated network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[3]

Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 12 (2017), 2481--2495.

[4]

Min Bai and Raquel Urtasun. 2017. Deep watershed transform for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[5]

Sean Bell, C. Lawrence Zitnick, Kavita Bala, and Ross Girshick. 2016. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

[6]

Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’19).

[7]

Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, and Hartwig Adam. 2018. MaskLab: Instance segmentation by refining object detection with semantic and direction features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).

[8]

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’18).

[9]

Yadang Chen, Chuanyan Hao, Alex X. Liu, and Enhua Wu. 2019. Appearance-consistent video object segmentation based on a multinomial event model. ACM Trans. Multimedia Comput. Commun. Applic. 15, 2 (2019), 40.

Digital Library

[10]

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

[11]

Jifeng Dai, Kaiming He, and Jian Sun. 2016. Instance-aware semantic segmentation via multi-task network cascades. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

[12]

Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. 2016. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS’16).

[13]

Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. Deformable convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[14]

Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. 2010. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 2 (2010), 303--338.

Digital Library

[15]

Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Tai-Jiang Mu, and Shi-Min Hu. 2017. S net: Single stage salient-instance segmentation. arXiv:1711.07618. (2017).

[16]

Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, and Kevin P. Murphy. 2017. Semantic instance segmentation via deep metric learning. arXiv:1703.10277. (2017).

[17]

Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg. 2017. DSSD: Deconvolutional single shot detector. arXiv:1701.06659. (2017).

[18]

Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, Yinan Yu, Ming Yang, and Kaiqi Huang. 2019. SSAP: Single-shot instance segmentation with affinity pyramid. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’19).

[19]

Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. 2018. Detectron. Retrieved from https://github.com/facebookresearch/detectron.

[20]

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS’10).

[21]

Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. 2014. Simultaneous detection and segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’14).

[22]

Zeeshan Hayder, Xuming He, and Mathieu Salzmann. 2017. Boundary-aware instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[23]

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).

[24]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2014. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of the European Conference on Computer Vision (ECCV’14).

[25]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

[26]

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. (2017).

[27]

Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. 2018. Learning to segment every thing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).

[28]

Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[29]

Alexander Kirillov, Evgeny Levinkov, Bjoern Andres, Bogdan Savchynskyy, and Carsten Rother. 2017. InstanceCut: From edges to instances with multicut. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[30]

Hei Law and Jia Deng. 2018. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV’18).

Digital Library

[31]

Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2017. Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[32]

Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. 2017. Fully convolutional instance-aware semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[33]

Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, and Jian Sun. 2018. DetNet: Design backbone for object detection. In Proceedings of the European Conference on Computer Vision (ECCV’18).

[34]

Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, and Shuicheng Yan. 2017. Proposal-free network for instance-level object segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 40, 12 (2017), 2978--2991.

Digital Library

[35]

Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie. 2017. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[36]

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).

[37]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV’14).

[38]

Songtao Liu, Di Huang, and Yunhong Wang. 2018. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV’18).

[39]

Shu Liu, Jiaya Jia, Sanja Fidler, and Raquel Urtasun. 2017. SGN: Sequential grouping networks for instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17).

[40]

Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. 2018. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).

[41]

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV’16).

[42]

Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, and Yan Lu. 2018. Affinity derivation and graph merge for instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’18).

[43]

Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15).

[44]

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. ShufflenNet v2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV’18).

[45]

Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. 2015. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’15).

Digital Library

[46]

Pedro O. Pinheiro, Ronan Collobert, and Piotr Dollár. 2015. Learning to segment object candidates. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS’15).

[47]

Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, and Piotr Dollár. 2016. Learning to refine object segments. In Proceedings of the European Conference on Computer Vision (ECCV’16).

[48]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

[49]

Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv:1804.02767. (2018).

[50]

Mengye Ren and Richard S. Zemel. 2017. End-to-end instance segmentation with recurrent attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[51]

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS’15).

Digital Library

[52]

Bernardino Romera-Paredes and Philip Hilaire Sean Torr. 2016. Recurrent instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV’16).

[53]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention (MICCAI’15).

[54]

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 3 (2015), 211--252.

Digital Library

[55]

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).

[56]

Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, and Abhinav Gupta. 2016. Beyond skip connections: Top-down modulation for object detection. arXiv:1612.06851. (2016).

[57]

Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. 2018. IGCV3: Interleaved low-rank group convolutions for efficient deep neural networks. In Proceedings of the British Machine Vision Conference (BMVC’18).

[58]

Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’19).

[59]

Jonas Uhrig, Marius Cordts, Uwe Franke, and Thomas Brox. 2016. Pixel-level encoding and depth layering for instance-level semantic labeling. In Proceedings of the German Conference on Pattern Recognition (GCPR’16).

[60]

Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, and Garrison Cottrell. 2018. Understanding convolution for semantic segmentation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’18).

[61]

Zifeng Wu, Chunhua Shen, and Anton van den Hengel. 2016. Bridging category-level and instance-level semantic image segmentation. arXiv:1605.06885. (2016).

[62]

Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Xuebo Liu, Ding Liang, Chunhua Shen, and Ping Luo. 2019. PolarMask: Single shot instance segmentation with polar representation. arXiv:1909.13226. (2019).

[63]

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).

[64]

Wenqiang Xu, Haiyang Wang, Fubo Qi, and Cewu Lu. 2019. Explicit shape encoding for real-time instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’19).

[65]

Bo Zhang, Nicola Conci, and Francesco G. B. De Natale. 2015. Segmentation of discriminative patches in human activity video. ACM Trans. Multimedia Comput. Commun. Applic. 12, 1 (2015), 4.

Digital Library

[66]

Qianni Zhang and Ebroul Izquierdo. 2013. Multifeature analysis and semantic context learning for image classification. ACM Trans. Multimedia Comput. Commun. Applic. 9, 2 (2013), 12.

Digital Library

[67]

Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z. Li. 2018. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18).

[68]

Ziyu Zhang, Sanja Fidler, and Raquel Urtasun. 2016. Instance-level segmentation for autonomous driving with deep densely connected MRFs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16).

Cited By

Long JLin JLiu JLiu L(2024)Toward Robust Segmentation of Polyp via Box-supervised and Feature-EmbeddedArabian Journal for Science and Engineering10.1007/s13369-024-09762-4Online publication date: 15-Nov-2024
https://doi.org/10.1007/s13369-024-09762-4
Mei HYu LXu KWang YYang XWei XLau R(2023)Mirror Segmentation via Semantic-aware Contextual Contrasted Feature LearningACM Transactions on Multimedia Computing, Communications, and Applications10.1145/356612719:2s(1-22)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3566127
Wu JZhu TZhu JLi TWang C(2023)A Optimized BERT for Multimodal Sentiment AnalysisACM Transactions on Multimedia Computing, Communications, and Applications10.1145/356612619:2s(1-12)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3566126
Show More Cited By

Index Terms

Single-stage Instance Segmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Object detection

Recommendations

Nuclei and glands instance segmentation in histology images: a narrative review
Abstract
Examination of tissue biopsy and quantification of the various characteristics of cellular processes are clinical benchmarks in cancer diagnosis. Nuclei and glands instance segmentation greatly assists the high-throughput quantification of ...
ChaInNet: Deep Chain Instance Segmentation Network for Panoptic Segmentation
Abstract
We consider the competition between instance and semantic segmentation in panoptic segmentation to develop the deep chain instance segmentation network (ChaInNet) to mitigate this problem. Segmentation competition is caused by the usual ...
Enhancing gland segmentation in colon histology images using an instance-aware diffusion model
Abstract
In pathological image analysis, determination of gland morphology in histology images of the colon is essential to determine the grade of colon cancer. However, manual segmentation of glands is extremely challenging and there is a need to develop ...
Highlights
- Our approach models gland instance segmentation in histology images as denoising with a diffusion model.
- To improve segmentation, we use instance-aware methods to recover lost details post-denoising.
- To improve object-background ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Multimedia Computing, Communications, and Applications

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 16, Issue 3

August 2020

364 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3409646

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Issue’s Table of Contents

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 July 2020

Online AM: 07 May 2020

Accepted: 01 March 2020

Revised: 01 February 2020

Received: 01 August 2019

Published in TOMM Volume 16, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSFC
National Natural Science Foundation of China
Youth Innovation Promotion Association CAS

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
449
Total Downloads

Downloads (Last 12 months)28
Downloads (Last 6 weeks)2

Reflects downloads up to 17 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Long JLin JLiu JLiu L(2024)Toward Robust Segmentation of Polyp via Box-supervised and Feature-EmbeddedArabian Journal for Science and Engineering10.1007/s13369-024-09762-4Online publication date: 15-Nov-2024
https://doi.org/10.1007/s13369-024-09762-4
Mei HYu LXu KWang YYang XWei XLau R(2023)Mirror Segmentation via Semantic-aware Contextual Contrasted Feature LearningACM Transactions on Multimedia Computing, Communications, and Applications10.1145/356612719:2s(1-22)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3566127
Wu JZhu TZhu JLi TWang C(2023)A Optimized BERT for Multimodal Sentiment AnalysisACM Transactions on Multimedia Computing, Communications, and Applications10.1145/356612619:2s(1-12)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3566126
Bakasa WViriri S(2021)Pancreatic Cancer Survival Prediction: A Survey of the State-of-the-ArtComputational and Mathematical Methods in Medicine10.1155/2021/11884142021(1-17)Online publication date: 30-Sep-2021
https://doi.org/10.1155/2021/1188414

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Issue’s Table of Contents