Joint multi-task cascade for instance segmentation

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Instance segmentation requires both accurate pixel-level classification and high-level semantic features at the instance level, which makes it very challenging; a cascade structure can effectively improve both. To make full use of the relationship between detection and segmentation, this paper proposes a joint multi-task cascade structure that does not simply cascade the two tasks of detection and segmentation, but combines them in multi-stage processing and, in particular, integrates the information from different stages of the mask branch. The structure can exploit the strengths of each stage in detection and segmentation, thereby improving the quality of mask prediction. A feature fusion process is introduced in the fully convolutional network (FCN) branch, where high-level and low-level features are fused to enhance the contextual information of the image's semantic features. Experiments on the COCO dataset demonstrate improved results.
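The two ideas named in the abstract, fusing high-level with low-level features and passing mask information from one stage to the next, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`upsample_nearest_2x`, `fuse`, `cascade_masks`), the use of nearest-neighbour upsampling, and the choice of element-wise addition as the fusion operator are all assumptions made for illustration, and plain nested lists stand in for feature tensors.

```python
# Toy sketch only; the paper's actual fusion operator and stage wiring
# are not given in the abstract, so everything below is an assumption.

def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(low_level, high_level):
    """Fuse a fine low-level map with a coarse high-level map (assumed: sum)."""
    up = upsample_nearest_2x(high_level)
    if len(up) != len(low_level) or len(up[0]) != len(low_level[0]):
        raise ValueError("low-level map must be exactly 2x the high-level map")
    return add_maps(low_level, up)


def cascade_masks(stage_features):
    """Carry mask features forward: each stage adds the previous stage's output."""
    outputs = []
    carried = None
    for feats in stage_features:
        if carried is not None:
            feats = add_maps(feats, carried)
        outputs.append(feats)
        carried = feats
    return outputs


low = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
high = [[10, 20], [30, 40]]
fused = fuse(low, high)
stages = cascade_masks([[[1, 1]], [[2, 2]], [[3, 3]]])
```

Here `fused` combines fine spatial detail from `low` with the coarser context in `high`, and `stages` shows how a later stage's mask features accumulate what earlier stages produced. A real network would replace the additions with learned convolutions.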

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61876121 and 61472267), the Foundation of Key Research and Development Projects in Jiangsu Province (No. BE2017663), the Foundation of Natural Science Research Program in Jiangsu Province Higher Education (Nos. 19KJB520054 and 18KJB510042), and the Foundation of Science and Technology Project of Suzhou Water Conservancy and Water Affairs (2015-7-5).

Author information

Corresponding author

Correspondence to Fuyuan Hu.

About this article

Cite this article

Wen, Y., Hu, F., Ren, J. et al. Joint multi-task cascade for instance segmentation. J Real-Time Image Proc 17, 1983–1989 (2020). https://doi.org/10.1007/s11554-020-01007-5

  • DOI: https://doi.org/10.1007/s11554-020-01007-5
