Abstract
Instance segmentation requires both accurate pixel-level classification and high-level semantic features at the instance level, which makes it very challenging; a cascade structure can help on both fronts. To make full use of the relationship between detection and segmentation, this paper proposes a joint multi-task cascade structure. Rather than simply cascading the detection and segmentation tasks, it combines them in multi-stage processing and, in particular, integrates information from different stages of the mask branch. The structure exploits the strengths of each stage in detection and segmentation, thereby improving the quality of mask prediction. Feature fusion is introduced in the fully convolutional network (FCN) branch, where high-level and low-level features are fused to enrich the contextual information in the image's semantic features. Experiments on the COCO dataset demonstrate improved results.
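The two ideas named in the abstract, fusing high-level with low-level features and passing mask information between cascade stages, can be illustrated with a minimal sketch. The abstract does not specify the operators or the wiring, so everything here is an assumption made for illustration: fusion is taken to be nearest-neighbour upsampling followed by element-wise addition, the stage-to-stage integration is taken to be a running element-wise sum, and the names `fuse` and `cascade_masks` are hypothetical. Plain nested lists stand in for feature maps.

```python
from typing import List, Optional

Grid = List[List[float]]  # a single-channel 2-D feature map


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in feat
        for _ in range(factor)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(low: Grid, high: Grid) -> Grid:
    """Fuse a fine low-level map with a coarse high-level map.

    Assumption: the coarse map is upsampled to the fine resolution
    and the two are added; the paper may use a different operator.
    """
    factor = len(low) // len(high)
    return add(low, upsample_nearest(high, factor))


def cascade_masks(stage_feats: List[Grid]) -> List[Grid]:
    """Let each stage's mask feature receive the previous stage's output.

    Assumption: integration is a running element-wise sum.
    """
    outputs: List[Grid] = []
    carried: Optional[Grid] = None
    for feat in stage_feats:
        current = feat if carried is None else add(feat, carried)
        outputs.append(current)
        carried = current
    return outputs


low = [[1.0] * 4 for _ in range(4)]   # fine-resolution, low-level map
high = [[1.0, 2.0], [3.0, 4.0]]       # coarse-resolution, high-level map
fused = fuse(low, high)
stages = cascade_masks([[[1.0]], [[2.0]], [[3.0]]])
```

In a real network these maps would be multi-channel tensors and the fusion would typically involve learned convolutions; the sketch only shows the data flow.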
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61876121 and 61472267), the Foundation of Key Research and Development Projects in Jiangsu Province (No. BE2017663), the Foundation of Natural Science Research Program in Jiangsu Province Higher Education (Nos. 19KJB520054 and 18KJB510042), and the Foundation of Science and Technology Project of Suzhou Water Conservancy and Water Affairs (No. 2015-7-5).
Cite this article
Wen, Y., Hu, F., Ren, J. et al. Joint multi-task cascade for instance segmentation. J Real-Time Image Proc 17, 1983–1989 (2020). https://doi.org/10.1007/s11554-020-01007-5
DOI: https://doi.org/10.1007/s11554-020-01007-5