Abstract
After analyzing object detection methods under the existing deep learning framework, this paper proposes a multitask learning model (Fully Convolution Object Detection Network, FCDN) that performs fully end-to-end semantic segmentation and object detection through deep learning, without delimiting default boxes. First, the paper analyzes why current mainstream object detection networks need default boxes delineated in advance. Second, it proposes an object detection network that needs no delimited default boxes. The network uses semantic segmentation to detect all boundaries and key points of objects at the pixel level, and then obtains prediction boxes by combining them with the category information of the semantic segmentation map. Finally, the feasibility of the method is verified on the VOC 2007 dataset, and its performance is compared with that of current mainstream object detection algorithms. The results show that the new model can perform semantic segmentation and object detection at the same time. When trained on the same training samples, the detection precision of FCDN is superior to that of classic detection models.
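As a rough illustration of how boxes can be read off a segmentation map without any default boxes, the sketch below takes a per-pixel class label map and returns one box per connected region. This is not the FCDN procedure: the abstract does not specify how boundaries and key points are combined, so the sketch swaps in plain 4-connected flood fill. The function name `boxes_from_label_map`, the `background` parameter, and the toy map are all hypothetical.

```python
from collections import deque


def boxes_from_label_map(label_map, background=0):
    """Return (class_id, x_min, y_min, x_max, y_max) per 4-connected region.

    Assumption: label_map is a list of equal-length rows of integer class ids,
    as a semantic segmentation network might output after an argmax.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            cls = label_map[y][x]
            if cls == background or visited[y][x]:
                continue

            # Flood-fill one region of this class, tracking its extent.
            x_min = x_max = x
            y_min = y_max = y
            queue = deque([(y, x)])
            visited[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not visited[ny][nx]
                        and label_map[ny][nx] == cls
                    ):
                        visited[ny][nx] = True
                        queue.append((ny, nx))

            boxes.append((cls, x_min, y_min, x_max, y_max))

    return boxes


# Toy 4x5 label map: 0 is background, 1 and 2 are object classes.
toy_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [1, 0, 0, 0, 2],
]

for box in boxes_from_label_map(toy_map):
    print(box)
```

Box coordinates are inclusive pixel indices. Two touching objects of the same class would merge into a single box under this sketch, which is one reason a method might also predict object boundaries and key points, as the abstract describes.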
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Rui, T., Xiao, F., Tang, J., Zhang, F., Yang, C., Liu, M. (2018). Research on Multitask Deep Learning Network for Semantic Segmentation and Object Detection. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_65
DOI: https://doi.org/10.1007/978-3-030-00764-5_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00763-8
Online ISBN: 978-3-030-00764-5
eBook Packages: Computer Science, Computer Science (R0)