Abstract
In recent years, object detection has become a prominent direction in computer vision and digital image processing. This paper presents a two-stage object detection algorithm based on deep learning. First, we propose the Deep Dilated Convolution Network (D_dNet), which adds dilated convolutions to the backbone network; this reduces the number of training parameters while increasing the resolution of the feature maps and the size of the receptive field. Second, traditional two-stage detectors typically re-identify region proposals through fully connected (FC) layers, and this "thick" head structure slows detection and inflates computation. We therefore compress the feature map before training to build a light-weight head, and introduce transfer learning during training to optimize the model. The model is evaluated on the MSCOCO dataset. Experiments show that the proposed model improves accuracy by 1.3 to 2.2 percentage points.
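The key property the abstract relies on is that dilation enlarges a kernel's receptive field without adding parameters. A minimal 1-D sketch of this effect is below; `dilated_conv1d` is an illustrative helper written for this note, not code from the paper, which applies the same idea inside a 2-D backbone.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid-mode 1-D convolution with a dilated kernel.

    Dilation skips (dilation - 1) inputs between kernel taps, so the
    effective receptive field grows to (len(w) - 1) * dilation + 1
    while the number of weights stays fixed at len(w).
    """
    k = len(w)
    rf = (k - 1) * dilation + 1  # effective receptive field
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - rf + 1)
    ])

# A 3-tap kernel: with dilation 1 each output sees 3 inputs,
# with dilation 2 it sees 5 inputs -- same 3 parameters.
x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, w, dilation=1))  # -> [ 3.  6.  9. 12. 15. 18.]
print(dilated_conv1d(x, w, dilation=2))  # -> [ 6.  9. 12. 15.]
```

In the 2-D case the same trade-off applies per axis, which is why a dilated backbone can keep feature-map resolution high while still covering large objects.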
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (AD16380008), Innovation Project of Guangxi Graduate Education (XYCSZ2019068), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Quan, Y., Li, Z., Zhang, F., Zhang, C. (2019). D_dNet-65 R-CNN: Object Detection Model Fusing Deep Dilated Convolutions and Light-Weight Networks. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science(), vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4