Abstract
Target detection based on deep convolutional neural networks has achieved excellent performance. However, small target detection remains one of the challenges in computer vision. In this paper, we present an efficient network for real-time small target detection. The proposed network extracts features with a modified Darknet53 and uses a scale matching strategy to select suitable scales and anchor sizes for small target detection. Within the network, we design an adaptive receptive field fusion module that enriches the context information around small targets by merging features with different receptive fields. We also propose an image cropping method for data preprocessing, so that targets are trained over a wider range of scales. We conduct experiments on the VEDAI dataset and a small target dataset. Comparative results show that the proposed network achieves 74.5% mean average precision (mAP) at 50.0 FPS on the VEDAI dataset and 45.7% mAP at 51.1 FPS on the small target dataset, which is better than other advanced target detectors.
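The abstract does not spell out the cropping procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a square window of varying size is cropped at a random position and resized to a fixed network input, which changes the apparent size of each target. The function name `crop_and_rescale_boxes`, the rule of keeping a box only when its centre lies inside the window, and the image and crop sizes are all hypothetical choices for illustration, not the authors' method.

```python
import random


def crop_and_rescale_boxes(boxes, img_w, img_h, crop_size, out_size, rng):
    """Pick a random square window and map boxes into the resized crop.

    boxes: list of (x_min, y_min, x_max, y_max) in pixels of the full image.
    Returns ((x0, y0, crop_size), new_boxes) with new_boxes in output pixels.
    Assumption: a box is kept only if its centre lies inside the window.
    """
    # randint is inclusive on both ends, so the window always fits the image.
    x0 = rng.randint(0, img_w - crop_size)
    y0 = rng.randint(0, img_h - crop_size)
    scale = out_size / crop_size  # > 1 enlarges targets, < 1 shrinks them
    kept = []
    for x_min, y_min, x_max, y_max in boxes:
        cx = (x_min + x_max) / 2
        cy = (y_min + y_max) / 2
        if not (x0 <= cx < x0 + crop_size and y0 <= cy < y0 + crop_size):
            continue
        # Clip to the window, shift to the window origin, then rescale.
        kept.append((
            (max(x_min, x0) - x0) * scale,
            (max(y_min, y0) - y0) * scale,
            (min(x_max, x0 + crop_size) - x0) * scale,
            (min(y_max, y0 + crop_size) - y0) * scale,
        ))
    return (x0, y0, crop_size), kept


if __name__ == "__main__":
    rng = random.Random(0)
    boxes = [(500, 500, 520, 510)]  # one hypothetical 20 x 10 pixel target
    # Hypothetical settings: 1024 x 1024 source image, 512 x 512 network input.
    for crop_size in (256, 512, 1024):
        window, kept = crop_and_rescale_boxes(boxes, 1024, 1024, crop_size, 512, rng)
        print(crop_size, window, kept)
```

With a fixed output size, a smaller window magnifies the targets it contains and a larger window shrinks them, so sampling several window sizes exposes the same annotated target to the detector at several scales.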
Cite this article
Ju, M., Luo, J., Liu, G. et al. A real-time small target detection network. SIViP 15, 1265–1273 (2021). https://doi.org/10.1007/s11760-021-01857-x
DOI: https://doi.org/10.1007/s11760-021-01857-x