A single-shot multi-level feature reused neural network for object detection

Wei, Lixin; Cui, Wei; Hu, Ziyu; Sun, Hao; Hou, Shijie

doi:10.1007/s00371-019-01787-3

Original Article
Published: 03 January 2020

Volume 37, pages 133–142, (2021)

The Visual Computer

Lixin Wei¹, Wei Cui¹, Ziyu Hu¹, Hao Sun¹ & Shijie Hou¹

1300 Accesses
23 Citations
15 Altmetric

Abstract

Recent years have witnessed significant progress in object detection using deep convolutional neural networks. However, few object detectors achieve high precision at low computational cost. In this paper, a novel, lightweight framework named the multi-level feature reused detector (MFRDet) is proposed, which reaches better accuracy than two-stage methods while maintaining efficiency comparable to one-stage methods, without employing the very deep convolutional neural networks that most modern detectors rely on. The framework reuses the information contained in both deep and shallow feature maps, which raises detection precision. On the Pascal VOC 2007 test set, with training on the VOC 2007 and VOC 2012 training sets, MFRDet with an input size of 300 \(\times\) 300 achieves 80.7% mAP at 62.5 FPS. A higher-resolution input version obtains 82.0% mAP at 37.0 FPS on a single Nvidia Tesla P100 GPU. The proposed framework shows state-of-the-art mAP at high FPS, outperforming most other modern object detectors.
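To put the reported speeds in more familiar terms, the short sketch below converts the FPS figures quoted in the abstract into an average per-image latency. The helper name `latency_ms` and the list layout are illustrative assumptions, not part of the paper; the only inputs are the mAP and FPS values stated above, and the conversion assumes FPS means images processed per second.

```python
def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a given frames-per-second rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures reported in the abstract: (configuration, mAP in %, FPS).
# The labels are descriptive only; the abstract does not name the
# exact resolution of the high-resolution variant.
reported = [
    ("MFRDet, 300 x 300 input", 80.7, 62.5),
    ("MFRDet, high-resolution input", 82.0, 37.0),
]

for name, map_percent, fps in reported:
    print(f"{name}: {map_percent}% mAP, {fps} FPS, "
          f"about {latency_ms(fps):.1f} ms per image")
```

Under that assumption, 62.5 FPS corresponds to 16 ms per image and 37.0 FPS to roughly 27 ms per image, so the 1.3-point mAP gain of the high-resolution version costs about 11 ms of extra latency per image.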

Acknowledgements

This research is supported by the Youth Foundation of Hebei Province (CN) No. E2018203162.

Author information

Authors and Affiliations

Institute of Electrical Engineering, Yanshan University, Qinhuangdao, 066004, China
Lixin Wei, Wei Cui, Ziyu Hu, Hao Sun & Shijie Hou

Corresponding author

Correspondence to Wei Cui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, L., Cui, W., Hu, Z. et al. A single-shot multi-level feature reused neural network for object detection. Vis Comput 37, 133–142 (2021). https://doi.org/10.1007/s00371-019-01787-3

Published: 03 January 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s00371-019-01787-3
