
Attentional and adversarial feature mimic for efficient object detection

  • Original article
  • Published in The Visual Computer

Abstract

In this paper, we focus on learning efficient object detectors via knowledge (or network) distillation. More specifically, we mimic features from deeper and larger teacher networks to help train better-performing efficient student networks. Unlike previous methods that mimic features by minimizing an L2 loss between the features generated by the teacher and student networks, we propose an attentional and adversarial feature mimic (AAFM) method consisting of two modules: an attentional feature mimic module, which uses an attentional L2 loss that learns to focus on important object-related regions, and an adversarial feature mimic module, which uses an adversarial loss that encourages the features generated by the teacher and student networks to have similar distributions. We apply AAFM to the two-stage Faster R-CNN detector. Experiments on the PASCAL VOC 2007 and COCO datasets show that our method consistently improves over detectors trained without feature mimic or with other feature mimic methods. In particular, it achieves 72.1% mAP on PASCAL VOC 2007 with a ResNet-18-based detector.
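The attention-weighted L2 mimic loss described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the array shapes, the externally supplied `attention` map, and the normalization are assumptions made for the example.

```python
import numpy as np

def attentional_l2_loss(f_teacher, f_student, attention, eps=1e-8):
    """Attention-weighted L2 feature-mimic loss (illustrative sketch).

    f_teacher, f_student: feature maps of shape (C, H, W).
    attention: non-negative map of shape (H, W) highlighting
               object-related regions; how it is learned is
               method-specific and omitted here.
    """
    diff = (f_teacher - f_student) ** 2          # per-element squared error, (C, H, W)
    weighted = attention[None, :, :] * diff      # broadcast the attention map over channels
    # normalize by the attention mass and the channel count
    return float(weighted.sum() / (attention.sum() * f_teacher.shape[0] + eps))

# identical teacher and student features incur zero loss
f = np.random.rand(4, 8, 8)
a = np.ones((8, 8))
print(attentional_l2_loss(f, f, a))  # → 0.0
```

With a uniform attention map this reduces to a plain mean-squared error; a learned, object-focused map instead concentrates the penalty on foreground regions, which is the intuition behind the attentional module. The adversarial module (a discriminator matching teacher and student feature distributions) is omitted here for brevity.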



Author information


Corresponding author

Correspondence to Weiping Mao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Chen, Y., Wu, M. et al. Attentional and adversarial feature mimic for efficient object detection. Vis Comput 39, 639–650 (2023). https://doi.org/10.1007/s00371-021-02363-4
