Feature Enhancement and Reconstruction for Small Object Detection

Zhang, Chong-Jian; Chen, Song-Lu; Liu, Qi; Huang, Zhi-Yong; Chen, Feng; Yin, Xu-Cheng

doi:10.1007/978-3-031-27077-2_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13833)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Small object detection remains challenging because the objects are small and subject to noise interference. Previous work cannot effectively reduce noise interference or extract representative features of small objects. Although an upsampling network can alleviate feature loss by enlarging feature maps, it cannot enhance semantics and introduces more noise. To solve these problems, we propose CAU (Content-Aware Upsampling) to enhance the feature representation and semantics of small objects. Moreover, we propose CSA (Content-Shuffle Attention) to reconstruct robust features and reduce noise interference using feature shuffling and attention. Extensive experiments verify that, compared with the baseline model, our proposed method improves small object detection by 2.2% on the traffic sign dataset TT-100K and by 0.8% on the object detection dataset MS COCO.
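As a rough illustration of the two generic operations the abstract refers to, the sketch below shows plain nearest-neighbor upsampling (which enlarges a feature map without looking at its content) and a ShuffleNet-style channel shuffle (one common form of feature shuffling). This is not the authors' CAU or CSA, whose details are not given on this page; the function names `upsample_nearest` and `channel_shuffle` are hypothetical, and pure-Python lists stand in for feature tensors.

```python
def upsample_nearest(fmap, scale=2):
    """Enlarge a 2-D feature map by repeating each value scale x scale times.

    Content-agnostic baseline: every output value is a copy of an input
    value, so no new semantic information is created.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(scale)]   # repeat along width
        out.extend([list(wide) for _ in range(scale)])  # repeat along height
    return out


def channel_shuffle(channels, groups):
    """Interleave channels across groups (assumed ShuffleNet-style shuffle).

    Equivalent to viewing the channel list as (groups, per_group),
    transposing, and flattening again.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    print(upsample_nearest([[1, 2], [3, 4]]))
    print(channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2))
```

After the shuffle, channels from the two groups alternate, so a later grouped operation sees information from both groups. Any attention step would then reweight these channels, which is omitted here.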

Notes

1. Please refer to https://github.com/ultralytics/yolov5. For TT-100K, we use the large YOLOv5l6 as the baseline model. For MS COCO, we use the small YOLOv5s as the baseline model. Except for CSA and CAU, our model is the same as the official model. YOLOv5l6 has about 10× as many parameters as YOLOv5s.

Acknowledgment

This research is supported by the National Key Research and Development Program of China (2020AAA0109701) and the National Natural Science Foundation of China (62076024, 62006018).

Author information

Authors and Affiliations

University of Science and Technology Beijing, Beijing, 100083, China
Chong-Jian Zhang, Song-Lu Chen, Qi Liu, Zhi-Yong Huang & Xu-Cheng Yin
USTB-EEasyTech Joint Lab of Artificial Intelligence, Beijing, 100083, China
Chong-Jian Zhang, Song-Lu Chen, Qi Liu, Zhi-Yong Huang, Feng Chen & Xu-Cheng Yin
EEasy Technology Company Ltd., Zhuhai, 519000, China
Feng Chen

Corresponding author

Correspondence to Xu-Cheng Yin.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

Cite this paper

Zhang, CJ., Chen, SL., Liu, Q., Huang, ZY., Chen, F., Yin, XC. (2023). Feature Enhancement and Reconstruction for Small Object Detection. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_2

DOI: https://doi.org/10.1007/978-3-031-27077-2_2
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27076-5
Online ISBN: 978-3-031-27077-2
eBook Packages: Computer Science, Computer Science (R0)
