DOI: 10.1145/3573942.3574033
research-article

Object Detection Algorithm Based on Second-Order Pooling Network and Gaussian Mixture Attention

Published: 16 May 2023

ABSTRACT

To improve the feature representation ability of the YOLOX algorithm and obtain better detection performance, an object detection algorithm based on a second-order pooling network and Gaussian mixture attention is proposed. First, a second-order pooling network is added after the PAFPN; it obtains higher-order statistical information by computing the covariance matrix between different channels, which enhances the network's non-linear modeling capability. Second, a mixture attention module based on the Gaussian function is added after the SPP to model global context in the spatial and channel dimensions respectively, which improves network performance with almost no extra parameters. Experimental results show that the detection accuracy of the proposed algorithm on the PASCAL VOC dataset reaches 82.6%, which is 1.6% higher than that of the YOLOX algorithm.
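
The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract only says that a covariance matrix between channels is computed and that a Gaussian function models global context along the channel and spatial dimensions. Everything else here is an assumption made for illustration, including the function names, the row-mean plus sigmoid reduction of the covariance matrix, the fixed standard deviation `std`, and the choice to multiply the channel and spatial gates together.

```python
import numpy as np


def channel_covariance(features: np.ndarray) -> np.ndarray:
    """Covariance between the channels of a (C, H, W) feature map, shape (C, C)."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)           # one row of spatial responses per channel
    x = x - x.mean(axis=1, keepdims=True)    # centre each channel
    return (x @ x.T) / (h * w)               # second-order statistics across channels


def second_order_reweight(features: np.ndarray) -> np.ndarray:
    """Rescale channels using weights derived from the channel covariance.

    Assumption: each covariance row is reduced to a scalar by its mean and
    squashed with a sigmoid. The paper may use a learned reduction instead.
    """
    scores = channel_covariance(features).mean(axis=1)   # (C,)
    weights = 1.0 / (1.0 + np.exp(-scores))              # values in (0, 1)
    return features * weights[:, None, None]


def gaussian_gate(context: np.ndarray, std: float = 2.0, eps: float = 1e-5) -> np.ndarray:
    """Standardise a context tensor and map it through a Gaussian, giving values in (0, 1]."""
    z = (context - context.mean()) / np.sqrt(context.var() + eps)
    return np.exp(-(z ** 2) / (2.0 * std ** 2))


def gaussian_mixture_attention(features: np.ndarray, std: float = 2.0) -> np.ndarray:
    """Apply Gaussian gates along the channel and spatial dimensions.

    Assumption: the channel context is the per-channel spatial mean, the
    spatial context is the per-position channel mean, and both gates are
    applied multiplicatively. No learnable parameters are involved.
    """
    channel_gate = gaussian_gate(features.mean(axis=(1, 2)), std)   # (C,)
    spatial_gate = gaussian_gate(features.mean(axis=0), std)        # (H, W)
    return features * channel_gate[:, None, None] * spatial_gate[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((8, 16, 16))   # hypothetical (C, H, W) feature map
    print(channel_covariance(fmap).shape)
    print(second_order_reweight(fmap).shape)
    print(gaussian_mixture_attention(fmap).shape)
```

Because the Gaussian gate has no learnable weights in this sketch, it adds no parameters, which is consistent with the abstract's "almost no extra parameters" claim. The covariance step is where the second-order (channel-to-channel) information enters.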

    • Published in

      AIPR '22: Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition
      September 2022
      1221 pages
      ISBN: 9781450396899
      DOI: 10.1145/3573942

      Copyright © 2022 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited