ABSTRACT
To improve the feature representation ability of the YOLOX algorithm and obtain better detection performance, an object detection algorithm based on a second-order pooling network and Gaussian mixture attention is proposed. First, a second-order pooling network is added after the PAFPN; it captures higher-order statistical information by computing the covariance matrix between channels, which enhances the network's non-linear modeling capability. Second, a mixture attention module based on the Gaussian function is added after the SPP to model global context in the spatial and channel dimensions separately, improving network performance with almost no extra parameters. Experimental results show that the detection accuracy of the proposed algorithm on the PASCAL VOC dataset reaches 82.6%, which is 1.6% higher than that of the YOLOX algorithm.
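The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the `(C, H, W)` feature-map layout, the `std` value, and the particular Gaussian gate (a parameter-free gate applied to normalized channel means). It is not the authors' implementation and it covers only the channel dimension.

```python
import numpy as np

def channel_covariance(x):
    """Second-order statistic: covariance between the C channels of x, shape (C, H, W)."""
    c = x.shape[0]
    flat = x.reshape(c, -1)                             # (C, N) with N = H * W
    centered = flat - flat.mean(axis=1, keepdims=True)  # remove each channel's mean
    return centered @ centered.T / flat.shape[1]        # (C, C), normalized by N

def gaussian_channel_attention(x, std=2.0, eps=1e-5):
    """Hypothetical parameter-free channel gate built from a Gaussian function."""
    z = x.mean(axis=(1, 2))                             # global average pooling -> (C,)
    z_hat = (z - z.mean()) / np.sqrt(z.var() + eps)     # normalize across channels
    gate = np.exp(-z_hat ** 2 / (2.0 * std ** 2))       # Gaussian weights in (0, 1]
    return x * gate[:, None, None]                      # rescale each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))                   # toy feature map: 8 channels, 4x4
cov = channel_covariance(feat)
out = gaussian_channel_attention(feat)
```

The covariance matrix is symmetric with one row per channel, and the gate has no learnable weights, which is in the spirit of the "almost no extra parameters" claim; `std` is a fixed constant chosen arbitrarily for this sketch.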
Index Terms
- Object Detection Algorithm Based on Second-Order Pooling Network and Gaussian Mixture Attention