
GCANet: A Cross-Modal Pedestrian Detection Method Based on Gaussian Cross Attention Network

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 507)

Abstract

Pedestrian detection is a critical but challenging research field with wide applications in autonomous driving, surveillance, and robotics. Detection performance degrades under limited imaging conditions, especially at night or under occlusion. To overcome these obstacles, we propose a cross-modal pedestrian detection network based on Gaussian Cross Attention (GCANet), which improves detection performance by making full use of multi-modal features. Through bidirectional coupling of local features across modalities, GCANet realizes feature interaction and fusion between modalities and effectively emphasizes salient cross-modal features, thereby improving detection accuracy. Experimental results demonstrate that GCANet achieves the highest accuracy among state-of-the-art methods on the KAIST multi-modal pedestrian dataset.
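The abstract describes bidirectional cross-modal coupling: each modality's features attend to the other's, and the results are fused. The paper's actual Gaussian Cross Attention formulation is not given on this page, so the following is only a minimal illustrative sketch of generic bidirectional cross-attention fusion between RGB and thermal feature tokens (all shapes and the additive fusion rule are assumptions, not the authors' design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feat, kv_feat):
    # Single-head scaled dot-product attention: queries come from one
    # modality, keys/values from the other.
    d = q_feat.shape[-1]
    scores = q_feat @ kv_feat.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_feat

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))      # 16 RGB feature tokens, 64-dim
thermal = rng.standard_normal((16, 64))  # 16 thermal feature tokens, 64-dim

# Bidirectional coupling: each modality attends to the other, and the
# attended features are fused back with the originals by addition.
rgb_enhanced = rgb + cross_attention(rgb, thermal)
thermal_enhanced = thermal + cross_attention(thermal, rgb)
fused = rgb_enhanced + thermal_enhanced
print(fused.shape)  # (16, 64)
```

The symmetric structure (both attention directions contribute to the fused map) is what lets salient features from either modality, e.g. thermal signatures at night, dominate the fused representation.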



Acknowledgments

This research is supported under Grant 202020429036.

Author information


Corresponding author

Correspondence to Tingfa Xu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Peng, P. et al. (2022). GCANet: A Cross-Modal Pedestrian Detection Method Based on Gaussian Cross Attention Network. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 507. Springer, Cham. https://doi.org/10.1007/978-3-031-10464-0_35
