Airport Boarding Bridge Pedestrian Detection Based on Spatial Attention and Joint Crowd Density Estimation

Han, Xu; Wan, Hao; Tang, Wenxiao; Kang, Wenxiong

doi:10.1007/978-981-99-9119-8_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14474)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Pedestrian detection is the cornerstone of pedestrian tracking and re-identification and plays a pivotal role in intelligent transportation. In high-security scenarios such as airport boarding bridges, it is essential to accurately identify pedestrians with different identities, such as passengers, crew members, and cleaning staff. Varied poses, occlusion, and small differences in appearance between these groups make it difficult to detect and correctly classify individuals in boarding bridge scenes. Existing object detectors are limited in their ability to extract discriminative features tailored to pedestrians, which prevents them from meeting the requirements for precise localization and classification. In this paper, we propose a method based on spatial attention and joint crowd density estimation. Spatial attention lets the network focus selectively on the salient regions that correspond to different pedestrian categories, which improves classification accuracy. In addition, an auxiliary crowd density estimation task adds supervision of pedestrian head positions to the network. This substantially alleviates missed detections caused by perspective distortion and occlusion and yields a marked improvement in detection accuracy. We use YOLO as the baseline model; the improved model achieves a 5.81% increase in mAP over the baseline and significantly outperforms several common object detectors.
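
The abstract does not say how the spatial attention module is built. As one plausible reading, the sketch below assumes a CBAM-style spatial attention (Woo et al., ECCV 2018): the feature map is pooled along the channel axis with both average and max, the two pooled maps are convolved with a k×k kernel, and a sigmoid turns the result into a per-pixel weight in (0, 1) that rescales every channel, so salient regions are emphasized and background is damped. The function name `spatial_attention`, the 7×7 kernel, and the random weights are illustrative assumptions rather than the authors' module, and NumPy stands in for a deep learning framework.

```python
import numpy as np


def spatial_attention(feat: np.ndarray, kernel: np.ndarray, bias: float = 0.0):
    """CBAM-style spatial attention (assumed formulation, not the paper's exact module).

    feat:   feature map of shape (C, H, W)
    kernel: conv weights of shape (2, k, k), k odd, applied to the [avg; max] maps
    Returns (reweighted feature map of shape (C, H, W), attention map of shape (H, W)).
    """
    c, h, w = feat.shape
    # Pool along the channel axis: one average map and one max map, each (H, W).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    # Zero-pad spatially so the output keeps the H x W resolution ("same" conv).
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    logits = np.full((h, w), bias, dtype=float)
    for i in range(h):
        for j in range(w):
            window = padded[:, i:i + k, j:j + k]  # (2, k, k) neighbourhood
            logits[i, j] += np.sum(window * kernel)
    attn = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-pixel weights in (0, 1)
    # Broadcast the single (H, W) map over all C channels.
    return feat * attn[None, :, :], attn


# Usage with hypothetical weights (a trained network would learn `kernel`).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out, attn = spatial_attention(feat, kernel)
print(out.shape, attn.shape)  # (8, 6, 6) (6, 6)
```

Because one spatial map is shared by all channels, the module costs only a single 2-channel convolution, which is why this style of attention is often added to one-stage detectors such as YOLO without a large speed penalty.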

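The auxiliary task supervises pedestrian head positions through a crowd density map. A common way to build such a target, assumed here rather than taken from the paper, is to place a normalized Gaussian at each annotated head so that the map sums to the number of heads, and then add a pixel-wise mean squared error between predicted and target maps to the detection loss. The helper names `head_density_map` and `joint_loss`, the Gaussian width `sigma`, and the weight `lam` are hypothetical; the abstract gives none of these details.

```python
import numpy as np


def head_density_map(heads, height, width, sigma=4.0):
    """Target density map from head points (x, y) in pixel coordinates.

    Each head adds a 2D Gaussian renormalised to sum to 1 inside the image,
    so the map's total equals the number of annotated heads, even near borders.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=float)
    for x, y in heads:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # renormalise so truncated border heads still count as 1
    return density


def joint_loss(det_loss, pred_density, gt_density, lam=0.1):
    """Total loss = detection loss + lam * pixel-wise MSE between density maps.

    `lam` is a hypothetical weighting, not a value reported by the authors.
    Returns (total_loss, density_loss).
    """
    density_loss = float(np.mean((pred_density - gt_density) ** 2))
    return det_loss + lam * density_loss, density_loss


# Usage: three heads, one of them in the image corner.
gt = head_density_map([(10, 12), (30, 5), (0, 0)], height=40, width=48)
print(round(float(gt.sum()), 6))  # 3.0
total, dens = joint_loss(1.25, np.zeros_like(gt), gt)
print(total, dens)
```

Supervising head points rather than full bodies is a natural fit for occlusion: heads are usually the least occluded body part in a crowd seen from an elevated camera, so the density branch still receives a signal where full-body boxes are ambiguous.
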
Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant 61976095 and in part by the Natural Science Foundation of Guangdong Province, China, under Grant 2022A1515010114.

Author information

Authors and Affiliations

School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Xu Han, Hao Wan, Wenxiao Tang & Wenxiong Kang
Guangdong Airport Baiyun Information Technology Co., Ltd. Postdoctoral Innovation Practice Base, Guangzhou, China
Hao Wan
Pazhou Lab, Guangzhou, China
Wenxiong Kang
Guangdong Enterprise Key Laboratory of Intelligent Finance, Guangzhou, China
Wenxiong Kang

Corresponding author

Correspondence to Wenxiong Kang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Duke University, Durham, NC, USA
Jian Pei
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Han, X., Wan, H., Tang, W., Kang, W. (2024). Airport Boarding Bridge Pedestrian Detection Based on Spatial Attention and Joint Crowd Density Estimation. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol. 14474. Springer, Singapore. https://doi.org/10.1007/978-981-99-9119-8_20

DOI: https://doi.org/10.1007/978-981-99-9119-8_20
Published: 03 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9118-1
Online ISBN: 978-981-99-9119-8
eBook Packages: Computer Science, Computer Science (R0)
