Abstract
Tracking-by-detection is currently the dominant technique for crowd and vehicle counting. However, existing methods struggle to detect and count mixed pedestrian and vehicular traffic in smart-park emergency scenes, because small, occluded objects are difficult to recognize. An improved detection model, YOLOv5s-FAC, is proposed based on YOLOv5s. First, a P2 detection layer is added to expand the detection range of the model and improve its ability to detect objects of different sizes. Second, an auxiliary inference network is constructed using programmable gradient information to give the model a stronger information-fitting capability. Finally, a cascaded triplet attention mechanism is added to the head of the model to strengthen its feature fusion capability. A collision-line counting method combined with OC-SORT tracking is then proposed, in which traffic moving in both directions is considered. The experimental results show that YOLOv5s-FAC achieves significantly improved detection quality, with a mAP of 69.5%, and that the counting accuracy reaches 94.4%.
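The collision-line counting idea described in the abstract can be illustrated with a minimal sketch: treat the tracker output (e.g., OC-SORT track IDs with per-frame box centers) as trajectories and register a count whenever a trajectory crosses a virtual counting line, keeping separate totals for the two directions. The `Track` class, the `count_crossings` helper, and the sign-of-cross-product side test below are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch: bidirectional collision-line counting on top of tracker output.
# The data structures and function names are assumptions for illustration only.

from dataclasses import dataclass, field


@dataclass
class Track:
    """A tracked object: its identity and the box centers it has visited."""
    track_id: int
    centers: list = field(default_factory=list)  # list of (x, y) per frame


def side_of_line(pt, a, b):
    """Sign of the cross product: which side of the line a->b the point lies on."""
    return (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])


def count_crossings(tracks, line_a, line_b):
    """Count tracks crossing the virtual line, separately for each direction.

    A crossing is registered when two consecutive centers of one track fall on
    opposite sides of the line; the sign change gives the direction of travel.
    """
    counts = {"forward": 0, "backward": 0}
    for tr in tracks:
        for prev, curr in zip(tr.centers, tr.centers[1:]):
            s_prev = side_of_line(prev, line_a, line_b)
            s_curr = side_of_line(curr, line_a, line_b)
            if s_prev == 0 or s_curr == 0 or (s_prev > 0) == (s_curr > 0):
                continue  # no change of side, so no crossing at this step
            if s_prev > 0:
                counts["forward"] += 1
            else:
                counts["backward"] += 1
    return counts


if __name__ == "__main__":
    # Two synthetic tracks crossing a vertical counting line at x = 100.
    t1 = Track(1, [(80, 50), (95, 52), (110, 54)])   # moves left to right
    t2 = Track(2, [(120, 60), (105, 61), (90, 63)])  # moves right to left
    print(count_crossings([t1, t2], line_a=(100, 0), line_b=(100, 200)))
    # -> {'forward': 1, 'backward': 1}
```

In practice the per-track crossing test would be evaluated online as each new frame's tracker output arrives, so a track is counted at most once per line crossing regardless of how long it remains in view.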
Data availability
The data and code of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.11509921.
Acknowledgements
The authors would like to acknowledge the financial support provided by the National Key Research and Development Program of China under Grant 2023YFC3008904 and the Fundamental Research Funds for Beijing University of Civil Engineering and Architecture under Grant X20109.
Author information
Authors and Affiliations
Contributions
Wei-Guang Zou: algorithm improvement experiments, including validation, ablation, and comparison experiments, implementation of counting methods, writing of the original manuscript, and editing. Yu-ling Hu: methodology and reviews. Xin-Yi Wang: comparative experiments of object tracking, image data acquisition for counting scenarios, and validation of counting performance. Jia-Feng Li: framework construction for detection and tracking, expansion and collection of Person and Vehicle datasets, and analysis of experimental data.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file1 (MP4 273989 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zou, W., Hu, Y., Wang, X. et al. YOLOv5s-FAC: enhanced feature association detector for person-vehicle counting in smart park. SIViP 19, 62 (2025). https://doi.org/10.1007/s11760-024-03735-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11760-024-03735-8