
OSTNet: overlapping splitting transformer network with integrated density loss for vehicle density estimation

Published in Applied Intelligence

Abstract

Vehicle density estimation plays a crucial role in traffic monitoring, providing traffic management departments with the traffic volume and flow information needed to monitor traffic safety. Current vehicle density estimation methods based on Convolutional Neural Networks (CNNs) fall short in extracting global information because of the limited receptive field of the convolution kernel, resulting in the loss of vehicle information. The Vision Transformer can capture long-distance dependencies and establish global context through its self-attention mechanism, making it a promising candidate for vehicle density estimation. However, applying a Vision Transformer directly breaks the continuity of vehicle information across patches. In addition, vehicle density estimation faces further challenges such as multi-scale variation, occlusion, and background noise. To address these challenges, a novel Overlapping Splitting Transformer Network (OSTNet) tailored for vehicle density estimation is designed. Overlapping splitting is proposed so that adjacent patches share half of their area, preserving the continuity of vehicle information between patches. Dilated convolution is introduced to replace fixed-size positional encodings and provide accurate vehicle localization information. Meanwhile, a Feature Pyramid Aggregation (FPA) module captures information at different scales, tackling the issue of multi-scale variation. Moreover, a novel loss function called integrated density loss is designed to address vehicle occlusion and background noise. Extensive experimental results on four open-source datasets show that OSTNet outperforms state-of-the-art methods and can help traffic management departments better estimate vehicle density. The source code and pre-trained models are available at: https://github.com/quyang-hub/vehicle-density-estimation.
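The abstract names the integrated density loss without defining it here. As a rough, hypothetical sketch only: a density-estimation loss of this kind typically combines a pixel-wise term over the predicted density map with a global count term. The function name, the `count_weight` parameter, and the MSE-plus-count composition below are all assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def integrated_density_loss(pred, gt, count_weight=0.01):
    """Hypothetical sketch: pixel-wise density term plus a global
    count term. The paper's actual integrated density loss may differ.
    pred, gt: (B, 1, H, W) density maps."""
    # Pixel-wise MSE between predicted and ground-truth density maps
    pixel_loss = F.mse_loss(pred, gt)
    # Absolute error between total predicted and ground-truth counts
    count_loss = torch.abs(pred.sum(dim=(1, 2, 3)) - gt.sum(dim=(1, 2, 3))).mean()
    return pixel_loss + count_weight * count_loss
```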


Code Availability

The source code and pre-trained models are available at: https://github.com/quyang-hub/vehicle-density-estimation


Acknowledgements

The authors would like to thank the reviewers for their valuable comments and suggestions to improve the quality of this paper.

Funding

This work was supported by Chinese Universities Scientific Fund (No. 2022TC109), Double First-class International Cooperation Project of China Agricultural University (No. 10020799), and Double First-class Project of China Agricultural University.

Author information


Contributions

Yang Qu completed data collection, methodology, validation, and wrote the original draft. Liran Yang and Qiuyue Li completed formal analysis. Ping Zhong completed supervision, conceptualization, and reviewed the manuscript. All authors read and approved this manuscript.

Corresponding authors

Correspondence to Ping Zhong or Qiuyue Li.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics Statement

Throughout the research process, all authors followed international guidelines and ensured that no animals were harmed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm 1: PyTorch snippet of overlapping splitting.
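The algorithm figure does not render on this page. As a minimal sketch of what 50%-overlap patch splitting could look like in PyTorch, under the assumption that overlap is realized with an unfold stride of half the patch size (function name, shapes, and padding are illustrative, not the paper's code):

```python
import torch
import torch.nn.functional as F

def overlapping_split(x, patch_size=8):
    """Split a feature map into patches where neighbouring patches
    overlap by half of their area (stride = patch_size // 2)."""
    # x: (B, C, H, W) -> (B, num_patches, C * patch_size * patch_size)
    stride = patch_size // 2
    patches = F.unfold(x, kernel_size=patch_size, stride=stride,
                       padding=stride // 2)
    return patches.transpose(1, 2)

x = torch.randn(2, 3, 32, 32)
print(overlapping_split(x).shape)  # torch.Size([2, 64, 192])
```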

Algorithm 2: PyTorch snippet of locally-grouped self-attention.
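Again, the figure itself is unavailable here. Below is a minimal sketch of a common formulation of locally-grouped self-attention (as popularized by the Twins architecture): self-attention restricted to non-overlapping windows of tokens. The class name, window size, and the divisibility assumption are illustrative; the paper's exact variant may differ.

```python
import torch
import torch.nn as nn

class LocallyGroupedSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows of tokens."""
    def __init__(self, dim, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C); H and W assumed divisible by the window size
        B, H, W, C = x.shape
        w = self.window
        # Group tokens into (H/w * W/w) windows of w*w tokens each
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        out, _ = self.attn(x, x, x)
        # Restore the original (B, H, W, C) layout
        out = out.view(B, H // w, W // w, w, w, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out

lsa = LocallyGroupedSelfAttention(dim=64, num_heads=4, window=7)
y = lsa(torch.randn(2, 28, 28, 64))  # -> (2, 28, 28, 64)
```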

Algorithm 3: PyTorch snippet of OSTNet training.
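As the training algorithm figure is likewise missing from this page, here is a minimal, hypothetical training loop in the same spirit. The optimizer choice, learning rate, and epoch count are assumptions, and `integrated_density_loss` refers to the illustrative sketch after the abstract, not the paper's actual loss.

```python
import torch

def train(model, loader, epochs=100, lr=1e-5):
    """Hypothetical training loop: model maps images to density maps,
    loader yields (images, gt_density) pairs."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    model.train()
    for epoch in range(epochs):
        for images, gt_density in loader:
            pred = model(images)  # predicted density map, (B, 1, H, W)
            loss = integrated_density_loss(pred, gt_density)
            opt.zero_grad()
            loss.backward()
            opt.step()
```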

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qu, Y., Yang, L., Zhong, P. et al. OSTNet: overlapping splitting transformer network with integrated density loss for vehicle density estimation. Appl Intell 54, 8856–8875 (2024). https://doi.org/10.1007/s10489-024-05641-5
