
OSTNet: overlapping splitting transformer network with integrated density loss for vehicle density estimation

Published in Applied Intelligence

Abstract

Vehicle density estimation plays a crucial role in traffic monitoring, providing traffic management departments with the traffic volume and flow information needed to monitor traffic safety. Current vehicle density estimation methods based on Convolutional Neural Networks (CNNs) fall short in extracting global information because of the limited receptive field of the convolution kernel, resulting in the loss of vehicle information. The Vision Transformer can capture long-distance dependencies and establish global context through its self-attention mechanism, making it a promising candidate for vehicle density estimation. However, applying a Vision Transformer directly breaks the continuity of vehicle information across patches. In addition, vehicle density estimation faces further challenges such as multi-scale variation, occlusion, and background noise. To address these challenges, a novel Overlapping Splitting Transformer Network (OSTNet) tailored for vehicle density estimation is designed. Overlapping splitting is proposed so that adjacent patches share half of their area, preserving the continuity of vehicle information between patches. Dilated convolution is introduced to replace fixed-size positional encodings and provide accurate vehicle localization information. Meanwhile, a Feature Pyramid Aggregation (FPA) module captures information at different scales, tackling the issue of multi-scale variation. Moreover, a novel loss function called integrated density loss is designed to address vehicle occlusion and background noise. Extensive experimental results on four open-source datasets show that OSTNet outperforms state-of-the-art methods and can help traffic management departments better estimate vehicle density. The source code and pre-trained models are available at: https://github.com/quyang-hub/vehicle-density-estimation.
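The abstract names the integrated density loss without defining it here. As a rough, hypothetical sketch only: a density-estimation loss of this kind typically combines a pixel-wise term over the predicted density map with a global count term. The function name, the `count_weight` parameter, and the MSE-plus-count composition below are all assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def integrated_density_loss(pred, gt, count_weight=0.01):
    """Hypothetical sketch: pixel-wise density term plus a global
    count term. The paper's actual integrated density loss may differ.
    pred, gt: (B, 1, H, W) density maps."""
    # Pixel-wise MSE between predicted and ground-truth density maps
    pixel_loss = F.mse_loss(pred, gt)
    # Absolute error between total predicted and ground-truth counts
    count_loss = torch.abs(pred.sum(dim=(1, 2, 3)) - gt.sum(dim=(1, 2, 3))).mean()
    return pixel_loss + count_weight * count_loss
```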


Code Availability

The source code and pre-trained models are available at: https://github.com/quyang-hub/vehicle-density-estimation


Acknowledgements

The authors would like to thank the reviewers for their valuable comments and suggestions to improve the quality of this paper.

Funding

This work was supported by Chinese Universities Scientific Fund (No. 2022TC109), Double First-class International Cooperation Project of China Agricultural University (No. 10020799), and Double First-class Project of China Agricultural University.

Author information


Contributions

Yang Qu completed data collection, methodology, validation, and wrote the original draft. Liran Yang and Qiuyue Li completed formal analysis. Ping Zhong completed supervision, conceptualization, and reviewed the manuscript. All authors read and approved this manuscript.

Corresponding authors

Correspondence to Ping Zhong or Qiuyue Li.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics Statement

Throughout the research process, all authors followed international guidelines and ensured that no animals were harmed.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm 1: PyTorch snippet of overlapping splitting.
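The algorithm figure does not render on this page. As a minimal sketch of what 50%-overlap patch splitting could look like in PyTorch, under the assumption that overlap is realized with an unfold stride of half the patch size (function name, shapes, and padding are illustrative, not the paper's code):

```python
import torch
import torch.nn.functional as F

def overlapping_split(x, patch_size=8):
    """Split a feature map into patches where neighbouring patches
    overlap by half of their area (stride = patch_size // 2)."""
    # x: (B, C, H, W) -> (B, num_patches, C * patch_size * patch_size)
    stride = patch_size // 2
    patches = F.unfold(x, kernel_size=patch_size, stride=stride,
                       padding=stride // 2)
    return patches.transpose(1, 2)

x = torch.randn(2, 3, 32, 32)
print(overlapping_split(x).shape)  # torch.Size([2, 64, 192])
```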

Algorithm 2: PyTorch snippet of locally-grouped self-attention.
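Again, the figure itself is unavailable here. Below is a minimal sketch of a common formulation of locally-grouped self-attention (as popularized by the Twins architecture): self-attention restricted to non-overlapping windows of tokens. The class name, window size, and the divisibility assumption are illustrative; the paper's exact variant may differ.

```python
import torch
import torch.nn as nn

class LocallyGroupedSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows of tokens."""
    def __init__(self, dim, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C); H and W assumed divisible by the window size
        B, H, W, C = x.shape
        w = self.window
        # Group tokens into (H/w * W/w) windows of w*w tokens each
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        out, _ = self.attn(x, x, x)
        # Restore the original (B, H, W, C) layout
        out = out.view(B, H // w, W // w, w, w, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out

lsa = LocallyGroupedSelfAttention(dim=64, num_heads=4, window=7)
y = lsa(torch.randn(2, 28, 28, 64))  # -> (2, 28, 28, 64)
```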

Algorithm 3: PyTorch snippet of OSTNet training.
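As the training algorithm figure is likewise missing from this page, here is a minimal, hypothetical training loop in the same spirit. The optimizer choice, learning rate, and epoch count are assumptions, and `integrated_density_loss` refers to the illustrative sketch after the abstract, not the paper's actual loss.

```python
import torch

def train(model, loader, epochs=100, lr=1e-5):
    """Hypothetical training loop: model maps images to density maps,
    loader yields (images, gt_density) pairs."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    model.train()
    for epoch in range(epochs):
        for images, gt_density in loader:
            pred = model(images)  # predicted density map, (B, 1, H, W)
            loss = integrated_density_loss(pred, gt_density)
            opt.zero_grad()
            loss.backward()
            opt.step()
```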

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Qu, Y., Yang, L., Zhong, P. et al. OSTNet: overlapping splitting transformer network with integrated density loss for vehicle density estimation. Appl Intell 54, 8856–8875 (2024). https://doi.org/10.1007/s10489-024-05641-5
