A robust deep networks based multi-object multi-camera tracking system for city scale traffic

Zaman, Muhammad Imran; Bajwa, Usama Ijaz; Saleem, Gulshan; Raza, Rana Hammad

doi:10.1007/s11042-023-16243-7

Published: 21 July 2023

Volume 83, pages 17163–17181, (2024)

Multimedia Tools and Applications

Muhammad Imran Zaman¹, Usama Ijaz Bajwa¹ (ORCID: orcid.org/0000-0001-5755-1194), Gulshan Saleem¹ & Rana Hammad Raza²

Abstract

Vision sensors are becoming more important in Intelligent Transportation Systems (ITS) for traffic monitoring, management, and optimization as the number of network cameras continues to rise. However, manual object tracking and matching across multiple non-overlapping cameras pose significant challenges in city-scale urban traffic scenarios. These challenges include handling diverse vehicle attributes, occlusions, illumination variations, shadows, and varying video resolutions. To address these issues, we propose an efficient and cost-effective deep learning-based framework for Multi-Object Multi-Camera Tracking (MO-MCT). The proposed framework utilizes Mask R-CNN for object detection and employs Non-Maximum Suppression (NMS) to select target objects from overlapping detections. Transfer learning is employed for re-identification, enabling the association and generation of vehicle tracklets across multiple cameras. Moreover, we leverage appropriate loss functions and distance measures to handle occlusion, illumination, and shadow challenges. The final solution identification module performs feature extraction using ResNet-152 coupled with Deep SORT based vehicle tracking. The proposed framework is evaluated on the 5th AI City Challenge dataset (Track 3), comprising 46 camera feeds. Among these 46 camera streams, 40 are used for model training and validation, while the remaining six are utilized for model testing. The proposed framework achieves competitive performance with an IDF1 score of 0.8289, and precision and recall scores of 0.9026 and 0.8527 respectively, demonstrating its effectiveness in robust and accurate vehicle tracking.
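The abstract states that Non-Maximum Suppression (NMS) selects target objects from overlapping detections. As an illustration only, the following is a minimal sketch of the standard greedy NMS procedure in plain Python. The box format `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the IoU threshold of 0.5 are assumptions made for this example; the abstract does not give the paper's actual NMS settings or implementation.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    # Visit detections from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping detections of one vehicle, plus a separate one.
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

In this toy input the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box is kept. A lower threshold suppresses more aggressively, which may merge adjacent vehicles in dense traffic; a higher one may leave duplicate detections.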

Availability of Data and Materials

The experiments use the Track 3 dataset of the 5th AI City Challenge, which is available after registration at: https://www.aicitychallenge.org/2021-data-and-evaluation/

Code Availability

The code and trained models can be obtained from the project repository: https://github.com/imranzaman5202/MO-MCT

Funding

This research was partially supported by the National Center of Big Data and Cloud Computing (NCBC) and HEC of Pakistan.

Author information

Authors and Affiliations

Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan
Muhammad Imran Zaman, Usama Ijaz Bajwa & Gulshan Saleem
Pakistan Navy Engineering College, National University of Sciences and Technology (NUST), Islamabad, Pakistan
Rana Hammad Raza

Contributions

Zaman: conception, implementation, writeup and revision; Bajwa: conception, supervision and revision; Saleem: implementation, writeup and revision; Raza: conception, supervision and revision.

Corresponding author

Correspondence to Usama Ijaz Bajwa.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zaman, M.I., Bajwa, U.I., Saleem, G. et al. A robust deep networks based multi-object multi-camera tracking system for city scale traffic. Multimed Tools Appl 83, 17163–17181 (2024). https://doi.org/10.1007/s11042-023-16243-7

Received: 13 October 2022
Revised: 31 May 2023
Accepted: 04 July 2023
Published: 21 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-16243-7
