Abstract
Vision sensors are becoming increasingly important in Intelligent Transportation Systems (ITS) for traffic monitoring, management, and optimization as the number of network cameras continues to rise. However, manually tracking and matching objects across multiple non-overlapping cameras poses significant challenges in city-scale urban traffic scenarios. These challenges include diverse vehicle attributes, occlusions, illumination variations, shadows, and varying video resolutions. To address these issues, we propose an efficient and cost-effective deep learning-based framework for Multi-Object Multi-Camera Tracking (MO-MCT). The proposed framework uses Mask R-CNN for object detection and applies Non-Maximum Suppression (NMS) to select target objects from overlapping detections. Transfer learning is employed for re-identification, enabling the association and generation of vehicle tracklets across multiple cameras. Moreover, we leverage appropriate loss functions and distance measures to handle occlusion, illumination, and shadow challenges. In the final solution, the identification module performs feature extraction using ResNet-152, coupled with Deep SORT-based vehicle tracking. The proposed framework is evaluated on the 5th AI City Challenge dataset (Track 3), which comprises 46 camera feeds. Of these, 40 are used for model training and validation, and the remaining six for testing. The proposed framework achieves competitive performance, with an IDF1 score of 0.8289 and precision and recall scores of 0.9026 and 0.8527, respectively, demonstrating its effectiveness for robust and accurate vehicle tracking.
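To illustrate the NMS step mentioned in the abstract, the following is a minimal sketch of greedy, IoU-based non-maximum suppression in plain Python. It is not the authors' implementation: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every previously kept
    (higher-scoring) box does not exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = nms(example_boxes, example_scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed and only the first and third detections are kept. A production pipeline would more likely rely on the vectorized NMS routine shipped with the detection library in use.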
Availability of Data and Materials
The experiments use the Track 3 dataset of the 5th AI City Challenge, which is available after registration at: https://www.aicitychallenge.org/2021-data-and-evaluation/
Code Availability
The code and trained models can be obtained from the project repository: https://github.com/imranzaman5202/MO-MCT
Funding
This research was partially supported by the National Center of Big Data and Cloud Computing (NCBC) and the Higher Education Commission (HEC) of Pakistan.
Author information
Contributions
Zaman: conception, implementation, write-up, and revision; Bajwa: conception, supervision, and revision; Saleem: implementation, write-up, and revision; Raza: conception, supervision, and revision.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zaman, M.I., Bajwa, U.I., Saleem, G. et al. A robust deep networks based multi-object multi-camera tracking system for city scale traffic. Multimed Tools Appl 83, 17163–17181 (2024). https://doi.org/10.1007/s11042-023-16243-7
DOI: https://doi.org/10.1007/s11042-023-16243-7