Abstract
This work presents a robust and reliable multi-object tracking (MOT) system for autonomous vehicles. In crowded urban road scenes, accurate data association between tracked objects and incoming detections is crucial. To achieve this, the system combines deep learning techniques with a square-root unscented Kalman filter. It follows the tracking-by-detection paradigm, and the proposed deep learning architecture is based on Siamese and convolutional LSTM networks. The effectiveness of the proposed system was evaluated on the Argoverse dataset.
Supported by Universidad Carlos III de Madrid.
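The data association step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity values are hypothetical placeholders for scores that an appearance network might produce, the `associate` function and its `min_similarity` threshold are assumptions made for illustration, and the brute-force search stands in for a proper assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations
from typing import Dict, Sequence


def associate(similarity: Sequence[Sequence[float]],
              min_similarity: float = 0.5) -> Dict[int, int]:
    """Match tracks (rows) to detections (columns) one-to-one.

    Picks the one-to-one matching with the highest total similarity by
    brute force, so it is only practical for a handful of objects.
    Pairs scoring below `min_similarity` (a hypothetical gate) are
    dropped after the assignment and left unmatched.
    """
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if n_tracks else 0
    if n_tracks == 0 or n_dets == 0:
        return {}

    if n_tracks <= n_dets:
        # Every track receives a distinct detection.
        candidates = (list(zip(range(n_tracks), p))
                      for p in permutations(range(n_dets), n_tracks))
    else:
        # Every detection receives a distinct track.
        candidates = (list(zip(p, range(n_dets)))
                      for p in permutations(range(n_tracks), n_dets))

    best_pairs, best_score = [], float("-inf")
    for pairs in candidates:
        score = sum(similarity[t][d] for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score

    return {t: d for t, d in best_pairs if similarity[t][d] >= min_similarity}


# Hypothetical scores: 2 existing tracks, 3 new detections.
scores = [
    [0.90, 0.20, 0.10],
    [0.30, 0.80, 0.40],
]
matches = associate(scores)
```

In a tracking-by-detection loop, matched detections would typically update the filter state of their track, while unmatched detections (here, detection 2) would be candidates for new tracks.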
Acknowledgments
This work was supported by grants PID2019-104793RB-C31 and RTI2018-096036-B-C21, funded by MCIN/AEI/10.13039/501100011033, and by SEGVAUTO-4.0-CM (P2018/EMT-4362), funded by the Comunidad de Madrid.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Urdiales, J., Martín, D., Armingol, J.M. (2022). Deep Learning Data Association Applied to Multi-object Tracking Systems. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_41
DOI: https://doi.org/10.1007/978-3-031-25312-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25311-9
Online ISBN: 978-3-031-25312-6
eBook Packages: Computer Science (R0)