Deep Learning Data Association Applied to Multi-object Tracking Systems

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2022 (EUROCAST 2022)

Abstract

This work presents a robust and reliable multi-object tracking (MOT) system for autonomous vehicles. In crowded urban road scenes, accurate data association between tracked objects and incoming detections is crucial. To achieve this, the system combines deep learning techniques with a square-root unscented Kalman filter. It follows the tracking-by-detection paradigm, and the new deep learning architecture it introduces is based on Siamese and convolutional LSTM networks. The effectiveness of the proposed system has been tested on the Argoverse dataset.
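The data association step described above can be illustrated with a minimal sketch. It assumes that a learned similarity model has already produced a score in [0, 1] for every track/detection pair; the function name `associate`, the `min_similarity` gate, and the exhaustive search are illustrative assumptions, not the method of the paper. A practical system would typically replace the brute-force search with a polynomial-time assignment solver.

```python
from itertools import permutations


def associate(similarity, min_similarity=0.5):
    """Match tracks (rows) to detections (columns), maximizing total similarity.

    `similarity[t][d]` is a hypothetical score for track t and detection d.
    Returns (matches, unmatched_tracks, unmatched_detections).
    Brute force: only suitable for a handful of objects.
    """
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if n_tracks else 0
    if n_tracks == 0 or n_dets == 0:
        return [], list(range(n_tracks)), list(range(n_dets))

    # Enumerate over the smaller side so each permutation is a one-to-one matching.
    transposed = n_tracks > n_dets
    if transposed:
        scores = [[similarity[t][d] for t in range(n_tracks)] for d in range(n_dets)]
    else:
        scores = similarity
    n_rows, n_cols = len(scores), len(scores[0])

    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n_cols), n_rows):
        total = sum(scores[r][c] for r, c in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm

    # Gate the optimal assignment: weak pairs are left unmatched.
    matches = []
    for r, c in enumerate(best_perm):
        track, det = (c, r) if transposed else (r, c)
        if similarity[track][det] >= min_similarity:
            matches.append((track, det))
    matches.sort()

    matched_tracks = {t for t, _ in matches}
    matched_dets = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_tracks) if t not in matched_tracks]
    unmatched_dets = [d for d in range(n_dets) if d not in matched_dets]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    # Two existing tracks, three incoming detections (made-up scores).
    scores = [[0.9, 0.2, 0.1],
              [0.3, 0.8, 0.4]]
    print(associate(scores))
```

In a tracking-by-detection loop, matched pairs would feed the filter update for the corresponding track, unmatched detections would be candidates for new tracks, and unmatched tracks would be propagated by prediction alone until they are re-associated or dropped.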

Supported by Universidad Carlos III de Madrid.

Acknowledgments

This work was supported by grants PID2019-104793RB-C31 and RTI2018-096036-B-C21, funded by MCIN/AEI/10.13039/501100011033, and by SEGVAUTO-4.0-CM (P2018/EMT-4362), funded by the Comunidad de Madrid.

Author information

Corresponding author

Correspondence to J. Urdiales.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Urdiales, J., Martín, D., Armingol, J.M. (2022). Deep Learning Data Association Applied to Multi-object Tracking Systems. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_41

  • DOI: https://doi.org/10.1007/978-3-031-25312-6_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25311-9

  • Online ISBN: 978-3-031-25312-6

  • eBook Packages: Computer Science, Computer Science (R0)
