Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration

Bal, Aditi Basu; Mounir, Ramy; Aakur, Sathyanarayanan; Sarkar, Sudeep; Srivastava, Anuj

doi:10.1007/978-3-031-19833-5_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13695)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Graph-based representations are increasingly popular for representing and analyzing video data, especially in object tracking and scene understanding applications. Accordingly, an essential tool in this approach is statistical inference for the graphical time series associated with videos. This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data. The main challenge is to find and preserve the registration of nodes (salient detected objects) across time frames when the data contain noise and clutter in the form of false and missing nodes. First, we introduce a quotient-space representation of graphs that incorporates temporal registration of nodes, and we use the resulting metric structure to impose a dynamical model on graph evolution. Then, we derive a Kalman smoother, adapted to the quotient-space geometry, to estimate dense, smooth trajectories of graphs. We demonstrate this framework using simulated data and actual video graphs extracted from the Multiview Extended Video with Activities (MEVA) dataset. The framework successfully estimates graphs despite noise, clutter, and missed detections.
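To make the two ingredients named in the abstract concrete, the following is a minimal toy sketch, not the authors' method. It assumes each graph is reduced to a fixed number of nodes with 2D position attributes (edges and the quotient-space geometry are ignored), registers each frame to the previous one by brute-force search over node permutations, and then applies a scalar random-walk Kalman filter with a Rauch-Tung-Striebel backward pass to each coordinate. The function names `register`, `rts_smooth`, and `track`, the noise parameters `q` and `r`, and the prior variance are all hypothetical choices for illustration. Registration and smoothing are done sequentially here, whereas the paper treats them jointly.

```python
from itertools import permutations


def register(ref, obs):
    """Reorder obs so its nodes best match ref (sum of squared distances).

    Brute force over all permutations: only viable for a handful of nodes.
    Assumes both frames have the same number of nodes (no false/missing nodes).
    """
    def cost(perm):
        return sum((rx - obs[j][0]) ** 2 + (ry - obs[j][1]) ** 2
                   for (rx, ry), j in zip(ref, perm))

    best = min(permutations(range(len(obs))), key=cost)
    return [obs[j] for j in best]


def rts_smooth(ys, q=0.01, r=0.25):
    """Scalar random-walk Kalman filter plus RTS backward smoothing pass.

    ys: list of floats, with None marking a missing observation.
    q, r: assumed process and measurement noise variances.
    """
    m = next(y for y in ys if y is not None)  # start at first observed value
    p = 1.0                                   # assumed prior variance
    mf, pf, mp, pp = [], [], [], []           # filtered / predicted moments
    for y in ys:
        m_pred, p_pred = m, p + q             # predict (state transition = 1)
        mp.append(m_pred)
        pp.append(p_pred)
        if y is None:                         # no update when data is missing
            m, p = m_pred, p_pred
        else:
            k = p_pred / (p_pred + r)         # Kalman gain
            m = m_pred + k * (y - m_pred)
            p = (1.0 - k) * p_pred
        mf.append(m)
        pf.append(p)
    ms = mf[:]                                # backward (smoothing) pass
    for t in range(len(ys) - 2, -1, -1):
        g = pf[t] / pp[t + 1]                 # smoother gain
        ms[t] = mf[t] + g * (ms[t + 1] - mp[t + 1])
    return ms


def track(frames, q=0.01, r=0.25):
    """Register each frame to the previous one, then smooth every coordinate."""
    reg = [frames[0]]
    for f in frames[1:]:
        reg.append(register(reg[-1], f))
    n = len(frames[0])
    xs = [rts_smooth([f[i][0] for f in reg], q, r) for i in range(n)]
    ys = [rts_smooth([f[i][1] for f in reg], q, r) for i in range(n)]
    return [[(xs[i][t], ys[i][t]) for i in range(n)] for t in range(len(reg))]


if __name__ == "__main__":
    # Node order is swapped in the middle frame; registration undoes the swap.
    frames = [[(0.0, 0.0), (5.0, 5.0)],
              [(5.2, 4.9), (0.1, -0.2)],
              [(0.3, 0.1), (5.1, 5.3)]]
    for graph in track(frames):
        print(graph)
```

A practical implementation would replace the permutation search with a proper graph-matching solver, use edge attributes as well as node attributes, and pad graphs with null nodes to handle false and missed detections, none of which this sketch attempts.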

Acknowledgements

This research was supported in part by the US National Science Foundation grants 1955154, IIS 2143150, IIS 1955230, CNS 1513126, and IIS 1956050.

Author information

Authors and Affiliations

Florida State University, Tallahassee, FL, 32309, USA
Aditi Basu Bal & Anuj Srivastava
University of South Florida, Tampa, FL, 33620, USA
Ramy Mounir & Sudeep Sarkar
Oklahoma State University, Stillwater, OK, 74078, USA
Sathyanarayanan Aakur

Corresponding author

Correspondence to Aditi Basu Bal.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 17466 KB)

Cite this paper

Bal, A.B., Mounir, R., Aakur, S., Sarkar, S., Srivastava, A. (2022). Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13695. Springer, Cham. https://doi.org/10.1007/978-3-031-19833-5_26

DOI: https://doi.org/10.1007/978-3-031-19833-5_26
Published: 04 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19832-8
Online ISBN: 978-3-031-19833-5
eBook Packages: Computer Science, Computer Science (R0)
