Loci-Segmented: Improving Scene Segmentation Learning

Traub, Manuel; Becker, Frederic; Sauter, Adrian; Otte, Sebastian; Butz, Martin V.

doi:10.1007/978-3-031-72338-4_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15018)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Current slot-oriented approaches to compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which requires neither. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating RGB, mask, location, and depth information for each. The results reveal largely superior video decomposition performance on the MOVi datasets and on another established dataset collection targeting scene segmentation. The system's well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.
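As a rough illustration of what separate per-slot RGB and mask outputs allow, the following sketch blends a background layer and object slots into one image by taking a softmax over mask logits at each pixel. This is the generic compositing scheme common to slot-based models, not the Loci-s implementation. The function names, the nested-list data layout, and the convention that index 0 is the background are assumptions made for this example, and depth and location are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compose_pixel(mask_logits, rgbs):
    """Blend per-slot RGB values at a single pixel.

    mask_logits: one logit per slot (index 0 = background, by assumption).
    rgbs: one (r, g, b) tuple per slot.
    Returns the blended colour and the index of the dominant slot.
    """
    weights = softmax(mask_logits)
    blended = tuple(
        sum(w * rgb[c] for w, rgb in zip(weights, rgbs)) for c in range(3)
    )
    winner = max(range(len(weights)), key=weights.__getitem__)
    return blended, winner


def compose_image(mask_logit_maps, rgb_maps):
    """Compose a full image from per-slot maps.

    mask_logit_maps[k][y][x] is the mask logit of slot k at pixel (y, x);
    rgb_maps[k][y][x] is that slot's (r, g, b) prediction.
    Returns (image, segmentation), where segmentation holds slot indices.
    """
    height = len(mask_logit_maps[0])
    width = len(mask_logit_maps[0][0])
    slots = range(len(mask_logit_maps))
    image, segmentation = [], []
    for y in range(height):
        image_row, seg_row = [], []
        for x in range(width):
            logits = [mask_logit_maps[k][y][x] for k in slots]
            colours = [rgb_maps[k][y][x] for k in slots]
            blended, winner = compose_pixel(logits, colours)
            image_row.append(blended)
            seg_row.append(winner)
        image.append(image_row)
        segmentation.append(seg_row)
    return image, segmentation


# Toy scene: a grey background (slot 0) and one red object (slot 1)
# that claims the left pixel of a 1x2 image.
grey, red = (0.5, 0.5, 0.5), (1.0, 0.0, 0.0)
mask_logit_maps = [[[0.0, 0.0]], [[10.0, -10.0]]]
rgb_maps = [[[grey, grey]], [[red, red]]]
image, segmentation = compose_image(mask_logit_maps, rgb_maps)
```

Because the weights at each pixel sum to one, every pixel is explained by a mixture of slots, and taking the dominant slot per pixel yields a segmentation map directly.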

Acknowledgments

This work received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC number 2064/1 - Project number 390727645 as well as from the Cyber Valley in Tübingen, CyVy-RF-2020-15. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Manuel Traub and Frederic Becker, and the Alexander von Humboldt Foundation for supporting Martin Butz and Sebastian Otte.

Author information

Authors and Affiliations

Cognitive Modeling, Department of Computer Science and Department of Psychology, University of Tübingen, Sand 14, 72076, Tübingen, Germany
Manuel Traub, Frederic Becker, Adrian Sauter, Sebastian Otte & Martin V. Butz
Adaptive AI Lab, Institute for Robotics and Cognitive Systems, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Sebastian Otte

Corresponding author

Correspondence to Manuel Traub.

Editor information

Editors and Affiliations

IDSIA USI-SUPSI, Lugano, Switzerland
Michael Wand
Comenius University, Bratislava, Slovakia
Kristína Malinovská
KAUST Center of Generative AI, Thuwal, Saudi Arabia
Jürgen Schmidhuber
Helmholtz Zentrum München, Neuherberg, Germany
Igor V. Tetko

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

About this paper

Cite this paper

Traub, M., Becker, F., Sauter, A., Otte, S., Butz, M.V. (2024). Loci-Segmented: Improving Scene Segmentation Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15018. Springer, Cham. https://doi.org/10.1007/978-3-031-72338-4_4

DOI: https://doi.org/10.1007/978-3-031-72338-4_4
Published: 17 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72337-7
Online ISBN: 978-3-031-72338-4
eBook Packages: Computer Science, Computer Science (R0)
