Loci-Segmented: Improving Scene Segmentation Learning

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15018)

Abstract

Current slot-oriented approaches for compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which requires neither. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating RGB, mask, location, and depth information for each. The results reveal largely superior video decomposition performance on the MOVi datasets and on another established dataset collection targeting scene segmentation. The system’s well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.
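
To make the idea of slot-based decomposition concrete, the following is a minimal sketch of mask-based compositing as it is commonly done in slot-oriented models: each slot proposes an RGB value and a mask logit per pixel, and a softmax over the slots decides how much each slot contributes. This is an illustrative assumption, not the Loci-s architecture; the function names, the use of slot 0 as the background, and the toy values are all hypothetical, and the location and depth encodings mentioned in the abstract are omitted.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def composite_pixel(slot_rgb, slot_mask_logits):
    """Blend per-slot RGB proposals for one pixel.

    slot_rgb: list of (r, g, b) tuples, one per slot.
    slot_mask_logits: list of unnormalized mask scores, one per slot.
    Returns the blended RGB value and the normalized mask weights.
    """
    weights = softmax(slot_mask_logits)
    blended = [0.0, 0.0, 0.0]
    for w, rgb in zip(weights, slot_rgb):
        for c in range(3):
            blended[c] += w * rgb[c]
    return blended, weights


# Hypothetical toy pixel: slot 0 stands in for the background (blue),
# slots 1 and 2 are object slots (red and green).
slot_rgb = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
slot_mask_logits = [0.0, 4.0, -2.0]

pixel, weights = composite_pixel(slot_rgb, slot_mask_logits)

# The segmentation label of the pixel is the slot with the largest mask weight.
owner = max(range(len(weights)), key=weights.__getitem__)
```

In this toy case the red object slot has the largest mask logit, so it dominates the blended pixel and is assigned as the pixel's segment. Applying the same step to every pixel yields both a reconstructed image and a segmentation map.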

Acknowledgments

This work received funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC number 2064/1 - Project number 390727645 as well as from the Cyber Valley in Tübingen, CyVy-RF-2020-15. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Manuel Traub and Frederic Becker, and the Alexander von Humboldt Foundation for supporting Martin Butz and Sebastian Otte.

Author information

Correspondence to Manuel Traub.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Traub, M., Becker, F., Sauter, A., Otte, S., Butz, M.V. (2024). Loci-Segmented: Improving Scene Segmentation Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15018. Springer, Cham. https://doi.org/10.1007/978-3-031-72338-4_4

  • DOI: https://doi.org/10.1007/978-3-031-72338-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72337-7

  • Online ISBN: 978-3-031-72338-4

  • eBook Packages: Computer Science, Computer Science (R0)
