A Closer Look at Generalisation in RAVEN

Spratley, Steven; Ehinger, Krista; Miller, Tim

doi:10.1007/978-3-030-58583-9_36

Conference paper
First Online: 19 November 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12372)

Abstract

Humans have a remarkable capacity to draw parallels between concepts, generalising their experience to new domains. This skill is essential to solving the visual problems featured in the RAVEN and PGM datasets, yet previous papers have scarcely tested how well models generalise across tasks. Additionally, we encounter a critical issue that allows existing models to inadvertently ‘cheat’ problems in RAVEN. We therefore propose a simple workaround to resolve this issue, and focus the conversation on generalisation performance, as this was severely affected in the process. We revise the existing evaluation, and introduce two relational models, Rel-Base and Rel-AIR, that significantly improve this performance. To our knowledge, Rel-AIR is the first method to employ unsupervised scene decomposition in solving abstract visual reasoning problems and, along with Rel-Base, sets the state of the art for image-only reasoning and generalisation across both RAVEN and PGM.

Author information

Authors and Affiliations

School of Computing and Information Systems, The University of Melbourne, Victoria, Australia
Steven Spratley, Krista Ehinger & Tim Miller

Corresponding author

Correspondence to Steven Spratley.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Spratley, S., Ehinger, K., Miller, T. (2020). A Closer Look at Generalisation in RAVEN. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12372. Springer, Cham. https://doi.org/10.1007/978-3-030-58583-9_36

DOI: https://doi.org/10.1007/978-3-030-58583-9_36
Published: 19 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58582-2
Online ISBN: 978-3-030-58583-9
eBook Packages: Computer Science, Computer Science (R0)