Abstract
Humans have a remarkable capacity to draw parallels between concepts, generalising their experience to new domains. This skill is essential to solving the visual problems featured in the RAVEN and PGM datasets, yet previous papers have scarcely tested how well models generalise across tasks. Additionally, we encounter a critical issue that allows existing models to inadvertently ‘cheat’ on problems in RAVEN. We therefore propose a simple workaround to resolve this issue, and focus the conversation on generalisation performance, as this was severely affected in the process. We revise the existing evaluation, and introduce two relational models, Rel-Base and Rel-AIR, that significantly improve this performance. To our knowledge, Rel-AIR is the first method to employ unsupervised scene decomposition in solving abstract visual reasoning problems and, along with Rel-Base, sets the state of the art for image-only reasoning and generalisation across both RAVEN and PGM.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Spratley, S., Ehinger, K., Miller, T. (2020). A Closer Look at Generalisation in RAVEN. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12372. Springer, Cham. https://doi.org/10.1007/978-3-030-58583-9_36
DOI: https://doi.org/10.1007/978-3-030-58583-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58582-2
Online ISBN: 978-3-030-58583-9
eBook Packages: Computer Science, Computer Science (R0)