The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest

Conference paper. In: Artificial General Intelligence (AGI 2023). Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13921).

Abstract

If A and B are sets such that \(A \subset B\), generalisation may be understood as the inference from A of a hypothesis sufficient to construct B. One might infer any number of hypotheses from A, yet only some of those may generalise to B. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a “proxy for intelligence”). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between 1.1 and 5 times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind’s Apperception Engine is able to generalise effectively.
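
The abstract's comparison can be made concrete with a toy. The following Python sketch is an illustration only, not the paper's binary-arithmetic experiment: it enumerates a small, hand-picked space of boolean formulae, identifies a hypothesis's extension \(Z_h\) with its set of satisfying assignments, uses formula string length as a stand-in for description length, and measures how often each proxy exactly recovers a hidden target from a partial sample of labelled situations. The formula list and sample size are illustrative assumptions.

```python
# Toy comparison of weakness maximisation vs. minimum description length.
# Illustrative only: the formulae, the sample size, and "description
# length = string length" are assumptions of this sketch, not the
# paper's experimental setup.
import itertools
import random

VARS = ("a", "b", "c")
ROWS = list(itertools.product((0, 1), repeat=3))  # the 8 possible situations

FORMULAE = [  # a small hand-picked hypothesis space
    "a", "b", "c", "not a", "a and b", "a or b", "b or c",
    "a and b and c", "a or b or c", "(a and b) or c", "a != b",
]

def extension(f):
    """Z_f: the situations at which formula f holds."""
    return frozenset(r for r in ROWS if eval(f, {}, dict(zip(VARS, r))))

EXT = {f: extension(f) for f in FORMULAE}

def trial(rng, n_obs=4):
    """Draw a hidden target, observe a few labelled rows, apply both proxies."""
    target = rng.choice(FORMULAE)
    z_t = EXT[target]
    obs = rng.sample(ROWS, n_obs)
    consistent = [f for f in FORMULAE
                  if all((r in EXT[f]) == (r in z_t) for r in obs)]
    weakest = max(consistent, key=lambda f: len(EXT[f]))   # max |Z_h|
    shortest = min(consistent, key=len)                    # min description length
    # "Generalises" here means agreeing with the target on every situation.
    return EXT[weakest] == z_t, EXT[shortest] == z_t

rng = random.Random(0)
results = [trial(rng) for _ in range(2000)]
for name, col in (("max weakness", 0), ("min description length", 1)):
    rate = sum(r[col] for r in results) / len(results)
    print(f"{name:>22}: generalised in {rate:.1%} of trials")
```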

Notes

  1. This proof is conditional upon certain assumptions regarding the nature of cognition as enactive, and a formalism thereof.

  2. Assuming tasks are uniformly distributed, and weakness is well defined.

  3. An example of how one might translate propositional logic into this representation is given at the end of this paper. It is worth noting that this representation of logical formulae addresses the symbol grounding problem [12], and was specifically constructed to address subjective performance claims in the context of AIXI [13].

  4. Each state is just reality from the perspective of a point along one or more dimensions. States of reality must be separated by something, or there would be only one state of reality. For example, two different states of reality may be reality from the perspective of two different points in time, or in space, and so on.

  5. Statements are the logical formulae about which we will reason.

  6. e.g. \(Z_s\) is the extension of \(s\).

  7. For example, we might represent chess as a supervised learning problem where \(s \in S_\alpha\) is the state of a chessboard, \(z \in Z_s\) is a sequence of moves by two players that begins in \(s\), and \(d \in D_\alpha \cap Z_s\) is such a sequence of moves that terminates in victory for one player in particular (the one undertaking the task).

  8. For example, we might use weakness multiplied by a constant to the same effect.

  9. \(\frac{2^{|Z_\mathbf{h}|}}{2^{|L_\mathfrak{v}|}}\) is maximised when \(\mathbf{h} = \emptyset\), because the optimal hypothesis given no information is to assume nothing (there is no sequence to predict, so there is no reason to make assertions that might contradict the environment). A toy computation of this quantity is sketched after these notes.

  10. Two statements \(a\) and \(b\) are mutually exclusive if \(a \notin Z_b\) and \(b \notin Z_a\), which we write as \(\mu(a,b)\). Given \(x \in L_\mathfrak{v}\), the set of all mutually exclusive statements is a set \(K_x \subset L_\mathfrak{v}\) such that \(x \in K_x\) and \(\forall a, b \in K_x : \mu(a,b)\). It follows that \(\forall x \in L_\mathfrak{v} : \sum_{b \in K_x} p(b) = 1\). The sketch after these notes checks \(\mu\) on a toy vocabulary.

  11. We acknowledge that some may object to the term universal, because \(\mathfrak{v}\) is finite.

  12. We do not know which possibilities will eventuate. A less specific statement contradicts fewer possibilities, and so of all the hypotheses sufficient to explain what we perceive, the least specific is the most likely.
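
The set-theoretic notions the notes above rely upon can be checked mechanically. Below is a minimal Python sketch of notes 6, 9 and 10 under one loudly flagged assumption: a statement is modelled as a set of primitive "facts", and the extension \(Z_s\) is taken to be the set of statements in \(L_\mathfrak{v}\) that contain \(s\). This is an illustrative reading, not the paper's full formalism; see the technical appendices [1] for the actual definitions.

```python
# Toy model of statements, extensions, weakness and mutual exclusion.
# Assumption (ours, not the paper's): a statement is a set of "facts",
# and Z_s is the set of statements in L_v that contain s.
from itertools import combinations

FACTS = ("p", "q", "r")
L_v = [frozenset(c) for n in range(len(FACTS) + 1)
       for c in combinations(FACTS, n)]  # all 8 statements over FACTS

def Z(s):
    """Extension of s (note 6): every statement that includes s."""
    return [t for t in L_v if s <= t]

def weakness_ratio(h):
    """The quantity 2^{|Z_h|} / 2^{|L_v|} from note 9."""
    return 2 ** len(Z(h)) / 2 ** len(L_v)

def mu(a, b):
    """Mutual exclusion from note 10: a not in Z_b and b not in Z_a."""
    return a not in Z(b) and b not in Z(a)

empty = frozenset()
assert weakness_ratio(empty) == 1.0  # maximised when h is the empty statement
assert all(weakness_ratio(h) <= weakness_ratio(empty) for h in L_v)
print(mu(frozenset({"p"}), frozenset({"q"})))       # True: neither contains the other
print(mu(frozenset({"p"}), frozenset({"p", "q"})))  # False: the second extends the first
```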

References

  1. Bennett, M.T.: Technical Appendices. Version 1.2.1 (2023). https://doi.org/10.5281/zenodo.7641742. https://github.com/ViscousLemming/Technical-Appendices

  2. Sober, E.: Ockham’s Razors: A User’s Manual. Cambridge University Press (2015)

  3. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

  4. Chollet, F.: On the Measure of Intelligence (2019)

  5. Chaitin, G.: The limits of reason. Sci. Am. 294(3), 74–81 (2006)

  6. Solomonoff, R.: A formal theory of inductive inference. Part I. Inf. Control 7(1), 1–22 (1964)

  7. Solomonoff, R.: A formal theory of inductive inference. Part II. Inf. Control 7(2), 224–254 (1964)

  8. Kolmogorov, A.: On tables of random numbers. Sankhyā: Indian J. Stat. A, 369–376 (1963)

  9. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, Heidelberg (2010)

  10. Bennett, M.T.: Symbol emergence and the solutions to any task. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) AGI 2021. LNCS (LNAI), vol. 13154, pp. 30–40. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93758-4_4

  11. Ward, D., Silverman, D., Villalobos, M.: Introduction: the varieties of enactivism. Topoi 36(3), 365–375 (2017). https://doi.org/10.1007/s11245-017-9484-6

  12. Harnad, S.: The symbol grounding problem. Physica D: Nonlinear Phenomena 42(1), 335–346 (1990)

  13. Leike, J., Hutter, M.: Bad universal priors and notions of optimality. In: Proceedings of the 28th Conference on Learning Theory (COLT), PMLR, pp. 1244–1259 (2015)

  14. Gupta, A.: Definitions. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, Winter 2021 edn. Stanford University (2021)

  15. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS. Curran Associates Inc., USA (2019)

  16. Kirk, D.: NVIDIA CUDA software and GPU parallel computing architecture. In: ISMM 2007, Canada, pp. 103–104. ACM (2007)

  17. Meurer, A., et al.: SymPy: symbolic computing in Python. PeerJ Comput. Sci. 3, e103 (2017). https://doi.org/10.7717/peerj-cs.103

  18. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)

  19. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: towards an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010)

  20. Legg, S., Veness, J.: An approximation of the universal intelligence measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. LNCS, vol. 7070, pp. 236–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-44958-1_18

  21. Evans, R.: Kant’s cognitive architecture. Ph.D. thesis, Imperial College London (2020)

  22. Evans, R., Sergot, M., Stephenson, A.: Formalizing Kant’s rules. J. Philos. Logic 49, 613–680 (2020)

  23. Evans, R., et al.: Making sense of raw input. Artif. Intell. 299 (2021)

  24. Bennett, M.T.: Compression, the Fermi paradox and artificial super-intelligence. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) AGI 2021. LNCS (LNAI), vol. 13154, pp. 41–44. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93758-4_5

  25. Delétang, G., et al.: Neural Networks and the Chomsky Hierarchy (2022)

  26. Power, A., et al.: Grokking: generalization beyond overfitting on small algorithmic datasets. In: ICLR (2022)

Acknowledgement

Appendices are available on GitHub [1]. This work was supported by JST (JPMJMS2033).

Author information

Correspondence to Michael Timothy Bennett.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bennett, M.T. (2023). The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest. In: Hammer, P., Alirezaie, M., Strannegård, C. (eds) Artificial General Intelligence. AGI 2023. Lecture Notes in Computer Science (LNAI), vol. 13921. Springer, Cham. https://doi.org/10.1007/978-3-031-33469-6_5

  • DOI: https://doi.org/10.1007/978-3-031-33469-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33468-9

  • Online ISBN: 978-3-031-33469-6

  • eBook Packages: Computer Science (R0)
