Abstract
If A and B are sets such that \(A \subset B\), generalisation may be understood as the inference from A of a hypothesis sufficient to construct B. One might infer any number of hypotheses from A, yet only some of those may generalise to B. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a “proxy for intelligence”). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between 1.1 and 5 times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind’s Apperception Engine is able to generalise effectively.
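To make the two proxies concrete, the following is a minimal, hypothetical sketch rather than the paper's experimental code: hypotheses are propositional formulas, the extension of a formula is the set of truth assignments it leaves open (so weakness counts the possibilities a hypothesis does not contradict), and description length is approximated by the number of symbols in the formula. The formulas, variables, and candidate set below are illustrative assumptions.

```python
from itertools import product

# Toy contrast of the two proxies; the formulas, variables and the
# candidate set are illustrative assumptions, not the paper's setup.
VARS = ("x", "y")
ASSIGNMENTS = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=2)]

def weakness(formula):
    """Number of truth assignments the formula leaves open (its extension's size)."""
    return sum(bool(eval(formula, {}, dict(a))) for a in ASSIGNMENTS)

# Every candidate is consistent with the single observation x=True, y=True.
observation = {"x": True, "y": True}
candidates = ["x and y", "x", "x or y"]
assert all(eval(f, {}, dict(observation)) for f in candidates)

weakest = max(candidates, key=weakness)  # weakness maximisation
shortest = min(candidates, key=len)      # minimum description length (toy)

print(weakest, weakness(weakest))   # 'x or y' leaves 3 of 4 assignments open
print(shortest, len(shortest))      # 'x' needs the fewest symbols
```

Here the two proxies disagree: “x or y” is the weakest candidate but not the shortest. This divergence is the kind the binary arithmetic experiments measure at scale.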
Notes
1. This proof is conditional upon certain assumptions regarding the nature of cognition as enactive, and a formalism thereof.
2. Assuming tasks are uniformly distributed, and weakness is well defined.
3. An example of how one might translate propositional logic into this representation is given at the end of this paper. It is worth noting that this representation of logical formulae addresses the symbol grounding problem [12], and was specifically constructed to address subjective performance claims in the context of AIXI [13].
4. Each state is just reality from the perspective of a point along one or more dimensions. States of reality must be separated by something, or there would be only one state of reality. For example, two different states of reality may be reality from the perspective of two different points in time, or in space, and so on.
5. Statements are the logical formulae about which we will reason.
6. e.g. \(Z_s\) is the extension of s.
7. For example, we might represent chess as a supervised learning problem where \(s \in S_\alpha\) is the state of a chessboard, \(z \in Z_s\) is a sequence of moves by two players that begins in s, and \(d \in D_\alpha \cap Z_s\) is such a sequence of moves that terminates in victory for one player in particular (the one undertaking the task). A schematic sketch of this framing is given after these notes.
8. For example, we might use weakness multiplied by a constant to the same effect.
9. \(\frac{2^{|Z_{\textbf{h}}|}}{2^{|L_{\mathfrak{v}}|}}\) is maximised when \(\textbf{h} = \emptyset\), because the optimal hypothesis given no information is to assume nothing (you have no sequence to predict, so why make assertions that might contradict the environment?). A worked instance of this ratio is given after these notes.
10. Two statements a and b are mutually exclusive if \(a \notin Z_b\) and \(b \notin Z_a\), which we write as \(\mu(a,b)\). Given \(x \in L_{\mathfrak{v}}\), the set of all mutually exclusive statements is a set \(K_x \subset L_{\mathfrak{v}}\) such that \(x \in K_x\) and \(\forall a, b \in K_x : \mu(a,b)\). It follows that \(\forall x \in L_{\mathfrak{v}} : \sum_{b \in K_x} p(b) = 1\). This definition is checked in a short sketch after these notes.
11. We acknowledge that some may object to the term universal, because \(\mathfrak{v}\) is finite.
12. We do not know which possibilities will eventuate. A less specific statement contradicts fewer possibilities. Of all hypotheses sufficient to explain what we perceive, the least specific is therefore the most likely. A counting illustration is given after these notes.
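A schematic rendering of note 7's supervised-learning framing. The type and field names are hypothetical, chosen only to mirror \(S_\alpha\), \(Z_s\) and \(D_\alpha\); this is a sketch of the structure, not the paper's notation made executable.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set, Tuple

State = str                      # e.g. a chessboard position
MoveSequence = Tuple[str, ...]   # a sequence of moves beginning in some state

@dataclass(frozen=True)
class SupervisedTask:
    inputs: FrozenSet[State]                           # S_alpha: situations presented
    completions: Callable[[State], Set[MoveSequence]]  # s -> Z_s: sequences beginning in s
    decision: Callable[[MoveSequence], bool]           # z in D_alpha: the player wins
```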
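A worked instance of note 9's ratio, with illustrative numbers: since the empty hypothesis is contained in every statement, its extension is the whole language and the ratio is 1, while any more specific hypothesis scores strictly less.

```latex
% Assume a language of |L_v| = 16 statements. Then Z_{emptyset} = L_v,
% so the empty hypothesis scores 1; a hypothesis whose extension retains
% only 8 statements scores 2^{-8}.
\[
  \frac{2^{|Z_{\emptyset}|}}{2^{|L_{\mathfrak{v}}|}} = \frac{2^{16}}{2^{16}} = 1,
  \qquad
  \frac{2^{|Z_{\textbf{h}}|}}{2^{|L_{\mathfrak{v}}|}} = \frac{2^{8}}{2^{16}} = 2^{-8}
  \ \text{ when } |Z_{\textbf{h}}| = 8.
\]
```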
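Note 10's definition of mutual exclusion can be checked directly in a set-of-facts reading of statements, under the assumption (consistent with note 9, where \(Z_\emptyset = L_{\mathfrak{v}}\)) that \(a \in Z_b\) exactly when \(b \subseteq a\):

```python
# Hypothetical set-of-facts reading: a statement is a set of atomic facts,
# and a is in Z_b exactly when b is a subset of a.
def mu(a: frozenset, b: frozenset) -> bool:
    """Mutual exclusion: a not in Z_b and b not in Z_a."""
    return not (b <= a) and not (a <= b)

assert mu(frozenset({"p"}), frozenset({"q"}))            # neither contains the other
assert not mu(frozenset({"p"}), frozenset({"p", "q"}))   # {p,q} lies in Z_{p}
```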
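Finally, note 12's argument rendered as a small counting exercise, in the same toy propositional setting as the sketch following the abstract (illustrative assumptions throughout): if the true assignment is drawn uniformly from those consistent with what has been observed, the hypothesis that permits more assignments is more likely to remain true.

```python
from itertools import product

ASSIGNMENTS = [dict(zip(("x", "y"), bits)) for bits in product([False, True], repeat=2)]

def p_generalise(formula):
    """P(formula still holds | we observed only that x is True),
    with the truth drawn uniformly from the consistent completions."""
    completions = [a for a in ASSIGNMENTS if a["x"]]
    return sum(bool(eval(formula, {}, dict(a))) for a in completions) / len(completions)

print(p_generalise("x or y"))   # 1.0: the weaker hypothesis always survives
print(p_generalise("x and y"))  # 0.5: more specific, so less likely to hold
```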
References
Bennett, M.T.: Technical Appendices. Version 1.2.1 (2023). https://doi.org/10.5281/zenodo.7641742. https://github.com/ViscousLemming/Technical-Appendices
Sober, E.: Ockham’s Razors: A User’s Manual. Cambridge University Press (2015)
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Chollet, F.: On the Measure of Intelligence. arXiv preprint arXiv:1911.01547 (2019)
Chaitin, G.: The limits of reason. Sci. Am. 294(3), 74–81 (2006)
Solomonoff, R.: A formal theory of inductive inference. Part I. Inf. Control 7(1), 1–22 (1964)
Solomonoff, R.: A formal theory of inductive inference. Part II. Inf. Control 7(2), 224–254 (1964)
Kolmogorov, A.: On tables of random numbers. Sankhyā: Indian J. Stat. A 25, 369–376 (1963)
Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, Heidelberg (2005)
Bennett, M.T.: Symbol emergence and the solutions to any task. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) AGI 2021. LNCS (LNAI), vol. 13154, pp. 30–40. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93758-4_4
Ward, D., Silverman, D., Villalobos, M.: Introduction: the varieties of enactivism. Topoi 36(3), 365–375 (2017). https://doi.org/10.1007/s11245-017-9484-6
Harnad, S.: The symbol grounding problem. Physica D: Nonlinear Phenomena 42(1), 335–346 (1990)
Leike, J., Hutter, M.: Bad universal priors and notions of optimality. In: Proceedings of the 28th COLT, PMLR, pp. 1244–1259 (2015)
Gupta, A.: Definitions. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Winter 2021. Stanford University (2021)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS. Curran Associates Inc., USA (2019)
Kirk, D.: NVIDIA CUDA software and GPU parallel computing architecture. In: ISMM 2007, Canada, pp. 103–104. ACM (2007)
Meurer, A., et al.: SymPy: symbolic computing in Python. PeerJ Comput. Sci. 3, e103 (2017). https://doi.org/10.7717/peerj-cs.103
Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)
Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: towards an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010)
Legg, S., Veness, J.: An approximation of the universal intelligence measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. LNCS, vol. 7070, pp. 236–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-44958-1_18
Evans, R.: Kant’s cognitive architecture. Ph.D. thesis, Imperial College London (2020)
Evans, R., Sergot, M., Stephenson, A.: Formalizing Kant’s rules. J. Philos. Logic 49, 613–680 (2020)
Evans, R., et al.: Making sense of raw input. Artif. Intell. 299, 103521 (2021)
Bennett, M.T.: Compression, the fermi paradox and artificial super-intelligence. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) AGI 2021. LNCS (LNAI), vol. 13154, pp. 41–44. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93758-4_5
Delétang, G., et al.: Neural Networks and the Chomsky Hierarchy. arXiv preprint arXiv:2207.02098 (2022)
Power, A., et al.: Grokking: generalization beyond overfitting on small algorithmic datasets. In: ICLR (2022)
Acknowledgement
Appendices are available on GitHub [1]. This work was supported by JST (JPMJMS2033).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG