Abstract
Real-world agents need to learn how to react to their environment. To achieve this, it is crucial that they have a model of this environment that is adapted during interaction, even though important aspects may be hidden. This paper presents a new type of model for partially observable environments that enables an agent to represent hidden states but can still be generated and queried in real time. Agents can use such a model to predict the outcomes of their actions and to infer action policies. These policies turn out to be better than the optimal policy in a partially observable Markov decision process as it can be inferred, for example, by Q-learning or Sarsa. The structure and generation of these models are motivated by phenomenological considerations from both semiotics and the philosophy of mind. The performance of these models is compared to a baseline of Markov models for prediction and interaction in partially observable environments.
Notes
1. In the last one, the authors explicitly state that “[t]he agent rarely observes the exact same frame from a previous episode”, which, under a state-based conception, makes the environment practically fully observable.
2. Comparisons with \(n \ge 2\) yield similar results.
3. As a consequence, the performance of the baseline approach is lower than in experiments with memory reset.
References
Bengio, Y., Courville, A., Vincent, P.: Unsupervised feature learning and deep learning: a review and new perspectives. CoRR abs/1206.5538 (2012)
Corneil, D., Gerstner, W., Brea, J.: Efficient model-based deep reinforcement learning with variational state tabulation. arXiv preprint arXiv:1802.04325 (2018)
Crook, P., Hayes, G.: Learning in a state of confusion: perceptual aliasing in grid world navigation. In: Towards Intelligent Mobile Robots, vol. 4 (2003)
Drescher, G.: Made-Up Minds. MIT Press, Cambridge (1991)
Fikes, R., Nilsson, N.: STRIPS: a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3–4), 189–208 (1971)
Gelfond, M., Lifschitz, V.: Action languages. Electron. Trans. AI 3, 195–210 (1998)
Holmes, M., Isbell, C.: Schema learning: experience-based construction of predictive action models. In: Advances in Neural Information Processing Systems, pp. 585–592 (2005)
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1), 99–134 (1998)
Kansky, K., et al.: Schema networks: zero-shot transfer with a generative causal model of intuitive physics. arXiv preprint arXiv:1706.04317 (2017)
Lifschitz, V., Turner, H.: Representing transition systems by logic programs. In: Gelfond, M., Leone, N., Pfeifer, G. (eds.) LPNMR 1999. LNCS (LNAI), vol. 1730, pp. 92–106. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-46767-X_7
Marty, R.: C.S. Peirce’s phaneroscopy and semiotics. Semiotica 41(1–4), 169–182 (1982)
Maturana, H.: Autopoiesis and Cognition: The Realization of the Living. Springer, Dordrecht (1980). https://doi.org/10.1007/978-94-009-8947-4
McCallum, A.: Overcoming incomplete perception with utile distinction memory. In: Proceedings of the 10th International Conference on Machine Learning, pp. 190–196 (1993)
McCallum, A.: Instance-based state identification for reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 377–384 (1995)
McCallum, A.: Reinforcement learning with selective perception and hidden state. Ph.D. thesis, University of Rochester, Department of Computer Science (1996)
van Otterlo, M.: The Logic of Adaptive Behavior: Knowledge Representation and Algorithms for Adaptive Sequential Decision Making Under Uncertainty in First-Order and Relational Domains. Frontiers in Artificial Intelligence and Applications. IOS Press, Amsterdam (2009)
Perotto, F.S., Buisson, J.C., Alvares, L.O.C.: Constructivist anticipatory learning mechanism (CALM): dealing with partially deterministic and partially observable environments. In: International Conference on Epigenetic Robotics, pp. 117–127. Lund University Cognitive Science (2007)
Ring, M., Schaul, T., Schmidhuber, J.: The two-dimensional organization of behavior. In: 2011 IEEE International Conference on Development and Learning (ICDL), vol. 2, pp. 1–8. IEEE (2011)
Searle, J.: Intrinsic intentionality. Behav. Brain Sci. 3(03), 450–457 (1980)
Searle, J.: Intentionality: An Essay in the Philosophy of Mind. Cambridge Univ. Press, Cambridge Paperback Library, Cambridge (1983)
Sun, R., Sessions, C.: Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors. Syst. Man Cybern. Part B Cybern. 30(3), 403–418 (2000)
Sutton, R.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the 7th International Conference on Machine Learning, pp. 216–224 (1990)
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction, vol. 1. MIT Press, Cambridge (1998)
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wernsdorfer, M. (2018). A Phenomenologically Justifiable Simulation of Mental Modeling. In: Iklé, M., Franz, A., Rzepka, R., Goertzel, B. (eds) Artificial General Intelligence. AGI 2018. Lecture Notes in Computer Science, vol 10999. Springer, Cham. https://doi.org/10.1007/978-3-319-97676-1_26
DOI: https://doi.org/10.1007/978-3-319-97676-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97675-4
Online ISBN: 978-3-319-97676-1
eBook Packages: Computer Science (R0)