Solving Relational and First-Order Logical Markov Decision Processes: A Survey

Chapter in Reinforcement Learning

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 12)

Abstract

In this chapter we survey representations and techniques for Markov decision processes, reinforcement learning, and dynamic programming in worlds explicitly modeled in terms of objects and relations. Such relational worlds are found everywhere: in planning domains, games, real-world indoor scenes, and many other settings. Relational representations allow for expressive and natural data structures that capture objects and relations explicitly, enabling generalization not only over objects and relations but also over similar problems that differ in the number of objects. The field is surveyed comprehensively in (van Otterlo, 2009b); here we describe a large portion of the main approaches. We discuss model-free techniques – both value-based and policy-based – as well as model-based dynamic programming techniques. We also cover several other aspects, such as models and hierarchies, and we end with recent efforts and future directions.
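To make the generalization claim in the abstract concrete, the following is a minimal sketch (not taken from the chapter): in a blocks world, a ground state is a set of atoms such as on(a, b), and an abstract state containing logical variables, such as on(X, Y), covers all of its instantiations regardless of how many blocks exist or what they are called. The predicate names and the naive matching-by-enumeration below are illustrative assumptions, not an algorithm from the survey.

```python
# Illustrative sketch: relational abstraction in a blocks world.
# A ground state is a set of atoms (tuples); an abstract state uses
# uppercase symbols as logical variables and matches any substitution.
from itertools import permutations

def matches(abstract, state):
    """Return True if some substitution of objects for the variables
    in `abstract` yields a subset of the ground `state`."""
    objs = {t for atom in state for t in atom[1:]}
    vars_ = sorted({t for atom in abstract for t in atom[1:] if t[0].isupper()})
    for combo in permutations(objs, len(vars_)):
        sub = dict(zip(vars_, combo))
        ground = {(a[0],) + tuple(sub.get(t, t) for t in a[1:]) for a in abstract}
        if ground <= set(state):
            return True
    return False

# Ground state: block a on b, b and c on the table, a and c clear.
state = [("on", "a", "b"), ("on", "b", "table"),
         ("on", "c", "table"), ("clear", "a"), ("clear", "c")]

# Abstract state: some clear block X sits on another block Y.
abstract = [("clear", "X"), ("on", "X", "Y")]

print(matches(abstract, state))  # True (e.g. X=a, Y=b)
```

The same abstract state would match a state with fifty blocks, which is exactly the kind of generalization over object identity and object count that propositional representations cannot express compactly.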


References

  • Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms. Machine Learning 6(1), 37–66 (1991)

  • Alpaydin, E.: Introduction to Machine Learning. The MIT Press, Cambridge (2004)

  • Andersen, C.C.S.: Hierarchical relational reinforcement learning. Master’s thesis, Aalborg University, Denmark (2005)

  • Asgharbeygi, N., Stracuzzi, D.J., Langley, P.: Relational temporal difference learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 49–56 (2006)

  • Aycenina, M.: Hierarchical relational reinforcement learning. In: Stanford Doctoral Symposium (2002) (unpublished)

  • Baum, E.B.: Toward a model of intelligence as an economy of agents. Machine Learning 35(2), 155–185 (1999)

  • Baum, E.B.: What is Thought? The MIT Press, Cambridge (2004)

  • Bergadano, F., Gunetti, D.: Inductive Logic Programming: From Machine Learning to Software Engineering. The MIT Press, Cambridge (1995)

  • Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  • Boutilier, C., Poole, D.: Computing optimal policies for partially observable Markov decision processes using compact representations. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1168–1175 (1996)

  • Boutilier, C., Dean, T., Hanks, S.: Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11, 1–94 (1999)

  • Boutilier, C., Dearden, R.W., Goldszmidt, M.: Stochastic dynamic programming with factored representations. Artificial Intelligence 121(1-2), 49–107 (2000)

  • Boutilier, C., Reiter, R., Price, B.: Symbolic dynamic programming for first-order MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 690–697 (2001)

  • Boyan, J.A., Moore, A.W.: Generalization in reinforcement learning: Safely approximating the value function. In: Proceedings of the Neural Information Processing Conference (NIPS), pp. 369–376 (1995)

  • Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Francisco (2004)

  • Castilho, M.A., Kunzle, L.A., Lecheta, E., Palodeto, V., Silva, F.: An Investigation on Genetic Algorithms for Generic STRIPS Planning. In: Lemaître, C., Reyes, C.A., González, J.A. (eds.) IBERAMIA 2004. LNCS (LNAI), vol. 3315, pp. 185–194. Springer, Heidelberg (2004)

  • Chapman, D., Kaelbling, L.P.: Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 726–731 (1991)

  • Chen, J., Muggleton, S.: Decision-theoretic logic programs. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2010)

  • Cocora, A., Kersting, K., Plagemann, C., Burgard, W., De Raedt, L.: Learning relational navigation policies. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2006)

  • Cole, J., Lloyd, J.W., Ng, K.S.: Symbolic learning for adaptive agents. In: Proceedings of the Annual Partner Conference, Smart Internet Technology Cooperative Research Centre (2003), http://csl.anu.edu.au/jwl/crc_paper.pdf

  • Croonenborghs, T.: Model-assisted approaches for relational reinforcement learning. PhD thesis, Department of Computer Science, Catholic University of Leuven, Belgium (2009)

  • Croonenborghs, T., Driessens, K., Bruynooghe, M.: Learning relational options for inductive transfer in relational reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007a)

  • Croonenborghs, T., Ramon, J., Blockeel, H., Bruynooghe, M.: Online learning and exploiting relational models in reinforcement learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 726–731 (2007b)

  • Dabney, W., McGovern, A.: Utile distinctions for relational reinforcement learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 738–743 (2007)

  • de la Rosa, T., Jimenez, S., Borrajo, D.: Learning relational decision trees for guiding heuristic planning. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2008)

  • De Raedt, L.: Logical and Relational Learning. Springer, Heidelberg (2008)

  • Dietterich, T.G., Flann, N.S.: Explanation-based learning and reinforcement learning: A unified view. Machine Learning 28, 169–210 (1997)

  • Diuk, C.: An object-oriented representation for efficient reinforcement learning. PhD thesis, Rutgers University, Computer Science Department (2010)

  • Diuk, C., Cohen, A., Littman, M.L.: An object-oriented representation for efficient reinforcement learning. In: Proceedings of the International Conference on Machine Learning (ICML) (2008)

  • Driessens, K., Blockeel, H.: Learning Digger using hierarchical reinforcement learning for concurrent goals. In: Proceedings of the European Workshop on Reinforcement Learning (EWRL) (2001)

  • Driessens, K., Džeroski, S.: Integrating experimentation and guidance in relational reinforcement learning. In: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 115–122 (2002)

  • Driessens, K., Džeroski, S.: Combining model-based and instance-based learning for first order regression. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 193–200 (2005)

  • Driessens, K., Ramon, J.: Relational instance based regression for relational reinforcement learning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 123–130 (2003)

  • Driessens, K., Ramon, J., Blockeel, H.: Speeding Up Relational Reinforcement Learning Through the Use of an Incremental First Order Decision Tree Learner. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 97–108. Springer, Heidelberg (2001)

  • Džeroski, S., De Raedt, L., Blockeel, H.: Relational reinforcement learning. In: Shavlik, J. (ed.) Proceedings of the International Conference on Machine Learning (ICML), pp. 136–143 (1998)

  • Džeroski, S., De Raedt, L., Driessens, K.: Relational reinforcement learning. Machine Learning 43, 7–52 (2001)

  • Feng, Z., Dearden, R.W., Meuleau, N., Washington, R.: Dynamic programming for structured continuous Markov decision problems. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 154–161 (2004)

  • Fern, A., Yoon, S.W., Givan, R.: Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research (JAIR) 25, 75–118 (2006); special issue on the International Planning Competition 2004

  • Fern, A., Yoon, S.W., Givan, R.: Reinforcement learning in relational domains: A policy-language approach. The MIT Press, Cambridge (2007)

  • Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(2) (1971)

  • Finney, S., Gardiol, N.H., Kaelbling, L.P., Oates, T.: The thing that we tried didn’t work very well: Deictic representations in reinforcement learning. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 154–161 (2002)

  • Finzi, A., Lukasiewicz, T.: Game-theoretic agent programming in Golog. In: Proceedings of the European Conference on Artificial Intelligence (ECAI) (2004a)

  • Finzi, A., Lukasiewicz, T.: Relational Markov Games. In: Alferes, J.J., Leite, J. (eds.) JELIA 2004. LNCS (LNAI), vol. 3229, pp. 320–333. Springer, Heidelberg (2004b)

  • García-Durán, R., Fernández, F., Borrajo, D.: Learning and transferring relational instance-based policies. In: Proceedings of the AAAI-2008 Workshop on Transfer Learning for Complex Tasks (2008)

  • Gardiol, N.H., Kaelbling, L.P.: Envelope-based planning in relational MDPs. In: Proceedings of the Neural Information Processing Conference (NIPS) (2003)

  • Gardiol, N.H., Kaelbling, L.P.: Adaptive envelope MDPs for relational equivalence-based planning. Tech. Rep. MIT-CSAIL-TR-2008-050, MIT CS & AI Lab, Cambridge, MA (2008)

  • Gärtner, T., Driessens, K., Ramon, J.: Graph kernels and Gaussian processes for relational reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2003)

  • Gearhart, C.: Genetic programming as policy search in Markov decision processes. In: Genetic Algorithms and Genetic Programming at Stanford, pp. 61–67 (2003)

  • Geffner, H., Bonet, B.: High-level planning and control with incomplete information using POMDPs. In: Proceedings of the Fall AAAI Symposium on Cognitive Robotics (1998)

  • Gil, Y.: Learning by experimentation: Incremental refinement of incomplete planning domains. In: Proceedings of the International Conference on Machine Learning (ICML) (1994)

  • Gordon, G.J.: Stable function approximation in dynamic programming. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 261–268 (1995)

  • Gretton, C.: Gradient-based relational reinforcement learning of temporally extended policies. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007a)

  • Gretton, C.: Gradient-based relational reinforcement learning of temporally extended policies. In: Workshop on Artificial Intelligence Planning and Learning at the International Conference on Automated Planning Systems (2007b)

  • Gretton, C., Thiébaux, S.: Exploiting first-order regression in inductive policy selection. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pp. 217–225 (2004a)

  • Gretton, C., Thiébaux, S.: Exploiting first-order regression in inductive policy selection (extended abstract). In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004b)

  • Groote, J.F., Tveretina, O.: Binary decision diagrams for first-order predicate logic. The Journal of Logic and Algebraic Programming 57, 1–22 (2003)

  • Grounds, M., Kudenko, D.: Combining Reinforcement Learning with Symbolic Planning. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds.) ALAMAS 2005, ALAMAS 2006, and ALAMAS 2007. LNCS (LNAI), vol. 4865, pp. 75–86. Springer, Heidelberg (2008)

  • Guestrin, C.: Planning under uncertainty in complex structured environments. PhD thesis, Computer Science Department, Stanford University (2003)

  • Guestrin, C., Koller, D., Gearhart, C., Kanodia, N.: Generalizing plans to new environments in relational MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1003–1010 (2003a)

  • Guestrin, C., Koller, D., Parr, R., Venkataraman, S.: Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research (JAIR) 19, 399–468 (2003b)

  • Halbritter, F., Geibel, P.: Learning Models of Relational MDPs Using Graph Kernels. In: Gelbukh, A., Kuri Morales, Á.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 409–419. Springer, Heidelberg (2007)

  • Hanks, S., McDermott, D.V.: Modeling a dynamic and uncertain world I: Symbolic and probabilistic reasoning about change. Artificial Intelligence 66(1), 1–55 (1994)

  • Guerra-Hernández, A., Fallah-Seghrouchni, A.E., Soldano, H.: Learning in BDI Multi-Agent Systems. In: Dix, J., Leite, J. (eds.) CLIMA 2004. LNCS (LNAI), vol. 3259, pp. 218–233. Springer, Heidelberg (2004)

  • Hernández, J., Morales, E.F.: Relational reinforcement learning with continuous actions by combining behavioral cloning and locally weighted regression. Journal of Intelligent Systems and Applications 2, 69–79 (2010)

  • Häming, K., Peters, G.: Relational Reinforcement Learning Applied to Appearance-Based Object Recognition. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds.) EANN 2009. Communications in Computer and Information Science, vol. 43, pp. 301–312. Springer, Heidelberg (2009)

  • Hölldobler, S., Skvortsova, O.: A logic-based approach to dynamic programming. In: Proceedings of the AAAI Workshop on Learning and Planning in Markov Processes – Advances and Challenges (2004)

  • Itoh, H., Nakamura, K.: Towards learning to learn and plan by relational reinforcement learning. In: Proceedings of the ICML Workshop on Relational Reinforcement Learning (2004)

  • Joshi, S.: First-order decision diagrams for decision-theoretic planning. PhD thesis, Tufts University, Computer Science Department (2010)

  • Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99–134 (1998)

  • Kaelbling, L.P., Oates, T., Gardiol, N.H., Finney, S.: Learning in worlds with objects. In: The AAAI Spring Symposium (2001)

  • Karabaev, E., Skvortsova, O.: A heuristic search algorithm for solving first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2005)

  • Karabaev, E., Rammé, G., Skvortsova, O.: Efficient symbolic reasoning for first-order MDPs. In: ECAI Workshop on Planning, Learning and Monitoring with Uncertainty and Dynamic Worlds (2006)

  • Katz, D., Pyuro, Y., Brock, O.: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Proceedings of Robotics: Science and Systems IV (2008)

  • Kersting, K., De Raedt, L.: Logical Markov decision programs and the convergence of TD(λ). In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2004)

  • Kersting, K., Driessens, K.: Non-parametric policy gradients: A unified treatment of propositional and relational domains. In: Proceedings of the International Conference on Machine Learning (ICML) (2008)

  • Kersting, K., van Otterlo, M., De Raedt, L.: Bellman goes relational. In: Proceedings of the International Conference on Machine Learning (ICML) (2004)

  • Khardon, R.: Learning to take actions. Machine Learning 35(1), 57–90 (1999)

  • Kochenderfer, M.J.: Evolving Hierarchical and Recursive Teleo-Reactive Programs Through Genetic Programming. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E.P.K., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 83–92. Springer, Heidelberg (2003)

  • Lane, T., Wilson, A.: Toward a topological theory of relational reinforcement learning for navigation tasks. In: Proceedings of the International Florida Artificial Intelligence Research Society Conference (FLAIRS) (2005)

  • Lang, T., Toussaint, M.: Approximate inference for planning in stochastic relational worlds. In: Proceedings of the International Conference on Machine Learning (ICML) (2009)

  • Lang, T., Toussaint, M.: Probabilistic backward and forward reasoning in stochastic relational worlds. In: Proceedings of the International Conference on Machine Learning (ICML) (2010)

  • Langley, P.: Cognitive architectures and general intelligent systems. AI Magazine 27, 33–44 (2006)

  • Lanzi, P.L.: Learning classifier systems from a reinforcement learning perspective. Soft Computing 6, 162–170 (2002)

  • Lecoeuche, R.: Learning optimal dialogue management rules by using reinforcement learning and inductive logic programming. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) (2001)

  • Letia, I., Precup, D.: Developing collaborative Golog agents by reinforcement learning. In: Proceedings of the 13th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2001). IEEE Computer Society (2001)

  • Levine, J., Humphreys, D.: Learning Action Strategies for Planning Domains Using Genetic Programming. In: Raidl, G.R., Cagnoni, S., Cardalda, J.J.R., Corne, D.W., Gottlieb, J., Guillot, A., Hart, E., Johnson, C.G., Marchiori, E., Meyer, J.-A., Middendorf, M. (eds.) EvoIASP 2003, EvoWorkshops 2003, EvoSTIM 2003, EvoROB/EvoRobot 2003, EvoCOP 2003, EvoBIO 2003, and EvoMUSART 2003. LNCS, vol. 2611, pp. 684–695. Springer, Heidelberg (2003)

  • Lison, P.: Towards relational POMDPs for adaptive dialogue management. In: ACL 2010: Proceedings of the ACL 2010 Student Research Workshop, pp. 7–12. Association for Computational Linguistics, Morristown (2010)

  • Littman, M.L., Sutton, R.S., Singh, S.: Predictive representations of state. In: Proceedings of the Neural Information Processing Conference (NIPS) (2001)

  • Lloyd, J.W.: Logic for Learning: Learning Comprehensible Theories From Structured Data. Springer, Heidelberg (2003)

  • Martin, M., Geffner, H.: Learning generalized policies in planning using concept languages. In: Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR) (2000)

  • Mausam, Weld, D.S.: Solving relational MDPs with first-order machine learning. In: Workshop on Planning under Uncertainty and Incomplete Information at ICAPS 2003 (2003)

  • McCallum, R.A.: Instance-based utile distinctions for reinforcement learning with hidden state. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 387–395 (1995)

  • Mellor, D.: A Learning Classifier System Approach to Relational Reinforcement Learning. In: Bacardit, J., Bernadó-Mansilla, E., Butz, M.V., Kovacs, T., Llorà, X., Takadama, K. (eds.) IWLCS 2006 and IWLCS 2007. LNCS (LNAI), vol. 4998, pp. 169–188. Springer, Heidelberg (2008)

  • Minker, J.: Logic-Based Artificial Intelligence. Kluwer Academic Publishers Group, Dordrecht (2000)

  • Minton, S., Carbonell, J., Knoblock, C.A., Kuokka, D.R., Etzioni, O., Gil, Y.: Explanation-based learning: A problem solving perspective. Artificial Intelligence 40(1-3), 63–118 (1989)

  • Mooney, R.J., Califf, M.E.: Induction of first-order decision lists: Results on learning the past tense of English verbs. Journal of Artificial Intelligence Research (JAIR) 3, 1–24 (1995)

  • Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1), 103–130 (1993)

  • Morales, E.F.: Scaling up reinforcement learning with a relational representation. In: Proceedings of the Workshop on Adaptability in Multi-Agent Systems at AORC 2003, Sydney (2003)

  • Morales, E.F.: Learning to fly by combining reinforcement learning with behavioral cloning. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 598–605 (2004)

  • Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research (JAIR) 11, 241–276 (1999)

  • Mourão, K., Petrick, R.P.A., Steedman, M.: Using kernel perceptrons to learn action effects for planning. In: Proceedings of the International Conference on Cognitive Systems (CogSys), pp. 45–50 (2008)

  • Muller, T.J., van Otterlo, M.: Evolutionary reinforcement learning in relational domains. In: Proceedings of the 7th European Workshop on Reinforcement Learning (2005)

  • Nason, S., Laird, J.E.: Soar-RL: Integrating reinforcement learning with Soar. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004)

  • Nath, A., Domingos, P.: A language for relational decision theory. In: International Workshop on Statistical Relational Learning (SRL) (2009)

  • Neruda, R., Slusny, S.: Performance comparison of two reinforcement learning algorithms for small mobile robots. International Journal of Control and Automation 2(1), 59–68 (2009)

  • Oates, T., Cohen, P.R.: Learning planning operators with conditional and probabilistic effects. In: Planning with Incomplete Information for Robot Problems: Papers from the 1996 AAAI Spring Symposium, pp. 86–94 (1996)

  • Pasula, H.M., Zettlemoyer, L.S., Kaelbling, L.P.: Learning probabilistic planning rules. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2004)

  • Poole, D.: The independent choice logic for modeling multiple agents under uncertainty. Artificial Intelligence 94, 7–56 (1997)

  • Ramon, J., Driessens, K., Croonenborghs, T.: Transfer Learning in Reinforcement Learning Problems Through Partial Policy Recycling. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 699–707. Springer, Heidelberg (2007)

  • Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, Cambridge (2001)

  • Rodrigues, C., Gerard, P., Rouveirol, C.: On- and off-policy relational reinforcement learning. In: Late-Breaking Papers of the International Conference on Inductive Logic Programming (2008)

  • Rodrigues, C., Gérard, P., Rouveirol, C.: Incremental Learning of Relational Action Models in Noisy Environments. In: Frasconi, P., Lisi, F.A. (eds.) ILP 2010. LNCS, vol. 6489, pp. 206–213. Springer, Heidelberg (2011)

  • Roncagliolo, S., Tadepalli, P.: Function approximation in hierarchical relational reinforcement learning. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML (2004)

  • Russell, S.J., Norvig, P.: Artificial Intelligence: a Modern Approach, 2nd edn. Prentice Hall, New Jersey (2003)

  • Ryan, M.R.K.: Using abstract models of behaviors to automatically generate reinforcement learning hierarchies. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 522–529 (2002)

  • Saad, E.: A Logical Framework to Reinforcement Learning Using Hybrid Probabilistic Logic Programs. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 341–355. Springer, Heidelberg (2008)

  • Safaei, J., Ghassem-Sani, G.: Incremental learning of planning operators in stochastic domains. In: Proceedings of the International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pp. 644–655 (2007)

  • Sanner, S.: Simultaneous learning of structure and value in relational reinforcement learning. In: Driessens, K., Fern, A., van Otterlo, M. (eds.) Proceedings of the ICML-2005 Workshop on Rich Representations for Reinforcement Learning (2005)

  • Sanner, S.: Online feature discovery in relational reinforcement learning. In: Proceedings of the ICML-2006 Workshop on Open Problems in Statistical Relational Learning (2006)

  • Sanner, S., Boutilier, C.: Approximate linear programming for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2005)

  • Sanner, S., Boutilier, C.: Practical linear value-approximation techniques for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2006)

  • Sanner, S., Boutilier, C.: Approximate solution techniques for factored first-order MDPs. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007)

  • Sanner, S., Kersting, K.: Symbolic dynamic programming for first-order POMDPs. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)

  • Schmid, U.: Inductive synthesis of functional programs: Learning domain-specific control rules and abstraction schemes. Habilitationsschrift, Fakultät IV – Elektrotechnik und Informatik, Technische Universität Berlin, Germany (2001)

  • Schuurmans, D., Patrascu, R.: Direct value approximation for factored MDPs. In: Proceedings of the Neural Information Processing Conference (NIPS) (2001)

  • Shapiro, D., Langley, P.: Separating skills from preference. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 570–577 (2002)

  • Simpkins, C., Bhat, S., Isbell, C.L., Mateas, M.: Adaptive Programming: Integrating Reinforcement Learning into a Programming Language. In: Proceedings of the Twenty-Third ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) (2008)

  • Slaney, J., Thiébaux, S.: Blocks world revisited. Artificial Intelligence 125, 119–153 (2001)

  • Song, Z.W., Chen, X.P.: States evolution in Θ(λ)-learning based on logical MDPs with negation. In: IEEE International Conference on Systems, Man and Cybernetics, pp. 1624–1629 (2007)

  • Song, Z.W., Chen, X.P.: Agent learning in relational domains based on logical MDPs with negation. Journal of Computers 3(9), 29–38 (2008)

  • Stone, P.: Learning and multiagent reasoning for autonomous agents. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Computers and Thought Award Paper (2007)

  • Stracuzzi, D.J., Asgharbeygi, N.: Transfer of knowledge structures with relational temporal difference learning. In: Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)

  • Sutton, R.S., Barto, A.G.: Reinforcement Learning: an Introduction. The MIT Press, Cambridge (1998)

  • Sutton, R.S., McAllester, D.A., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of the Neural Information Processing Conference (NIPS), pp. 1057–1063 (2000)

  • Thielscher, M.: Introduction to the Fluent Calculus. Electronic Transactions on Artificial Intelligence 2(3-4), 179–192 (1998)

  • Thon, I., Guttman, B., van Otterlo, M., Landwehr, N., De Raedt, L.: From non-deterministic to probabilistic planning with the help of statistical relational learning. In: Workshop on Planning and Learning at ICAPS (2009)

  • Torrey, L.: Relational transfer in reinforcement learning. PhD thesis, University of Wisconsin-Madison, Computer Science Department (2009)

  • Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Relational macros for transfer in reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007)

  • Torrey, L., Shavlik, J., Natarajan, S., Kuppili, P., Walker, T.: Transfer in reinforcement learning via Markov logic networks. In: Proceedings of the AAAI-2008 Workshop on Transfer Learning for Complex Tasks (2008)

  • Toussaint, M.: Probabilistic inference as a model of planned behavior. Künstliche Intelligenz (German Artificial Intelligence Journal) 3 (2009)

  • Toussaint, M., Plath, N., Lang, T., Jetchev, N.: Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference. In: IEEE International Conference on Robotics and Automation (ICRA) (2010)

  • Van den Broeck, G., Thon, I., van Otterlo, M., De Raedt, L.: DTProbLog: A decision-theoretic probabilistic Prolog. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)

  • van Otterlo, M.: Efficient reinforcement learning using relational aggregation. In: Proceedings of the Sixth European Workshop on Reinforcement Learning (EWRL-6), Nancy, France (2003)

  • van Otterlo, M.: Reinforcement learning for relational MDPs. In: Nowé, A., Lenaerts, T., Steenhaut, K. (eds.) Machine Learning Conference of Belgium and the Netherlands (BeNeLearn 2004), pp. 138–145 (2004)

  • van Otterlo, M.: Intensional dynamic programming: A Rosetta stone for structured dynamic programming. Journal of Algorithms 64, 169–191 (2009a)

  • van Otterlo, M.: The Logic of Adaptive Behavior: Knowledge Representation and Algorithms for Adaptive Sequential Decision Making under Uncertainty in First-Order and Relational Domains. IOS Press, Amsterdam (2009b)

  • van Otterlo, M., De Vuyst, T.: Evolving and transferring probabilistic policies for relational reinforcement learning. In: Proceedings of the Belgium-Netherlands Artificial Intelligence Conference (BNAIC), pp. 201–208 (2009)

  • van Otterlo, M., Wiering, M.A., Dastani, M., Meyer, J.J.: A characterization of sapient agents. In: Mayorga, R.V., Perlovsky, L.I. (eds.) Toward Computational Sapience: Principles and Systems, ch. 9. Springer, Heidelberg (2007)

  • Vargas, B., Morales, E.: Solving navigation tasks with learned teleo-reactive programs, p. 4185 (2008), doi:10.1109/IROS.2008.4651240

  • Vargas-Govea, B., Morales, E.: Learning Relational Grammars from Sequences of Actions. In: Bayro-Corrochano, E., Eklundh, J.-O. (eds.) CIARP 2009. LNCS, vol. 5856, pp. 892–900. Springer, Heidelberg (2009)

  • Vere, S.A.: Induction of relational productions in the presence of background information. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 349–355 (1977)

    Google Scholar 

  • Walker, T., Shavlik, J., Maclin, R.: Relational reinforcement learning via sampling the space of first-order conjunctive features. In: Proceedings of the Workshop on Relational Reinforcement Learning at ICML 2004 (2004)

    Google Scholar 

  • Walker, T., Torrey, L., Shavlik, J., Maclin, R.: Building relational world models for reinforcement learning. In: Proceedings of the International Conference on Inductive Logic Programming (ILP) (2007)

    Google Scholar 

  • Walsh, T.J.: Efficient learning of relational models for sequential decision making. PhD thesis, Rutgers University, Computer Science Department (2010)

    Google Scholar 

  • Walsh, T.J., Littman, M.L.: Efficient learning of action schemas and web-service descriptions. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2008)

    Google Scholar 

  • Walsh, T.J., Li, L., Littman, M.L.: Transferring state abstractions between mdps. In: ICML-2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)

    Google Scholar 

  • Wang, C.: First-order markov decision processes. PhD thesis, Department of Computer Science, Tufts University, U.S.A (2007)

    Google Scholar 

  • Wang, C., Khardon, R.: Policy iteration for relational mdps. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2007)

    Google Scholar 

  • Wang, C., Khardon, R.: Relational partially observable mdps. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2010)

    Google Scholar 

  • Wang, C., Schmolze, J.: Planning with pomdps using a compact, logic-based representation. In: Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, ICTAI (2005)

    Google Scholar 

  • Wang, C., Joshi, S., Khardon, R.: First order decision diagrams for relational MDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2007)

    Google Scholar 

  • Wang, C., Joshi, S., Khardon, R.: First order decision diagrams for relational MDPs. Journal of Artificial Intelligence Research (JAIR) 31, 431–472 (2008a)

    MathSciNet  MATH  Google Scholar 

  • Wang, W., Gao, Y., Chen, X., Ge, S.: Reinforcement Learning with Markov Logic Networks. In: Gelbukh, A., Morales, E.F. (eds.) MICAI 2008. LNCS (LNAI), vol. 5317, pp. 230–242. Springer, Heidelberg (2008b)

    Chapter  Google Scholar 

  • Wang, X.: Learning by observation and practice: An incremental approach for planning operator acquisition. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 549–557 (1995)

    Google Scholar 

  • Wingate, D., Soni, V., Wolfe, B., Singh, S.: Relational knowledge with predictive state representations. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (2007)

    Google Scholar 

  • Wooldridge, M.: An introduction to MultiAgent Systems. John Wiley & Sons Ltd., West Sussex (2002)

    Google Scholar 

  • Wu, J.H., Givan, R.: Discovering relational domain features for probabilistic planning. In: Proceedings of the International Conference on Artificial Intelligence Planning Systems (ICAPS) (2007)

    Google Scholar 

  • Wu, K., Yang, Q., Jiang, Y.: ARMS: Action-relation modelling system for learning action models. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2005)

    Google Scholar 

  • Xu, J.Z., Laird, J.E.: Instance-based online learning of deterministic relational action models. In: Proceedings of the International Conference on Machine Learning (ICML) (2010)

    Google Scholar 

  • Yoon, S.W., Fern, A., Givan, R.: Inductive policy selection for first-order MDPs. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2002)

    Google Scholar 

  • Zettlemoyer, L.S., Pasula, H.M., Kaelbling, L.P.: Learning planning rules in noisy stochastic worlds. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2005)

    Google Scholar 

  • Zhao, H., Doshi, P.: Haley: A hierarchical framework for logical composition of web services. In: Proceedings of the International Conference on Web Services (ICWS), pp. 312–319 (2007)

    Google Scholar 

  • Zhuo, H., Li, L., Bian, R., Wan, H.: Requirement Specification Based on Action Model Learning. In: Huang, D.-S., Heutte, L., Loog, M. (eds.) ICIC 2007. LNCS, vol. 4681, pp. 565–574. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

Author information

Correspondence to Martijn van Otterlo.



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

van Otterlo, M. (2012). Solving Relational and First-Order Logical Markov Decision Processes: A Survey. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_8

  • DOI: https://doi.org/10.1007/978-3-642-27645-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3
