Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces

Abstract

This paper considers the problem of computing an optimal policy for a Markov decision process without complete a priori knowledge of (1) the branching probability distributions that determine the evolution of the process state upon the execution of the different actions, and (2) the probability distributions that characterize the immediate rewards returned by the environment as a result of executing these actions at the different states of the process. In addition, the underlying process is assumed to evolve in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in deriving these results, the presented work generalizes Bechhofer’s “indifference-zone” approach to the ranking & selection problem of statistical inference theory, so that it applies to populations with general bounded distributions.
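For intuition about the last point, the sketch below shows one standard way that Hoeffding’s inequality (Hoeffding 1963) yields an indifference-zone selection procedure for general bounded distributions: draw enough samples from each of k candidate populations so that, with probability at least 1 − δ, the population with the best empirical mean has a true mean within the indifference zone ε of the true best. This is only an illustrative sketch under the stated assumptions; the function names and parameters are hypothetical, and the paper’s actual generalization of Bechhofer’s procedure may differ in its details.

```python
import math
import random


def iz_sample_size(k: int, reward_range: float, epsilon: float, delta: float) -> int:
    """Samples per population so that, with probability >= 1 - delta, the
    population with the highest empirical mean has a true mean within
    epsilon (the "indifference zone") of the best true mean.

    Hoeffding's inequality for n i.i.d. samples bounded in a range of width R
    gives P(|mean_hat - mean| >= eps/2) <= 2*exp(-n*eps^2 / (2*R^2));
    a union bound over the k populations then requires
    n >= (2*R^2 / eps^2) * ln(2k / delta).
    """
    return math.ceil((2.0 * reward_range ** 2 / epsilon ** 2) * math.log(2.0 * k / delta))


def select_best(sample_fns, reward_range: float, epsilon: float, delta: float) -> int:
    """Sample every population equally often and pick the best empirical mean."""
    n = iz_sample_size(len(sample_fns), reward_range, epsilon, delta)
    means = [sum(draw() for _ in range(n)) / n for draw in sample_fns]
    return max(range(len(means)), key=means.__getitem__)


# Illustrative use with three hypothetical bounded reward distributions on [0, 1].
arms = [lambda: random.uniform(0.0, 0.6),
        lambda: random.uniform(0.2, 0.8),
        lambda: random.uniform(0.4, 1.0)]
print("selected population:", select_best(arms, reward_range=1.0, epsilon=0.1, delta=0.05))
```

Note that, unlike Bechhofer’s original single-sample procedure, which assumes normal populations with known variances, a bound of this form is distribution-free over bounded supports, which is the setting the abstract describes.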

References

  • Bechhofer RE (1954) A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann Math Stat 25:16–39

  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont

  • Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. In: Proceedings of COLT’02. ACM, New York, pp 255–270

  • Feller W (1971) An introduction to probability theory and its applications, vol. II, 2nd edn. Wiley, New York

  • Fiechter CN (1994) Efficient reinforcement learning. In: Proceedings of COLT’94. ACM, New York, pp 88–97

  • Fiechter CN (1997) Expected mistake bound model for on-line reinforcement learning. In: Proceedings of ICML’97. AAAI, Menlo Park, pp 116–124

  • Heizer J, Render B (2004) Operations management, 7th edn. Pearson/Prentice Hall, Upper Saddle River

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30

  • Kearns M, Singh S (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Neural Inf Process Syst 11:996–1002

  • Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49:209–232

  • Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge

  • Kim S-H, Nelson BL (2004) Selecting the best system. Technical report, School of Industrial & Systems Eng., Georgia Tech

  • Mitchell TM (1997) Machine learning. McGraw Hill, London

  • Reveliotis SA (2003) Uncertainty management in optimal disassembly planning through learning-based strategies. In: Proceedings of the NSF–IEEE–ORSI international workshop on IT-enabled manufacturing, logistics and supply chain management. NSF/IEEE/ORSI, Piscataway, pp 135–141

  • Reveliotis SA (2004) Modelling and controlling uncertainty in optimal disassembly planning through reinforcement learning. In: IEEE international conference on robotics & automation. IEEE, Piscataway, pp 2625–2632

  • Reveliotis SA (2007) Uncertainty management in optimal disassembly planning through learning-based strategies. IIE Trans 39:645–658

  • Reveliotis SA, Bountourelis T (2006) Efficient learning algorithms for episodic tasks with acyclic state spaces. In: Proceedings of the 2006 IEEE international conference on automation science and engineering. IEEE, Piscataway, pp 421–428

  • Sutton RS, Barto AG (2000) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT Press, Cambridge

  • Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University

Author information

Correspondence to Spyros Reveliotis.

Cite this article

Reveliotis, S., Bountourelis, T. Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces. Discrete Event Dyn Syst 17, 307–327 (2007). https://doi.org/10.1007/s10626-007-0014-3
