Abstract
This paper considers the problem of computing an optimal policy for a Markov decision process in the absence of complete a priori knowledge of (1) the branching probability distributions that determine the evolution of the process state upon the execution of the different actions, and (2) the probability distributions characterizing the immediate rewards returned by the environment as a result of executing these actions at the different states of the process. In addition, the underlying process is assumed to evolve in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in the process of deriving these results, the presented work generalizes Bechhofer’s “indifference-zone” approach to the ranking & selection problem of statistical inference theory, so that it applies to populations with bounded general distributions.
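The indifference-zone idea for bounded distributions can be illustrated with a small sketch. This is not the paper’s algorithm; it is a standard Hoeffding-bound construction under illustrative assumptions (rewards in [0, 1], the function names `iz_sample_size` and `select_best` are invented for this example): draw enough samples from each of k candidate actions so that, with probability at least 1 − δ, the action with the highest empirical mean is within ε of the true best.

```python
import math
import random

def iz_sample_size(k, epsilon, delta, value_range=1.0):
    """Samples per candidate so that, with prob. >= 1 - delta, the
    empirically best of k candidates is within epsilon of the true best.
    Derived from Hoeffding's inequality plus a union bound over k
    candidates, each mean estimated to within epsilon/2."""
    return math.ceil((2.0 * value_range ** 2 / epsilon ** 2)
                     * math.log(2.0 * k / delta))

def select_best(sample_fns, epsilon, delta, rng=random):
    """Indifference-zone-style selection over bounded reward sources.

    sample_fns: list of callables, each taking an RNG and returning one
    bounded reward sample from the corresponding candidate."""
    k = len(sample_fns)
    n = iz_sample_size(k, epsilon, delta)
    means = [sum(f(rng) for _ in range(n)) / n for f in sample_fns]
    return max(range(k), key=means.__getitem__)
```

For example, with two Bernoulli reward sources of means 0.9 and 0.5, ε = 0.1, and δ = 0.05, the procedure draws 877 samples per candidate and returns the index of the better source with high probability. Classical indifference-zone procedures assume normal populations with known variances; the Hoeffding bound is what lets the same guarantee cover arbitrary bounded distributions.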
References
Bechhofer RE (1954) A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann Math Stat 25:16–39
Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont
Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. In: Proceedings of COLT’02. ACM, New York, pp 255–270
Feller W (1971) An introduction to probability theory and its applications, vol. II, 2nd edn. Wiley, New York
Fiechter CN (1994) Efficient reinforcement learning. In: Proceedings of COLT’94. ACM, New York, pp 88–97
Fiechter CN (1997) Expected mistake bound model for on-line reinforcement learning. In: Proceedings of ICML’97. AAAI, Menlo Park, pp 116–124
Heizer J, Render B (2004) Operations management, 7th edn. Pearson/Prentice Hall, Upper Saddle River
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30
Kearns M, Singh S (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Neural Inf Process Syst 11:996–1002
Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49:209–232
Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
Kim S-H, Nelson BL (2004) Selecting the best system. Technical report, School of Industrial & Systems Eng., Georgia Tech
Mitchell TM (1997) Machine learning. McGraw Hill, London
Reveliotis SA (2003) Uncertainty management in optimal disassembly planning through learning-based strategies. In: Proceedings of the NSF–IEEE–ORSI international workshop on IT-enabled manufacturing, logistics and supply chain management. NSF/IEEE/ORSI, Piscataway, pp 135–141
Reveliotis SA (2004) Modelling and controlling uncertainty in optimal disassembly planning through reinforcement learning. In: IEEE international conference on robotics & automation. IEEE, Piscataway, pp 2625–2632
Reveliotis SA (2007) Uncertainty management in optimal disassembly planning through learning-based strategies. IIE Trans 39:645–658
Reveliotis SA, Bountourelis T (2006) Efficient learning algorithms for episodic tasks with acyclic state spaces. In: Proceedings of the 2006 IEEE international conference on automation science and engineering. IEEE, Piscataway, pp 421–428
Sutton RS, Barto AG (2000) Reinforcement learning: an introduction. MIT Press, Cambridge
Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT Press, Cambridge
Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University
Cite this article
Reveliotis, S., Bountourelis, T. Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces. Discrete Event Dyn Syst 17, 307–327 (2007). https://doi.org/10.1007/s10626-007-0014-3