
Neural Network Ensembles in Reinforcement Learning


Abstract

Integrating function approximation into reinforcement learning makes it possible to learn state and state-action values in large state spaces. Model-free methods such as temporal-difference learning and SARSA yield good results for problems where the Markov property holds. However, temporal-difference methods are known to be unstable estimators of the value function when combined with function approximation; the degree of instability depends on the Markov chain, the discount factor and the chosen function approximator. In this paper, we propose a meta-algorithm that learns state or state-action values in a neural network ensemble formed by a committee of multiple agents. The agents learn from joint decisions, and we show that the committee benefits from the diversity of its members' value estimates. We empirically evaluate our algorithm on a generalized maze problem and on SZ-Tetris; the empirical evaluations confirm our analytical results.
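The sketch below illustrates the committee idea described in the abstract: several independently initialised neural value estimators whose averaged predictions drive action selection, each member then updated with SARSA on the jointly chosen transitions. It is a minimal illustration under stated assumptions, not the paper's algorithm; the class names (MLPQ, Committee), the single-hidden-layer network, the averaging rule and the choice to bootstrap from each member's own estimate are all illustrative.

```python
# Hedged sketch of a committee of neural Q-estimators acting on joint
# decisions (average of member estimates), each trained with semi-gradient
# SARSA. The environment interface and all names are illustrative.
import numpy as np

class MLPQ:
    """One-hidden-layer MLP approximating Q(s, a) for a discrete action set."""
    def __init__(self, n_features, n_actions, n_hidden=16, lr=0.05, rng=None):
        rng = rng or np.random.default_rng()
        self.W1 = rng.normal(0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_actions))
        self.lr = lr

    def q(self, s):
        h = np.tanh(s @ self.W1)          # hidden activations
        return h @ self.W2, h             # Q-values for all actions, plus h

    def td_update(self, s, a, target):
        q, h = self.q(s)
        err = target - q[a]               # TD error for the taken action
        # Semi-gradient of 0.5 * err^2 with respect to the weights.
        grad_out = np.zeros_like(self.W2)
        grad_out[:, a] = -err * h
        grad_hidden = np.outer(s, (-err * self.W2[:, a]) * (1 - h ** 2))
        self.W2 -= self.lr * grad_out
        self.W1 -= self.lr * grad_hidden

class Committee:
    """Ensemble of independently initialised Q-networks acting jointly."""
    def __init__(self, n_members, n_features, n_actions, gamma=0.95, eps=0.1):
        self.members = [MLPQ(n_features, n_actions,
                             rng=np.random.default_rng(i))
                        for i in range(n_members)]
        self.gamma, self.eps, self.n_actions = gamma, eps, n_actions

    def joint_q(self, s):
        # Joint decision rule: average the members' value estimates.
        return np.mean([m.q(s)[0] for m in self.members], axis=0)

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(self.n_actions))
        return int(np.argmax(self.joint_q(s)))

    def sarsa_step(self, s, a, r, s_next, a_next, done):
        # Every member learns from the transition produced by the joint
        # policy; here each bootstraps from its own estimate (one possible
        # choice, which preserves diversity among members).
        for m in self.members:
            q_next = 0.0 if done else m.q(s_next)[0][a_next]
            m.td_update(s, a, r + self.gamma * q_next)
```

Other combination rules, such as majority voting over the members' greedy actions, or using the averaged value as the bootstrap target, are equally plausible variants; the sketch only shows how joint action selection and per-member TD updates fit together.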



Author information

Corresponding author

Correspondence to Stefan Faußer.


About this article

Cite this article

Faußer, S., Schwenker, F. Neural Network Ensembles in Reinforcement Learning. Neural Process Lett 41, 55–69 (2015). https://doi.org/10.1007/s11063-013-9334-5
