Abstract
Integrating function approximation into reinforcement learning enables learning of state and state-action values in large state spaces. Model-free methods, such as temporal-difference learning or SARSA, yield good results for problems where the Markov property holds. However, temporal-difference methods are known to be unstable estimators of the value functions when combined with function approximation. This instability depends on the Markov chain, the discount factor, and the chosen function approximator. In this paper, we propose a meta-algorithm that learns state or state-action values in a neural network ensemble, formed by a committee of multiple agents. The agents learn from joint decisions. We show that the committee benefits from the diversity of its members' value estimates. We empirically evaluate our algorithm on a generalized maze problem and on SZ-Tetris; the empirical evaluations confirm our analytical results.
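The committee scheme sketched in the abstract, acting on a joint decision and updating each member toward a shared target, can be illustrated with a minimal toy implementation. This is a hypothetical sketch under simplifying assumptions (linear value functions instead of neural networks, a greedy joint policy, a single SARSA-style transition), not the authors' exact meta-algorithm; all names and constants below are illustrative.

```python
import numpy as np

# Hypothetical committee of value estimators (illustrative, not the paper's
# exact algorithm): M agents each hold their own linear Q-function; the
# committee acts greedily on the averaged (joint) estimate, and every agent
# updates its own weights toward a SARSA-style target built from that
# joint estimate.

rng = np.random.default_rng(0)
N_AGENTS, N_FEATURES, N_ACTIONS = 5, 8, 3
GAMMA, ALPHA = 0.9, 0.1  # discount factor and learning rate (assumed values)

# One weight matrix (actions x features) per committee member, initialized
# differently to create diversity in the value estimates.
weights = [rng.normal(scale=0.1, size=(N_ACTIONS, N_FEATURES))
           for _ in range(N_AGENTS)]

def joint_q(phi):
    """Committee estimate: average of the members' Q-value vectors."""
    return np.mean([w @ phi for w in weights], axis=0)

def joint_action(phi):
    """Greedy joint decision on the averaged estimate."""
    return int(np.argmax(joint_q(phi)))

def committee_sarsa_update(phi, a, r, phi_next, a_next):
    """Each agent moves its own estimate toward the joint TD target."""
    target = r + GAMMA * joint_q(phi_next)[a_next]
    for w in weights:
        td_error = target - (w @ phi)[a]
        w[a] += ALPHA * td_error * phi

# Toy transition with unit-norm random feature vectors and reward 1.0.
phi = rng.normal(size=N_FEATURES)
phi2 = rng.normal(size=N_FEATURES)
phi /= np.linalg.norm(phi)
phi2 /= np.linalg.norm(phi2)
a = joint_action(phi)
a2 = joint_action(phi2)
committee_sarsa_update(phi, a, 1.0, phi2, a2)
```

Because every member is pulled toward the same joint target while starting from different initial weights, the committee average tends to reduce the variance of the individual estimates, which is the intuition behind the diversity benefit stated in the abstract.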
Cite this article
Faußer, S., Schwenker, F. Neural Network Ensembles in Reinforcement Learning. Neural Process Lett 41, 55–69 (2015). https://doi.org/10.1007/s11063-013-9334-5