Effective Policy Gradient Search for Reinforcement Learning Through NEAT Based Feature Extraction

Peng, Yiming; Chen, Gang; Zhang, Mengjie; Mei, Yi

doi:10.1007/978-3-319-68759-9_39

Yiming Peng²²,
Gang Chen²²,
Mengjie Zhang²² &
…
Yi Mei²²

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10593))

Included in the following conference series:

Asia-Pacific Conference on Simulated Evolution and Learning

3170 Accesses
1 Citations

Abstract

To improve the effectiveness of commonly used Policy Gradient Search (PGS) algorithms for Reinforcement Learning (RL), many existing works considered the importance of extracting useful state features from raw environment inputs. However, these works only studied the feature extraction process, but the learned features have not been demonstrated to improve reinforcement learning performance. In this paper, we consider NeuroEvolution of Augmenting Topology (NEAT) for automated feature extraction, as it can evolve Neural Networks with suitable topologies that can help extract useful features. Following this idea, we develop a new algorithm called NEAT with Regular Actor Critic for Policy Gradient Search, which integrates a popular Actor-Critic PGS algorithm (i.e., Regular Actor-Critic) with NEAT based feature extraction. The algorithm manages to learn useful state features as well as good policies to tackle complex RL problems. The results on benchmark problems confirm that our proposed algorithm is significantly more effective than NEAT in terms of learning performance, and that the learned features by our proposed algorithm on one learning problem can maintain the effectiveness while it is used with RAC on another related learning problem.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Balduzzi, D., Frean, M., Leary, L., Lewis, J.P.: The shattered gradients problem: if resnets are the answer, then what is the question? arXiv.org (2017)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Article Google Scholar
Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Automatica 45(11), 2471–2482 (2009)
Article MathSciNet MATH Google Scholar
Chen, G., Douch, C.I.J., Zhang, M.: Accuracy-based learning classifier systems for multistep reinforcement learning: a fuzzy logic approach to handling continuous inputs and learning continuous actions. IEEE Trans. Evol. Comput. 20(6), 953–971 (2016)
Article Google Scholar
Deisenroth, M.P., Neumann, G., Peters, J.: A survey on policy search for robotics. Found. Trends Robot. 2(1–2), 1–142 (2013)
Google Scholar
Castro, D., Mannor, S.: Adaptive bases for reinforcement learning. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS (LNAI), vol. 6321, pp. 312–327. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15880-3_26
Chapter Google Scholar
Grondman, I., Busoniu, L., Lopes, G.A.D., Babuška, R.: A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(6), 1291–1307 (2012)
Article Google Scholar
Gu, S., Lillicrap, T.P., Sutskever, I., Levine, S.: Continuous deep q-learning with model-based acceleration. In: ICML, pp. 2829–2838 (2016)
Google Scholar
Hermundstad, A.M., Brown, K.S., Bassett, D.S., Carlson, J.M.: Learning, memory, and the role of neural network architecture. PLoS Comput. Biol. 7(6), e1002063 (2011)
Article MathSciNet Google Scholar
Kamio, S., Iba, H.: Adaptation technique for integrating genetic programming and reinforcement learning for real robots. IEEE Trans. Evol. Comput. 9(3), 318–333 (2005)
Article Google Scholar
Konidaris, G., Osentoski, S., Thomas, P.: Value function approximation in reinforcement learning using the fourier basis. In: 2011 AAAI, pp. 380–385 (2011)
Google Scholar
Lanzi, P.L.: Learning classifier systems: then and now. Evol. Intell. 1(1), 63–82 (2008)
Article Google Scholar
Loscalzo, S., Wright, R., Yu, L.: Predictive feature selection for genetic policy search. AAMAS 2014, 1–33 (2014)
Google Scholar
Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Ann. Oper. Res. 134(1), 215–238 (2005)
Article MathSciNet MATH Google Scholar
Parr, R., Painter-Wakefield, C., Li, L.: Analyzing feature generation for value-function approximation. In: ICML, pp. 737–744 (2007)
Google Scholar
Peng, Y., Chen, G., Zhang, M., Pang, S.: A sandpile model for reliable actor-critic reinforcement learning. In: IJCNN, pp. 4014–4021. IEEE (2017)
Google Scholar
Peng, Y., Chen, G., Zhang, M., Pang, S.: Generalized compatible function approximation for policy gradient search. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds.) ICONIP 2016. LNCS, vol. 9947, pp. 615–622. Springer, Cham (2016). doi:10.1007/978-3-319-46687-3_68
Chapter Google Scholar
Schrum, J., Miikkulainen, R.: Discovering multimodal behavior in ms. pac-man through evolution of modular neural networks. IEEE Trans. Comput. Intell. AI Games 8(1), 67–81 (2016)
Article Google Scholar
Stanley, K.O., Miikkulainen, R.: Evolving neural network through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
Article Google Scholar
Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction, vol. 1. MIT press, Cambridge (1998)
Google Scholar
Sutton, R.S., Mcallester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: NIPS, pp. 1057–1063 (1999)
Google Scholar
Whiteson, S., Stone, P.: Evolutionary function approximation for reinforcement learning. J. Mach. Learn. Res. 7(5), 877–917 (2006)
MathSciNet MATH Google Scholar
Whiteson, S., Stone, P., Stanley, K.O., Miikkulainen, R., Kohl, N.: Automatic feature selection in neuroevolution. In: 2005 GECCO, pp. 1225–1232 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Yiming Peng, Gang Chen, Mengjie Zhang & Yi Mei

Authors

Yiming Peng
View author publications
You can also search for this author in PubMed Google Scholar
Gang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Mengjie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yi Mei
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yiming Peng .

Editor information

Editors and Affiliations

Southern University of Science and Technology, Shenzhen, China
Yuhui Shi
City University of Hong Kong, Hong Kong, Kowloon, Hong Kong
Kay Chen Tan
Victoria University of Wellington, Wellington, Wellington, New Zealand
Mengjie Zhang
Southern University of Science and Technology, Shenzhen, China
Ke Tang
RMIT University, Melbourne, Victoria, Australia
Xiaodong Li
City University of Hong Kong, Kowloon Tong, Hong Kong
Qingfu Zhang
Peking University, Beijing, China
Ying Tan
University of Leipzig, Leipzig, Germany
Martin Middendorf
University of Surrey, Guildford, Surrey, United Kingdom
Yaochu Jin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Peng, Y., Chen, G., Zhang, M., Mei, Y. (2017). Effective Policy Gradient Search for Reinforcement Learning Through NEAT Based Feature Extraction. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science(), vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_39

Download citation

DOI: https://doi.org/10.1007/978-3-319-68759-9_39
Published: 14 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68758-2
Online ISBN: 978-3-319-68759-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Effective Policy Gradient Search for Reinforcement Learning Through NEAT Based Feature Extraction