Effective Policy Gradient Search for Reinforcement Learning Through NEAT Based Feature Extraction

  • Conference paper

In: Simulated Evolution and Learning (SEAL 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10593)

Abstract

To improve the effectiveness of commonly used Policy Gradient Search (PGS) algorithms for Reinforcement Learning (RL), many existing works have highlighted the importance of extracting useful state features from raw environment inputs. However, these works focus on the feature extraction process itself; the learned features have not been shown to improve reinforcement learning performance. In this paper, we consider NeuroEvolution of Augmenting Topologies (NEAT) for automated feature extraction, since it can evolve neural networks with topologies well suited to extracting useful features. Following this idea, we develop a new algorithm, NEAT with Regular Actor-Critic for Policy Gradient Search, which integrates a popular actor-critic PGS algorithm (i.e., Regular Actor-Critic, RAC) with NEAT-based feature extraction. The algorithm learns useful state features as well as good policies for tackling complex RL problems. Results on benchmark problems confirm that our proposed algorithm is significantly more effective than NEAT in terms of learning performance, and that the features it learns on one problem remain effective when reused with RAC on another, related problem.
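
The sketch below is a minimal, self-contained illustration of the architecture the abstract describes: a feature network maps raw environment inputs to state features phi(s), and a regular (vanilla) actor-critic with a softmax policy and a TD(0) critic learns on top of those features. It is not the authors' implementation; the fixed random hidden layer merely stands in for a NEAT-evolved extractor, and the toy chain MDP, learning rates, and all names (step, features, W_feat, etc.) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Toy environment: a short chain MDP (illustrative only) ---------------
    N_STATES, N_ACTIONS, GOAL = 6, 2, 5        # move left (a=0) or right (a=1)

    def step(s, a):
        s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        return s2, r, s2 == GOAL               # next state, reward, done

    def raw_obs(s):                            # raw environment input (one-hot)
        x = np.zeros(N_STATES)
        x[s] = 1.0
        return x

    # --- Stand-in for a NEAT-evolved feature extractor ------------------------
    # In the paper NEAT evolves this network's topology and weights; here a
    # fixed random single-hidden-layer net only shows where the extractor
    # plugs in.
    N_FEATURES = 4
    W_feat = rng.normal(scale=0.5, size=(N_FEATURES, N_STATES))

    def features(s):
        return np.tanh(W_feat @ raw_obs(s))    # phi(s): extracted state features

    # --- Regular actor-critic operating on phi(s) -----------------------------
    gamma, alpha_w, alpha_theta = 0.95, 0.1, 0.05
    w = np.zeros(N_FEATURES)                   # critic: V(s) = w . phi(s)
    theta = np.zeros((N_ACTIONS, N_FEATURES))  # actor prefs: h(s,a) = theta[a] . phi(s)

    def policy(phi):
        prefs = theta @ phi
        e = np.exp(prefs - prefs.max())        # softmax over action preferences
        return e / e.sum()

    for episode in range(500):
        s, done = 0, False
        while not done:
            phi = features(s)
            pi = policy(phi)
            a = rng.choice(N_ACTIONS, p=pi)
            s2, r, done = step(s, a)
            # TD(0) error, then vanilla ("regular") actor-critic updates
            delta = r + (0.0 if done else gamma * (w @ features(s2))) - w @ phi
            w += alpha_w * delta * phi
            for b in range(N_ACTIONS):
                theta[b] += alpha_theta * delta * ((1.0 if b == a else 0.0) - pi[b]) * phi
            s = s2

    print("estimated V(start):", w @ features(0))

In the proposed algorithm the extractor itself is evolved by NEAT, so the role played by W_feat above would be filled by the best evolved genome's network, while the actor-critic weights are trained by gradient updates of the kind shown in the loop.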

Author information

Corresponding author: Yiming Peng.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Peng, Y., Chen, G., Zhang, M., Mei, Y. (2017). Effective Policy Gradient Search for Reinforcement Learning Through NEAT Based Feature Extraction. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science, vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_39

  • DOI: https://doi.org/10.1007/978-3-319-68759-9_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68758-2

  • Online ISBN: 978-3-319-68759-9
