Predictive feature selection for genetic policy search

Published in: Autonomous Agents and Multi-Agent Systems (2015)

Abstract

Automatic learning of control policies is becoming increasingly important to allow autonomous agents to operate alongside, or in place of, humans in dangerous and fast-paced situations. Reinforcement learning (RL) methods, including genetic policy search algorithms, are a promising technology capable of learning such control policies. Unfortunately, RL techniques can take prohibitively long to learn a sufficiently good control policy in environments described by many sensors (features). We argue that in many cases only a subset of the available features is needed to learn the task at hand, since the others may represent irrelevant or redundant information. In this work, we propose a predictive feature selection framework that analyzes data obtained during execution of a genetic policy search algorithm to identify relevant features on-line. By embedding feature selection into the process of learning a control policy, this constrains the policy search space and reduces the time needed to locate a sufficiently good policy. We explore this framework through an instantiation called predictive feature selection embedded in NeuroEvolution of Augmenting Topologies (NEAT), or PFS-NEAT. In an empirical study, we demonstrate that PFS-NEAT enables NEAT to find good control policies in two benchmark environments, and show that it can outperform three competing feature selection algorithms, FS-NEAT, FD-NEAT, and SAFS-NEAT, in several variants of these environments.
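The abstract's core idea, selecting a relevant feature subset by predicting returns from data gathered while the policy search runs, can be illustrated with a short sketch. This is not the authors' PFS-NEAT procedure; the function name, the choice of scikit-learn's RandomForestRegressor as the predictive model, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_features(states, returns, n_keep):
    """Score each sensor by how well it predicts episode return.

    `states` is an (n_samples, n_features) array of observed feature
    vectors; `returns` holds the return of the episode each sample came
    from. Impurity-based importances from a tree ensemble act as the
    relevance score, and the indices of the top `n_keep` features are
    returned (hypothetical helper, not part of PFS-NEAT itself).
    """
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(states, returns)
    order = np.argsort(model.feature_importances_)[::-1]
    return sorted(int(i) for i in order[:n_keep])

if __name__ == "__main__":
    # Synthetic stand-in for trajectories collected during policy search:
    # 20 candidate sensors, but only sensors 2 and 7 influence the return.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(scale=0.1, size=500)
    print(rank_features(X, y, n_keep=2))  # expected to print [2, 7]
```

In an embedded setting such as the one the abstract describes, a ranking like this would be recomputed as new trajectories arrive, and only the selected sensors would be exposed as inputs to the evolving networks.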

Notes

  1. Project located at: http://rars.sourceforge.net.

  2. http://anji.sourceforge.net.

  3. http://www.cs.waikato.ac.nz/ml/weka.

Acknowledgments

This work was performed under 13-RI-CRADA-13, and was supported in part through computational resources provided by the U.S. DoD HPCMP AFRL/RI Affiliated Resource Center. The authors would like to thank the anonymous reviewers for their helpful comments and suggestions, and Kevin Acunto for his work porting the RARS environment to Java.

Author information

Correspondence to Steven Loscalzo.

Cite this article

Loscalzo, S., Wright, R. & Yu, L. Predictive feature selection for genetic policy search. Auton Agent Multi-Agent Syst 29, 754–786 (2015). https://doi.org/10.1007/s10458-014-9268-y
