Abstract
Automatic learning of control policies is becoming increasingly important as autonomous agents are deployed alongside, or in place of, humans in dangerous and fast-paced situations. Reinforcement learning (RL), including genetic policy search, is a promising technology for learning such control policies. Unfortunately, RL techniques can take prohibitively long to learn a sufficiently good control policy in environments described by many sensors (features). We argue that in many cases only a subset of the available features is needed to learn the task at hand, since the others may carry irrelevant or redundant information. In this work, we propose a predictive feature selection framework that analyzes data obtained during the execution of a genetic policy search algorithm to identify relevant features on-line. Embedding feature selection into the process of learning a control policy in this way constrains the policy search space and reduces the time needed to locate a sufficiently good policy. We explore the framework through an instantiation called predictive feature selection embedded in NeuroEvolution of Augmenting Topologies (NEAT), or PFS-NEAT. In an empirical study, we demonstrate that PFS-NEAT enables NEAT to find good control policies in two benchmark environments, and show that it can outperform three competing feature selection algorithms, FS-NEAT, FD-NEAT, and SAFS-NEAT, in several variants of these environments.
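To make the embedded selection loop concrete, the sketch below shows one way on-line feature selection could be interleaved with a generational policy search. It is only an illustration of the general idea under simplifying assumptions, not the paper's implementation: candidate policies are plain linear maps rather than evolved NEAT networks, feature relevance is scored by a simple correlation proxy, and names such as `evaluate`, `fs_every`, and `k` are hypothetical.

```python
# Minimal, hypothetical sketch of feature selection embedded in a generational
# policy search loop, in the spirit of the framework described above. Everything
# here is an illustrative assumption: policies are linear maps rather than evolved
# NEAT networks, and relevance is scored by the absolute correlation between each
# feature and the episode return.
import numpy as np


def score_features(states, returns):
    """Score each feature by how strongly it co-varies with observed returns."""
    scores = np.zeros(states.shape[1])
    for j in range(states.shape[1]):
        col = states[:, j]
        if col.std() > 0 and returns.std() > 0:
            scores[j] = abs(np.corrcoef(col, returns)[0, 1])
    return scores


def pfs_policy_search(evaluate, n_features, generations=30, pop_size=20,
                      k=4, fs_every=10, seed=0):
    """evaluate(policy) must run one episode with the given policy and return
    (visited states as an array of shape [T, n_features], scalar return)."""
    rng = np.random.default_rng(seed)
    selected = np.arange(n_features)          # start with every feature enabled
    population = [rng.normal(size=n_features) for _ in range(pop_size)]
    seen_states, seen_returns = [], []
    best_w, best_fit = population[0], -np.inf

    for gen in range(generations):
        fitness = []
        for w in population:
            # Mask out de-selected features so the policy search space stays small.
            policy = lambda s, w=w, m=selected: float(s[m] @ w[m])
            states, ret = evaluate(policy)
            seen_states.extend(states)
            seen_returns.extend([ret] * len(states))
            fitness.append(ret)
            if ret > best_fit:
                best_w, best_fit = w, ret
        # Periodically re-estimate feature relevance from the experience the search
        # has already generated -- no extra environment interaction is needed.
        if (gen + 1) % fs_every == 0:
            scores = score_features(np.asarray(seen_states), np.asarray(seen_returns))
            selected = np.argsort(scores)[::-1][:k]
        # Crude (1, lambda)-style reproduction: mutate the generation's best individual.
        elite = population[int(np.argmax(fitness))]
        population = [elite + 0.1 * rng.normal(size=n_features) for _ in range(pop_size)]

    return selected, best_w
```

The point the sketch is meant to highlight is that selection consumes only the trajectories already produced while evaluating candidate policies, so narrowing the feature set costs no additional environment interaction.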
Notes
Project located at: http://rars.sourceforge.net.
References
Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57, 469–483.
Bellman, R. (2003). Dynamic programming. Mineola: Dover Publications.
Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In Proceedings of the 26th annual international conference on machine learning (pp. 41–48). New York: ACM.
Böhm, N., Kókai, G., & Mandl, S. (2004). Evolving a heuristic function for the game of Tetris. In Lernen, Wissensentdeckung und Adaptivität (LWA) (pp. 118–122). Berlin: Humboldt-Universität.
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. JAIR, 11, 1–94.
Cannady, J. (2000). Next generation intrusion detection: Autonomous reinforcement learning of network attacks. In Proceedings of the 23rd National Information Systems Security Conference (pp. 1–12).
Castelletti, A., Galelli, S., Restelli, M., & Soncini-Sessa, R. (2011). Tree-based variable selection for dimensionality reduction of large-scale control systems. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), IEEE (pp. 62–69).
Cliff, D., & Miller, G. (1995). Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations. In F. Morán, A. Moreno, J. Merelo, & P. Chacón (Eds.), Advances in artificial life, lecture notes in computer science (Vol. 929, pp. 200–218). Berlin Heidelberg: Springer. doi: 10.1007/3-540-59496-5_300
Deisenroth, M., & Rasmussen, C. (2011). PILCO: A model-based and data-efficient approach to policy search. In L. Getoor & T. Scheffer (Eds.), Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 465–472). New York: ACM.
Devijver, P., & Kittler, J. (1982). Pattern recognition: A statistical approach. London: Prentice Hall International.
Dietterich, T. G. (1998). The MAXQ method for hierarchical reinforcement learning. In Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann (pp. 118–126).
Diuk, C., Li, L., & Leffler, B. (2009). The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning. In L. Bottou & M. Littman (Eds.), Proceedings of the 26th International Conference on Machine Learning (pp. 249–256). Montreal: Omnipress.
Doroodgar, B., & Nejat, G. (2010). A hierarchical reinforcement learning based control architecture for semi-autonomous rescue robots in cluttered environments. In 2010 IEEE Conference on Automation Science and Engineering (CASE) (pp. 948–953).
Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. JMLR, 6, 503–556.
Goldberg, D. E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. Proceedings of the Second International Conference on Genetic Algorithms and Their Application (pp. 41–49). Hillsdale, NJ: L. Erlbaum Associates Inc.
Gomez, F., & Miikkulainen, R. (1997). Incremental evolution of complex general behavior. Adaptive Behavior, 5(3–4), 317–342.
Gomez, F. J., & Miikkulainen, R. (1999). Solving non-Markovian control tasks with neuroevolution. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, Morgan Kaufmann (pp. 1356–1361).
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.
Hachiya, H., & Sugiyama, M. (2010). Feature selection for reinforcement learning: Evaluating implicit state-reward dependency via conditional mutual information. In Proceedings of the ECML (pp. 474–489).
Hall, M. (1999). Correlation based feature selection for machine learning. PhD thesis, University of Waikato, Department of Computer Science.
Jolliffe, I. T. (2010). Principal component analysis (2nd ed.). New York: Springer.
Jung, T., & Stone, P. (2009). Feature selection for value function approximation using Bayesian model selection. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 660–675).
Knowles, J. D., Watson, R. A., & Corne, D. W. (2001). Reducing local optima in single-objective problems by multi-objectivization. In E. Zitzler, L. Thiele, K. Deb, C. Coello Coello, & D. Corne (Eds.), Evolutionary multi-criterion optimization, lecture notes in computer science (Vol. 1993, pp. 269–283). Berlin Heidelberg: Springer. doi: 10.1007/3-540-44719-9_19
Kolter, J. Z., & Ng, A. Y. (2009). Regularization and feature selection in least-squares temporal difference learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 521–528).
Konidaris, G., & Barto, A. (2009). Efficient skill learning using abstraction selection. Proceedings of the 21st International Joint Conference on Artificial Intelligence (pp. 1107–1112). San Francisco, CA: Morgan Kaufmann Publishers Inc.
Konidaris, G., Kuindersma, S., Barto, A., & Grupen, R. (2010). Constructing skill trees for reinforcement learning agents from demonstration trajectories. NIPS, 23, 1162–1170.
Kveton, B., Hauskrecht, M., & Guestrin, C. (2006). Solving factored MDPs with hybrid state and action variables. Journal of Artificial Intelligence Research, 27, 153–201.
Lazaric, A., Restelli, M., & Bonarini, A. (2007). Reinforcement learning in continuous action spaces through sequential monte carlo methods. Advances in Neural Information Processing Systems (pp. 833–840). Cambridge: MIT Press.
Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation, 19(2), 189–223.
Li, L., Walsh, T. J., & Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics (pp. 531–539).
Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502.
Loscalzo, S., Wright, R., Acunto, K., & Yu, L. (2012). Sample aware embedded feature selection for reinforcement learning. In Proceedings of GECCO (pp. 879–886).
Mahadevan, S. (2005). Representation policy iteration. Proceedings of the Twenty-First Annual Conference on Uncertainty in Artificial Intelligence (UAI-05) (pp. 372–379). Arlington, Virginia: AUAI Press.
March, J. G. (1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87.
Melo, F. S., & Lopes, M. (2008). Fitted natural actor-critic: A new algorithm for continuous state-action MDPs. In ECML/PKDD(2) (pp. 66–81).
Mouret, J. B., & Doncieux, S. (2012). Encouraging behavioral diversity in evolutionary robotics: An empirical study. Evolutionary Computation, 20(1), 91–133. doi:10.1162/EVCO_a_00048.
Nouri, A., & Littman, M. (2010). Dimension reduction and its application to model-based exploration in continuous spaces. Machine Learning, 81, 85–98.
Parr, R., Painter-Wakefield, C., Li, L., & Littman, M.L. (2007). Analyzing feature generation for value-function approximation. In ICML (pp. 737–744).
Pazis, J., & Lagoudakis, M. G. (2009). Binary action search for learning continuous-action control policies. In Proceedings of the 26th Annual International Conference on Machine Learning ICML ’09 (pp. 793–800). New York: ACM.
Petrik, M., Taylor, G., Parr, R., & Zilberstein, S. (2010). Feature selection using regularization in approximate linear programs for Markov decision processes. In Proceedings of the 27th International Conference on Machine Learning (pp. 871–878).
Powell, W. B. (2011). Approximate dynamic programming: Solving the curses of dimensionality (2nd ed.). Hoboken, NJ: Wiley.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley-Interscience.
Servin, A., & Kudenko, D. (2008). Multi-agent reinforcement learning for intrusion detection: A case study and evaluation. In Proceedings of the European Conference on Artificial Intelligence (pp. 873–874).
Sher, G. I. (2012). Handbook of neuroevolution through Erlang. New York: Springer.
Stanley, K. O., & Miikkulainen, R. (2002). Efficient reinforcement learning through evolving neural network topologies. In Proceedings of GECCO (pp. 569–577).
Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tan, M., Hartley, M., Bister, M., & Deklerck, R. (2009). Automated feature selection in neuroevolution. Evolutionary Intelligence, 1(4), 271–292.
Tan, M., Deklerck, R., Jansen, B., & Cornelis, J. (2012). Analysis of a feature-deselective neuroevolution classifier (FD-NEAT) in a computer-aided lung nodule detection system for CT images. In T. Soule & J. H. Moore (Eds.), GECCO (Companion) (pp. 539–546). New York: ACM.
Taylor, M. E., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. JMLR, 10, 1633–1685.
Tesauro, G., Das, R., Chan, H., Kephart, J. O., Levine, D., Rawson, F. L., III, & Lefurgy, C. (2007). Managing power consumption and performance of computing systems using reinforcement learning. In NIPS.
Vigorito, C. M., & Barto, A. G. (2009). Incremental structure learning in factored MDPs with continuous states and actions. Technical report, University of Massachusetts Amherst, Department of Computer Science.
Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.
Whiteson, S., & Stone, P. (2006). Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 7, 877–917.
Whiteson, S., Stone, P., & Stanley, K. O. (2005). Automatic feature selection in neuroevolution. In Proceedings of GECCO (pp. 1225–1232).
Wright, R., Loscalzo, S., & Yu, L. (2011). Embedded incremental feature selection for reinforcement learning. In ICAART 2011 - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, Artificial Intelligence, Rome, Italy, January 28–30 (Vol. 1, pp. 263–268).
Xu, L., Yan, P., & Chang, T. (1988). Best first strategy for feature selection. In Proceedings of the Ninth International Conference on Pattern Recognition (pp. 706–708).
Acknowledgments
This work was performed under 13-RI-CRADA-13, and was supported in part through computational resources provided by the U.S. DoD HPCMP AFRL/RI Affiliated Resource Center. The authors would like to thank the anonymous reviewers for their helpful comments and suggestions, and Kevin Acunto for his work porting the RARS environment to Java.
Cite this article
Loscalzo, S., Wright, R. & Yu, L. Predictive feature selection for genetic policy search. Auton Agent Multi-Agent Syst 29, 754–786 (2015). https://doi.org/10.1007/s10458-014-9268-y