DOI: 10.1145/3512290.3528761

Dynamics-aware novelty search with behavior repulsion

Published: 08 July 2022

Abstract

Searching for solutions to tasks with sparse or deceptive rewards is a fundamental problem in Evolutionary Algorithms (EA) and Reinforcement Learning (RL). Existing RL methods enhance exploration by encouraging agents to visit novel states. However, seeking a single locally optimal solution can be insufficient for tasks with deceptive local optima. Novelty Search (NS) and Quality-Diversity (QD) have shown promising results for finding diverse solutions with different behavioral characteristics. However, manually defining task-specific behavior descriptors limits these methods to low-dimensional tasks. This paper presents Dynamics-aware Novelty Search with Behavior Repulsion (DANSBR), a hybrid algorithm that evolves high-performing solutions by introducing a generalized novelty measurement and a bidirectional gradient-based mutation operator built on the Quality-Diversity paradigm. The novelty of a single solution is defined as the prediction error of an approximate dynamics model in a task-agnostic behavior space. The mutation operator drives a solution either to behave differently or to obtain better performance in a sample-efficient manner. Owing to better exploration, our approach outperforms several baselines on high-dimensional continuous control tasks with sparse rewards. Empirical results also demonstrate that DANSBR improves performance on a task with deceptive rewards.
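To make the abstract's novelty measure concrete, the following is a minimal sketch (not the authors' reference implementation): novelty is computed as the prediction error of a learned forward dynamics model over the transitions produced by a candidate solution. The linear model class, the ridge-regularised fit, and the mean-squared-error aggregation over a trajectory are assumptions made here for illustration only.

    import numpy as np

    class ForwardDynamicsModel:
        # Hypothetical linear forward model: s_{t+1} ~= W @ [s_t; a_t],
        # fitted by ridge-regularised least squares. The paper's actual model
        # architecture is not specified in the abstract.
        def __init__(self, state_dim, action_dim):
            self.W = np.zeros((state_dim, state_dim + action_dim))

        def fit(self, states, actions, next_states):
            # states: (N, s), actions: (N, a), next_states: (N, s)
            X = np.concatenate([states, actions], axis=1)
            reg = 1e-3 * np.eye(X.shape[1])
            self.W = np.linalg.solve(X.T @ X + reg, X.T @ next_states).T

        def predict(self, states, actions):
            X = np.concatenate([states, actions], axis=1)
            return X @ self.W.T

    def trajectory_novelty(model, states, actions, next_states):
        # Novelty of one candidate solution: mean squared prediction error of
        # the dynamics model over the transitions its policy generated.
        # Transitions the model predicts poorly correspond to behavior not yet
        # explained by past data, so they score as more novel.
        errors = np.sum((model.predict(states, actions) - next_states) ** 2, axis=1)
        return float(np.mean(errors))

In a full QD loop one would refit (or incrementally update) the model on all transitions gathered so far, so that behaviors already covered by the archive become predictable and lose novelty, while the bidirectional mutation operator described above would alternate between steps that increase this novelty score and steps that improve task return.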


Cited By

  • (2024) Leveraging More of Biology in Evolutionary Reinforcement Learning. Applications of Evolutionary Computation, pp. 91-114. DOI: 10.1007/978-3-031-56855-8_6. Online publication date: 3 March 2024.



    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN: 9781450392372
    DOI: 10.1145/3512290
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 July 2022


    Author Tags

    1. evolutionary algorithm
    2. exploration
    3. quality diversity
    4. reinforcement learning

    Qualifiers

    • Research-article

    Conference

    GECCO '22

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

