DOI: 10.1145/3512290.3528761

Dynamics-aware novelty search with behavior repulsion

Published: 08 July 2022

Abstract

Searching for solutions to tasks with sparse or deceptive rewards is a fundamental problem in Evolutionary Algorithms (EA) and Reinforcement Learning (RL). Existing RL methods enhance exploration by encouraging agents to visit novel states. However, seeking a single locally optimal solution can be insufficient for tasks with deceptive local optima. Novelty Search (NS) and Quality-Diversity (QD) have shown promising results for finding diverse solutions with different behavioral characteristics. However, manually defining task-specific behavior descriptors limits these methods to low-dimensional tasks. This paper presents Dynamics-aware Novelty Search with Behavior Repulsion (DANSBR), a hybrid algorithm that evolves high-performing solutions by introducing a generalized novelty measurement and a bidirectional gradient-based mutation operator built on the Quality-Diversity paradigm. The novelty of a single solution is defined as the prediction error of an approximate dynamics model in a task-agnostic behavior space. The mutation operator drives a solution either to behave differently or to obtain better performance in a sample-efficient manner. Owing to better exploration, our approach outperforms several baselines on high-dimensional continuous control tasks with sparse rewards. Empirical results also demonstrate that DANSBR improves performance on a task with deceptive rewards.
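To make the abstract's novelty measure concrete, the following is a minimal sketch (not the authors' reference implementation): novelty is computed as the prediction error of a learned forward dynamics model over the transitions produced by a candidate solution. The linear model class, the ridge-regularised fit, and the mean-squared-error aggregation over a trajectory are assumptions made here for illustration only.

    import numpy as np

    class ForwardDynamicsModel:
        # Hypothetical linear forward model: s_{t+1} ~= W @ [s_t; a_t],
        # fitted by ridge-regularised least squares. The paper's actual model
        # architecture is not specified in the abstract.
        def __init__(self, state_dim, action_dim):
            self.W = np.zeros((state_dim, state_dim + action_dim))

        def fit(self, states, actions, next_states):
            # states: (N, s), actions: (N, a), next_states: (N, s)
            X = np.concatenate([states, actions], axis=1)
            reg = 1e-3 * np.eye(X.shape[1])
            self.W = np.linalg.solve(X.T @ X + reg, X.T @ next_states).T

        def predict(self, states, actions):
            X = np.concatenate([states, actions], axis=1)
            return X @ self.W.T

    def trajectory_novelty(model, states, actions, next_states):
        # Novelty of one candidate solution: mean squared prediction error of
        # the dynamics model over the transitions its policy generated.
        # Transitions the model predicts poorly correspond to behavior not yet
        # explained by past data, so they score as more novel.
        errors = np.sum((model.predict(states, actions) - next_states) ** 2, axis=1)
        return float(np.mean(errors))

In a full QD loop one would refit (or incrementally update) the model on all transitions gathered so far, so that behaviors already covered by the archive become predictable and lose novelty, while the bidirectional mutation operator described above would alternate between steps that increase this novelty score and steps that improve task return.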


Cited By

  • (2024) Leveraging More of Biology in Evolutionary Reinforcement Learning. Applications of Evolutionary Computation, pp. 91-114. DOI: 10.1007/978-3-031-56855-8_6. Online publication date: 3 March 2024.



    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN: 9781450392372
    DOI: 10.1145/3512290
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 July 2022


    Author Tags

    1. evolutionary algorithm
    2. exploration
    3. quality diversity
    4. reinforcement learning

    Qualifiers

    • Research-article

    Conference

    GECCO '22

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

