University of Amsterdam (UvA)

UvA-DARE (Digital Academic Repository)

Author
W. Shang
D. van der Wal
H. van Hoof
M. Welling
Year
2020
Host editors
U. Brefeld
E. Fromont
A. Hotho
A. Knobbe
M. Maathuis
C. Robardet
Title
Stochastic Activation Actor Critic Methods
Event
European Conference on Machine Learning and Knowledge Discovery in Databases
Book/source title
Machine Learning and Knowledge Discovery in Databases
Book/source subtitle
European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019 : proceedings
Pages (from-to)
103-117
Publisher
Springer
Volume (Publisher)
III
ISBN
9783030461324
ISBN (electronic)
9783030461331
Series
Lecture Notes in Computer Science, Lecture Notes in Artificial Intelligence, 0302-9743, 11908
Document type
Conference contribution
Faculty
Faculty of Science (FNWI)
Institute
Informatics Institute (IVI)
Abstract
Stochastic elements in reinforcement learning (RL) have shown promise to improve exploration and handling of uncertainty, such as the utilization of stochastic weights in NoisyNets and stochastic policies in the maximum entropy RL frameworks. Yet effective and general approaches to include such elements in actor-critic models are still lacking. Inspired by the aforementioned techniques, we propose an effective way to inject randomness into actor-critic models to improve general exploratory behavior and reflect environment uncertainty. Specifically, randomness is added at the level of intermediate activations that feed into both policy and value functions to achieve better correlated and more complex perturbations. The proposed framework also features flexibility and simplicity, which allows straightforward adaptation to a variety of tasks. We test several actor-critic models enhanced with stochastic activations and demonstrate their effectiveness in a wide range of Atari 2600 games, a continuous control problem and a car racing task. Lastly, in a qualitative analysis, we present evidence of the proposed model adapting the noise in the policy and value functions to reflect uncertainty and ambiguity in the environment.
Language
English
Persistent Identifier
https://hdl.handle.net/11245.1/4019ed8f-9157-41d6-a8d3-9d2cca43ed04
Downloads
  • 483 (Submitted manuscript)
  • Shang2020_Chapter_StochasticActivationActorCriti (Final published version)

Disclaimer/Complaints regulations

If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.
