ABSTRACT
This research evaluates reward shaping with unconventional formats of human input. Conventionally, a human teacher is assumed to provide numeric rewards throughout task training. This conventional format has two limitations: first, the continuous demand for numeric rewards is onerous for the teacher; second, it extracts only a narrow portion of the teacher's useful knowledge. In this research, we test three unconventional formats of human input: two that increase the social appeal of reward shaping, and one that extracts knowledge from humans more efficiently. Preliminary results on simulated domains validate the usefulness of these formats in terms of user comfort and learning performance.
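To make the core mechanism concrete, the sketch below shows potential-based reward shaping in tabular Q-learning on a toy chain domain. All names and the potential function here are illustrative assumptions, not details from this paper: the human teacher's background knowledge is encoded once as a potential phi(s), and the shaped reward r + gamma*phi(s') - phi(s) injects that knowledge at every step without the teacher supplying per-step numeric rewards.

```python
import random

# Toy 1-D chain: states 0..N, goal at state N. Illustrative sketch only;
# the domain, hyperparameters, and phi are assumptions for this example.
N, GAMMA, ALPHA, EPS = 10, 0.95, 0.5, 0.1

def phi(s):
    # Hypothetical human-supplied potential: "closer to the goal is better".
    return s / N

def step(s, a):
    # Actions are -1 (left) and +1 (right); reward 1 only at the goal.
    s2 = max(0, min(N, s + a))
    r = 1.0 if s2 == N else 0.0
    return s2, r, s2 == N

def train(episodes=300, shaped=True, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N + 1) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(200):
            # Epsilon-greedy action selection.
            if rng.random() < EPS:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            if shaped:
                # Potential-based shaping term (Ng et al., 1999):
                # preserves the optimal policy, densifies the reward.
                r += GAMMA * phi(s2) - phi(s)
            target = r + (0.0 if done else GAMMA * max(Q[(s2, x)] for x in (-1, 1)))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q
```

Because the shaping term is a difference of potentials, the teacher's knowledge is captured in a single static function rather than a stream of per-step rewards, which is exactly the teacher burden the conventional format imposes.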
Index Terms
- Unconventional Formats of Background Knowledge from Human Teacher in Reward Shaping