Abstract:
When learning to collect reward from an unknown environment, an agent faces a speed-accuracy trade-off in its decision-making. It will lose out if it keeps acting greedily, but it cannot maximize reward if it keeps exploring. Experience suggests that human beings act according to some kind of standard to cope with this trade-off. Hence, we focused on symmetric reasoning, one of the illogical cognitive properties peculiar to human beings, as a valid solution to the speed-accuracy trade-off. In this study, we simulated the N-armed bandit problem as a simple decision-making problem, using the Loosely Symmetric model (LS), a model of flexibly and loosely symmetric reasoning. In addition, based on a theoretical consideration of LS and the idea of changing the reference point, we developed LS with Variable Reference (LSVR) as a newly improved model and simulated it. As a result, we confirmed that, when there are many choices, LSVR collects overwhelmingly more reward than UCB1, an excellent decision-making model used in Go AI.
Date of Conference: 20-24 November 2012
Date Added to IEEE Xplore: 22 April 2013