Abstract:
This paper proposes an architecture in which each agent maintains a cooperative tendency table (CTT). During learning, agents need not communicate with each other; instead, each agent observes its partners' actions while taking its own. If an agent encounters a bad situation, such as bumping into an obstacle after taking an action, the agents receive a bad reward from the environment. Similarly, if an agent reaches a goal after taking an action, the agents obtain a good reward instead. Rewards are used to update the policy and to adjust the cooperative tendency values recorded in each agent's individual CTT. When an agent perceives a state, the corresponding cooperative tendency value and the Q-value are merged into a Shaped-Q value, and the action with the maximal Shaped-Q value in that state is selected. After the agents take actions and receive a reward, they update their own CTTs. In this way, agents can reach a consensus more quickly, which enhances learning efficiency and reduces the occurrence of stagnation. The simulation results demonstrate that the proposed method speeds up the learning process, alleviates the problem of large memory consumption to some degree, and enables the agents to complete the task together more efficiently.
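The abstract does not specify how the cooperative tendency value and the Q-value are merged, nor how the CTT is adjusted; the sketch below is only an illustrative interpretation, assuming an additive merge (Shaped-Q = Q-value + cooperative tendency value) and a simple reward-proportional CTT adjustment. The class name CTTAgent and the parameter beta are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict

class CTTAgent:
    """Sketch of an agent holding a Q-table and a cooperative tendency table (CTT).

    The merge rule (simple addition) and the CTT update rule below are
    assumptions for illustration; the abstract does not define them.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, beta=0.05, epsilon=0.1):
        self.actions = actions
        self.alpha = alpha        # Q-learning rate
        self.gamma = gamma        # discount factor
        self.beta = beta          # CTT adjustment rate (assumed)
        self.epsilon = epsilon    # exploration rate
        self.q = defaultdict(float)    # Q(s, a)
        self.ctt = defaultdict(float)  # cooperative tendency value for (s, a)

    def shaped_q(self, state, action):
        # Assumed merge: Shaped-Q = Q-value + cooperative tendency value.
        return self.q[(state, action)] + self.ctt[(state, action)]

    def select_action(self, state):
        # Choose the action with maximal Shaped-Q (epsilon-greedy for exploration).
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.shaped_q(state, a))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update of the policy from the received reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        # Assumed CTT adjustment: a good reward raises the cooperative
        # tendency of the taken action, a bad reward lowers it.
        self.ctt[(state, action)] += self.beta * reward
```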
Date of Conference: 05-08 October 2017
Date Added to IEEE Xplore: 30 November 2017