Abstract
This paper explores ways to discover strategy from a state-action-state-reward log recorded during a reinforcement learning session. The term strategy here implies that we are interested not only in a single-step state-action pair but also in a fruitful sequence of state-actions. Traditional RL has proved that it can successfully learn a good sequence of actions; however, it is often observed that some of the learned action sequences could be more effective. For example, a five-step navigation toward the north can be achieved in thousands of ways if there are no other constraints, since an agent could follow numerous tactics to reach the same end result. Traditional RL approaches such as value learning or state-action value learning do not directly address this issue. In this preliminary experiment, sets of state-action pairs (i.e., one-step policies) are extracted from 10,446 records, grouped, and then joined to form a directed graph. This graph summarizes the policy learned by the agent. We argue that strategy can be extracted from the analysis of this graph network.
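To illustrate the idea of aggregating a log into a policy graph, the sketch below is a minimal, assumed implementation (the function names build_policy_graph and frequent_transitions and the toy grid-world log are illustrative only, not taken from the paper). It builds a weighted directed graph whose nodes are states and whose edges are observed actions; heavily weighted, high-reward paths in such a graph are the kind of structure from which candidate strategies could be read off.

```python
from collections import defaultdict

def build_policy_graph(log):
    """Aggregate (state, action, next_state, reward) tuples into a directed graph.

    graph[state][(action, next_state)] holds how often the transition occurred
    and the total reward collected along it.
    """
    graph = defaultdict(lambda: defaultdict(lambda: {"count": 0, "total_reward": 0.0}))
    for state, action, next_state, reward in log:
        edge = graph[state][(action, next_state)]
        edge["count"] += 1
        edge["total_reward"] += reward
    return graph

def frequent_transitions(graph, min_count=2):
    """Return transitions seen at least min_count times, sorted by average reward.

    Frequently traversed, high-reward edges are candidate fragments of a strategy.
    """
    results = []
    for state, edges in graph.items():
        for (action, next_state), stats in edges.items():
            if stats["count"] >= min_count:
                avg_reward = stats["total_reward"] / stats["count"]
                results.append((state, action, next_state, stats["count"], avg_reward))
    return sorted(results, key=lambda r: r[4], reverse=True)

# Toy usage with a hypothetical grid-world log: repeated northward moves.
log = [("s0", "N", "s1", 0.0), ("s1", "N", "s2", 1.0), ("s0", "N", "s1", 0.0)]
graph = build_policy_graph(log)
print(frequent_transitions(graph, min_count=2))
```

In this sketch, chaining the surviving edges end to end reproduces the directed graph described in the abstract; analysing its frequent paths is one plausible route to the multi-step strategies the paper targets.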