
Discovering Strategy in Navigation Problem

  • Conference paper
Data Mining and Big Data (DMBD 2019)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1071))


Abstract

This paper explores ways to discover strategy from a state-action-state-reward log recorded during a reinforcement learning session. The term strategy here implies that we are interested not only in one-step state-action pairs but also in fruitful sequences of state-actions. Traditional RL has proved that it can successfully learn a good sequence of actions. However, it is often observed that some of the learned action sequences could be more effective. For example, an effective five-step navigation in the north direction can be achieved in thousands of ways if there are no other constraints, since an agent could employ numerous tactics to reach the same end result. Traditional RL, such as value learning or state-action value learning, does not directly address this issue. In this preliminary experiment, sets of state-action pairs (i.e., one-step policies) are extracted from 10,446 records, grouped together, and then joined to form a directed graph. This graph summarizes the policy learned by the agent. We argue that strategy could be extracted from the analysis of this graph network.
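The graph construction described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field layout of the log records, the grid states, and the edge attributes are assumptions for the example.

```python
from collections import defaultdict

def build_policy_graph(log):
    """Aggregate (state, action, next_state, reward) records into a
    directed graph: nodes are states; each edge records the action
    taken, how often it was taken, and its mean observed reward."""
    # Group identical transitions: (state, next_state, action) -> [count, total_reward]
    edges = {}
    for state, action, next_state, reward in log:
        key = (state, next_state, action)
        if key not in edges:
            edges[key] = [0, 0.0]
        edges[key][0] += 1
        edges[key][1] += reward

    # Join the grouped transitions into an adjacency-list directed graph.
    graph = defaultdict(list)
    for (s, s2, a), (count, total) in edges.items():
        graph[s].append({"to": s2, "action": a,
                         "count": count, "mean_reward": total / count})
    return dict(graph)

# Hypothetical grid-navigation log: states are (x, y) cells.
log = [
    ((0, 0), "N", (0, 1), -1.0),
    ((0, 0), "N", (0, 1), -1.0),
    ((0, 1), "N", (0, 2), 10.0),
]
g = build_policy_graph(log)
```

Walking high-mean-reward paths through such a graph is one way a multi-step "strategy" could be read off from the summarized one-step policies.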



Corresponding author

Correspondence to Nurulhidayati Haji Mohd Sani.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Haji Mohd Sani, N., Phon-Amnuaisuk, S., Au, T.W. (2019). Discovering Strategy in Navigation Problem. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2019. Communications in Computer and Information Science, vol 1071. Springer, Singapore. https://doi.org/10.1007/978-981-32-9563-6_24


  • DOI: https://doi.org/10.1007/978-981-32-9563-6_24


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-32-9562-9

  • Online ISBN: 978-981-32-9563-6

  • eBook Packages: Computer Science (R0)
