Abstract
The Value Iteration Network (VIN) is a neural network widely used in path-finding reinforcement learning problems. Its planning module lets the network capture the structure of a problem, giving it impressive generalization ability. However, reinforcement learning (RL) with VIN cannot guarantee efficient training because of the network's depth and its max-pooling operation. Greater network depth makes it harder for the network to learn from samples with gradient-descent algorithms, and the max-pooling operation may make negative rewards harder to learn due to overestimation. This paper proposes a new neural network, the Value Iteration Residual Network (VIRN) with Self-Attention, which uses a unique spatial self-attention module and aggressive iteration to address these problems. A preliminary evaluation on Mr. Pac-Man demonstrated that VIRN effectively improves training efficiency compared with VIN.
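For context, the abstract refers to the value-iteration module of the standard VIN, in which a convolution over the stacked reward and value maps produces per-action Q-maps and a channel-wise max yields the next value map. The following is a minimal sketch of that module, assuming a PyTorch-style implementation on grid inputs; class and parameter names are illustrative and not taken from the paper, which modifies this scheme with residual connections and spatial self-attention.

```python
# Minimal sketch of a VIN-style value-iteration module (illustrative only).
# Assumes a grid-world reward map of shape [B, 1, H, W]; names are hypothetical.
import torch
import torch.nn as nn


class ValueIterationModule(nn.Module):
    def __init__(self, n_actions: int = 8, n_iters: int = 20):
        super().__init__()
        self.n_iters = n_iters
        # One convolution maps the stacked [reward, value] maps to Q-maps,
        # one output channel per action (the transition convolution in VIN).
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward: torch.Tensor) -> torch.Tensor:
        value = torch.zeros_like(reward)          # V_0 = 0
        for _ in range(self.n_iters):             # K value-iteration steps
            q = self.q_conv(torch.cat([reward, value], dim=1))
            # Channel-wise max over actions: the max operation the abstract
            # identifies as a source of overestimation for negative rewards.
            value, _ = torch.max(q, dim=1, keepdim=True)
        return value


# Usage: compute a value map for a batch of 8x8 reward grids.
vi = ValueIterationModule(n_actions=8, n_iters=20)
v = vi(torch.randn(4, 1, 8, 8))                   # -> [4, 1, 8, 8]
```

Because the value map is refined over many such iterations, the effective network depth grows with the number of iterations, which is the training-efficiency issue the abstract highlights.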
Acknowledgement
This work was partially supported by JSPS KAKENHI, JSPS Research Fellowships for Young Scientists.