Abstract
Learning to process visual input for deep reinforcement learning is challenging: training a neural network on nothing but a sparse and delayed reward signal is an inefficient way to learn a visual representation. In this work, Deep Q-Networks are augmented with several unsupervised machine learning methods that provide additional training signals for the feature extraction stage, helping it find a well-suited representation of the input data. We investigate, in an end-to-end architecture, the influence of convolutional filters pretrained on a supervised classification task, of a Convolutional Autoencoder, and of Slow Feature Analysis. Experiments are performed on five ViZDoom environments. We find that the unsupervised methods boost Deep Q-Networks significantly, depending on the underlying task the agent has to fulfill: pretrained filters improve performance on object-detection tasks, while Convolutional Autoencoders help on navigation and orientation tasks. Combining the two approaches yields an agent that performs well on all tested environments.
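Of the unsupervised methods named above, Slow Feature Analysis (SFA) extracts the most slowly varying directions of a signal, on the premise that slowly changing features (e.g. an agent's position and orientation) are more behaviorally relevant than fast pixel-level fluctuations. The paper's full pipeline is not reproduced here; as a minimal illustration of the principle, linear SFA can be sketched in numpy as: center and whiten the signal, then take the eigenvectors of the covariance of its temporal derivative with the smallest eigenvalues.

```python
import numpy as np

def linear_sfa(x, n_components=2):
    """Linear Slow Feature Analysis (Wiskott & Sejnowski, 2002).

    x: array of shape (T, d), a time series of d-dimensional samples.
    Returns the n_components slowest output signals, shape (T, n_components).
    """
    x = x - x.mean(axis=0)                    # center the data
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    W = eigvec / np.sqrt(eigval)              # whitening matrix: cov(x @ W) = I
    z = x @ W                                 # whitened signal
    dz = np.diff(z, axis=0)                   # temporal derivative (finite differences)
    _, dvec = np.linalg.eigh(np.cov(dz, rowvar=False))
    return z @ dvec[:, :n_components]         # smallest eigenvalues = slowest features

# Demo: recover a slow and a fast source from a linear mixture.
t = np.linspace(0, 2 * np.pi, 1000)
sources = np.stack([np.sin(t), np.sin(20 * t)], axis=1)   # slow and fast signals
mixed = sources @ np.array([[1.0, 0.5], [0.5, 1.0]])      # linear mixing
y = linear_sfa(mixed, n_components=2)

slowness = np.mean(np.diff(y, axis=0) ** 2, axis=0)       # mean squared derivative
print(slowness[0] < slowness[1])                          # first output is slowest
```

In the deep RL setting described in the abstract, the same slowness objective is applied not to a linear mixture but to the convolutional feature extractor's output over consecutive frames; the linear version above only illustrates the optimization principle.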
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Hakenes, S., Glasmachers, T. (2019). Boosting Reinforcement Learning with Unsupervised Feature Extraction. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science(), vol 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30486-7
Online ISBN: 978-3-030-30487-4