ABSTRACT
Reinforcement learning (RL) has advanced rapidly in recent years. Most widely studied RL algorithms, however, handle a single input form, such as pixel-based images or symbolic feature vectors. These two forms have different characteristics and often appear together in practice, yet few RL algorithms address problems with mixed input types. When both pixel and symbolic inputs are available, symbolic input typically offers abstract features with specific semantics, helping the agent focus on what matters, whereas pixel input provides more comprehensive information for well-informed decisions. Tailoring the processing to the properties of each input type can therefore make the problem easier to solve. To this end, we propose an Internal Logical Induction (ILI) framework that integrates deep RL and rule learning in one system. ILI uses a deep RL algorithm to process the pixel input and a rule learning algorithm to induce propositional logic knowledge from the symbolic input. To combine the two mechanisms efficiently, we further adopt a reward shaping technique that treats valuable knowledge as intrinsic rewards for the RL procedure. Experimental results demonstrate that ILI outperforms baseline approaches on RL problems with pixel-symbolic input, and that its induced knowledge transfers well when the semantics of the pixel input change.
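The reward-shaping idea in the abstract can be made concrete with a minimal sketch: induced propositional rules act as predicates over the symbolic observation, and a small intrinsic bonus is added to the environment reward whenever a rule fires. This is an illustrative assumption, not the paper's actual implementation; the function `shaped_reward`, the `bonus` coefficient, and the dictionary-valued symbolic state are all hypothetical names introduced here.

```python
def shaped_reward(env_reward, symbolic_state, rules, bonus=0.1):
    """Return the environment reward plus an intrinsic bonus.

    `rules` is a list of predicates over the symbolic observation. In an
    ILI-style system these would come from a propositional rule learner;
    here they are plain Python callables for illustration. The bonus is
    granted once per step if any induced rule is satisfied.
    """
    intrinsic = bonus if any(rule(symbolic_state) for rule in rules) else 0.0
    return env_reward + intrinsic


# Example usage with a hand-written rule standing in for a learned one:
rules = [lambda s: s["has_key"] and s["door_visible"]]
r = shaped_reward(1.0, {"has_key": True, "door_visible": True}, rules)
```

In a full training loop this shaped reward would replace the raw environment reward fed to the deep RL algorithm, so the agent is steered toward symbolic states its induced knowledge deems valuable.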
Internal Logical Induction for Pixel-Symbolic Reinforcement Learning