DOI: 10.1145/3580305.3599393
Research Article · Free Access

Internal Logical Induction for Pixel-Symbolic Reinforcement Learning

Published: 04 August 2023

ABSTRACT

Reinforcement Learning (RL) has advanced rapidly in recent years. Most widely studied RL algorithms focus on a single input form, such as pixel-based images or symbolic vectors. These two forms have different characteristics and often appear together, yet few RL algorithms address problems with mixed input types. In scenarios where both pixel and symbolic inputs are available, the symbolic input usually offers abstract features with specific semantics that help the agent focus on what matters, while the pixel input provides more comprehensive information that enables well-informed decisions. Tailoring the processing to the properties of each input type can therefore make such problems easier to solve. To this end, we propose an Internal Logical Induction (ILI) framework that integrates deep RL and rule learning into one system. ILI uses a deep RL algorithm to process the pixel input and a rule learning algorithm to induce propositional logic knowledge from the symbolic input. To combine these two mechanisms efficiently, we further adopt a reward shaping technique that treats valuable knowledge as intrinsic rewards for the RL procedure. Experimental results demonstrate that ILI outperforms baseline approaches on RL problems with pixel-symbolic input, and that its induced knowledge transfers well when the semantics of the pixel input change.
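
To make the reward-shaping idea in the abstract concrete, the sketch below adds an intrinsic bonus to the environment reward whenever induced propositional rules fire on the symbolic part of the observation. This is a minimal illustration, not the paper's actual implementation: the observation keys ("pixels", "symbols"), the rule encoding as Python predicates, and the bonus scale are all assumptions made for the example.

# Minimal sketch of rule-based reward shaping, assuming a mixed
# observation with "pixels" and "symbols" keys. Rule encoding and
# bonus scale are illustrative, not the paper's implementation.
from typing import Callable, Dict, List

import numpy as np


class RuleShapedReward:
    """Adds an intrinsic bonus when induced propositional rules fire."""

    def __init__(self, rules: List[Callable[[np.ndarray], bool]], bonus: float = 0.1):
        self.rules = rules   # each rule maps the symbolic vector to True/False
        self.bonus = bonus   # illustrative intrinsic-reward scale

    def shape(self, obs: Dict[str, np.ndarray], extrinsic_reward: float) -> float:
        symbols = obs["symbols"]                      # symbolic part of the input
        fired = sum(rule(symbols) for rule in self.rules)
        return extrinsic_reward + self.bonus * fired  # shaped reward for the RL update


# Example: one hand-written rule standing in for an induced one,
# e.g. "feature 0 exceeds a threshold and feature 2 is active".
rules = [lambda s: s[0] > 0.5 and s[2] == 1]
shaper = RuleShapedReward(rules)
obs = {"pixels": np.zeros((84, 84, 3)), "symbols": np.array([0.7, 0.0, 1.0])}
print(shaper.shape(obs, extrinsic_reward=1.0))  # -> 1.1

In this reading, the deep RL agent consumes the pixel input as usual, while the rule learner periodically re-induces the rule set from symbolic experience; only the shaped reward couples the two mechanisms.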


Supplemental Material

rtfp1414-2min-promo.mp4 (MP4, 77.9 MB)


• Published in

  KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
  August 2023, 5996 pages
  ISBN: 9798400701030
  DOI: 10.1145/3580305
  Copyright © 2023 ACM


  Publisher: Association for Computing Machinery, New York, NY, United States




  Acceptance Rates

  Overall acceptance rate: 1,133 of 8,635 submissions, 13%

