
Towards Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems

Short paper · Published: 9 October 2023 · DOI: 10.1145/3577190.3616121

ABSTRACT

Recent advances in deep learning and data-driven approaches have enabled autonomous systems to perceive objects and their environments at a subsymbolic level, so that these systems can now perform tasks such as object detection, sensor data fusion, and language understanding. However, there is increasing demand to enhance these systems further so that they attain a more conceptual, symbolic understanding of objects and of the reasoning underlying the learned tasks. Achieving this level of artificial intelligence requires considering both explicit teaching provided by humans (e.g., explaining how to act) and implicit teaching obtained by observing human behavior (e.g., through system sensors). It is therefore essential to combine symbolic and subsymbolic learning approaches so as to support both explicit and implicit interaction models; this integration gives the system multimodal input and output capabilities. In this Blue Sky paper, we argue for considering these input types, together with human-in-the-loop and incremental learning techniques, to advance the field of artificial intelligence and enable autonomous systems to emulate human learning. We propose several hypotheses and design guidelines aimed at achieving this objective.
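To make the proposed integration concrete, the following minimal sketch (our illustration, not code from the paper) shows one way a subsymbolic perception layer could be fused with weighted symbolic rules and adapted incrementally through explicit human-in-the-loop feedback. All names, thresholds, and the stubbed perception scores are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Subsymbolic layer: stands in for trained neural perception models
# (e.g., gaze or gesture networks). Here it simply echoes sensor values.
def perception_stub(sensor_frame: Dict[str, float]) -> Dict[str, float]:
    return {
        "gaze_on_object": sensor_frame.get("gaze", 0.0),
        "pointing_at_object": sensor_frame.get("gesture", 0.0),
    }

# Symbolic layer: human-readable rules over the perception output.
@dataclass
class SymbolicRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]
    weight: float = 1.0  # adapted incrementally from human feedback

@dataclass
class NeuroSymbolicSelector:
    rules: List[SymbolicRule] = field(default_factory=list)
    learning_rate: float = 0.1

    def decide(self, sensor_frame: Dict[str, float]) -> float:
        """Fuse subsymbolic scores through weighted symbolic rules."""
        scores = perception_stub(sensor_frame)
        fired = [r for r in self.rules if r.condition(scores)]
        total = sum(abs(r.weight) for r in self.rules) or 1.0
        return sum(r.weight for r in fired) / total

    def give_feedback(self, sensor_frame: Dict[str, float], correct: bool) -> None:
        """Explicit teaching: nudge the weights of the rules that fired."""
        scores = perception_stub(sensor_frame)
        delta = self.learning_rate if correct else -self.learning_rate
        for rule in self.rules:
            if rule.condition(scores):
                rule.weight = max(0.0, rule.weight + delta)

# Usage: two hand-written rules; the user confirms or rejects a selection.
selector = NeuroSymbolicSelector(rules=[
    SymbolicRule("gaze", lambda s: s["gaze_on_object"] > 0.7),
    SymbolicRule("pointing", lambda s: s["pointing_at_object"] > 0.5),
])
frame = {"gaze": 0.9, "gesture": 0.6}
print(selector.decide(frame))                # fused confidence in [0, 1]
selector.give_feedback(frame, correct=True)  # incremental, per-user adaptation
```

In a deployed system, perception_stub would be replaced by trained multimodal models, and the explicit confirmations shown here could be complemented by implicit feedback inferred from observed user behavior, matching the explicit and implicit teaching channels argued for above.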


Published in: ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction. October 2023. 858 pages. ISBN: 9798400700552. DOI: 10.1145/3577190. Copyright © 2023 ACM.

Publisher: Association for Computing Machinery, New York, NY, United States.

Overall acceptance rate: 453 of 1,080 submissions (42%).
