
Towards Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems

Short paper · Published: 9 October 2023 · DOI: 10.1145/3577190.3616121

ABSTRACT

Recent advances in deep learning and data-driven approaches have enabled autonomous systems to perceive objects and their environments at a subsymbolic level, so that these systems can now perform tasks such as object detection, sensor data fusion, and language understanding. However, there is increasing demand to enhance these systems further so that they attain a more conceptual, symbolic understanding of objects and of the reasoning underlying the learned tasks. Achieving this level of artificial intelligence requires considering both explicit teaching provided by humans (e.g., explaining how to act) and implicit teaching obtained by observing human behavior (e.g., through system sensors). It is therefore essential to combine symbolic and subsymbolic learning approaches so as to support both explicit and implicit interaction models; this integration gives the system multimodal input and output capabilities. In this Blue Sky paper, we argue for considering these input types, together with human-in-the-loop and incremental learning techniques, to advance the field of artificial intelligence and enable autonomous systems to emulate human learning. We propose several hypotheses and design guidelines aimed at achieving this objective.
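To make the proposed integration concrete, the following minimal sketch (our illustration, not code from the paper) shows one way a subsymbolic perception layer could be fused with weighted symbolic rules and adapted incrementally through explicit human-in-the-loop feedback. All names, thresholds, and the stubbed perception scores are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Subsymbolic layer: stands in for trained neural perception models
# (e.g., gaze or gesture networks). Here it simply echoes sensor values.
def perception_stub(sensor_frame: Dict[str, float]) -> Dict[str, float]:
    return {
        "gaze_on_object": sensor_frame.get("gaze", 0.0),
        "pointing_at_object": sensor_frame.get("gesture", 0.0),
    }

# Symbolic layer: human-readable rules over the perception output.
@dataclass
class SymbolicRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]
    weight: float = 1.0  # adapted incrementally from human feedback

@dataclass
class NeuroSymbolicSelector:
    rules: List[SymbolicRule] = field(default_factory=list)
    learning_rate: float = 0.1

    def decide(self, sensor_frame: Dict[str, float]) -> float:
        """Fuse subsymbolic scores through weighted symbolic rules."""
        scores = perception_stub(sensor_frame)
        fired = [r for r in self.rules if r.condition(scores)]
        total = sum(abs(r.weight) for r in self.rules) or 1.0
        return sum(r.weight for r in fired) / total

    def give_feedback(self, sensor_frame: Dict[str, float], correct: bool) -> None:
        """Explicit teaching: nudge the weights of the rules that fired."""
        scores = perception_stub(sensor_frame)
        delta = self.learning_rate if correct else -self.learning_rate
        for rule in self.rules:
            if rule.condition(scores):
                rule.weight = max(0.0, rule.weight + delta)

# Usage: two hand-written rules; the user confirms or rejects a selection.
selector = NeuroSymbolicSelector(rules=[
    SymbolicRule("gaze", lambda s: s["gaze_on_object"] > 0.7),
    SymbolicRule("pointing", lambda s: s["pointing_at_object"] > 0.5),
])
frame = {"gaze": 0.9, "gesture": 0.6}
print(selector.decide(frame))                # fused confidence in [0, 1]
selector.give_feedback(frame, correct=True)  # incremental, per-user adaptation
```

In a deployed system, perception_stub would be replaced by trained multimodal models, and the explicit confirmations shown here could be complemented by implicit feedback inferred from observed user behavior, matching the explicit and implicit teaching channels argued for above.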


Published in: ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction. October 2023. 858 pages. ISBN: 9798400700552. DOI: 10.1145/3577190. Copyright © 2023 ACM.

Publisher: Association for Computing Machinery, New York, NY, United States.

Overall acceptance rate: 453 of 1,080 submissions (42%).
