Abstract:
Robot behaviour models in socially assistive robotics are typically trained on high-level features, such as a user's engagement, so inaccuracies in the feature extraction can significantly affect a robot's subsequent performance. In this paper, we study whether a behaviour model can be meaningfully represented using an end-to-end approach, in which multimodal input, concretely visual data and activity information, is processed directly by a neural network. We analyse the individual building blocks of such a model, with the aim of identifying a suitable architecture that can meaningfully combine the different modalities to guide a robot's behaviour. The analysis is conducted in the context of a sequence learning game: we compare different vision-only models, which are then combined with an activity-processing network into a joint multimodal model. The results of our evaluation on a dedicated dataset from the sequence learning game demonstrate that a multimodal end-to-end behaviour model has potential for assistive robotics (we report an F1 score of around 0.88 across different dataset-based test scenarios), but its real-life transferability strongly depends on whether the data is diverse enough to capture meaningful variations in real-world scenarios, such as users being at different distances from the robot.
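To make the described architecture concrete, the sketch below shows one plausible way a joint multimodal behaviour model of this kind could be structured: a vision encoder and an activity-processing branch whose features are fused before predicting a robot action. This is not the authors' implementation; the class name, layer sizes, input dimensions, and number of actions (JointBehaviourModel, activity_dim, num_actions) are all illustrative assumptions.

import torch
import torch.nn as nn

class JointBehaviourModel(nn.Module):
    """Hypothetical fusion of a vision branch and an activity branch."""
    def __init__(self, activity_dim: int = 16, num_actions: int = 4):
        super().__init__()
        # Vision branch: a small CNN standing in for the vision-only
        # models that the paper compares.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 32)
        )
        # Activity branch: processes game/activity state features.
        self.activity = nn.Sequential(nn.Linear(activity_dim, 32), nn.ReLU())
        # Fusion head: concatenates both modalities and maps them to
        # logits over possible robot behaviours.
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, image: torch.Tensor, activity: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.vision(image), self.activity(activity)], dim=1)
        return self.head(fused)

# Usage example with random inputs: a batch of two 96x96 RGB frames
# plus two 16-dimensional activity feature vectors.
model = JointBehaviourModel()
logits = model(torch.randn(2, 3, 96, 96), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 4])

Late concatenation of per-modality embeddings is only one fusion strategy; the paper's point is precisely to compare such building blocks, so other vision backbones or fusion schemes could be substituted for the ones assumed here.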
Published in: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
Date of Conference: 26-30 August 2024
Date Added to IEEE Xplore: 30 October 2024