
Neurocomputing

Volume 107, 1 May 2013, Pages 11-22

ART-based fusion of multi-modal perception for robots

https://doi.org/10.1016/j.neucom.2012.08.035

Abstract

Robotic application scenarios in uncontrolled environments pose high demands on mobile robots. This is especially true if human–robot or robot–robot interaction is involved, since potential interaction partners need to be identified. To tackle such challenges, robots make use of different sensory systems. In many cases, these robots have to deal with erroneous data from different sensory systems, which are often processed separately. A possible strategy to improve identification results is to combine the processing results of complementary sensors. Their relation is often hard-coded and difficult to learn incrementally if new kinds of objects or events occur. In this paper, we present a new fusion strategy, the Simplified Fusion ARTMAP (SiFuAM), which is very flexible and can therefore be easily adapted to new domains or sensor configurations. As our approach is based on the Adaptive Resonance Theory (ART), it is inherently capable of incremental on-line learning. We show its applicability in different robotic scenarios and platforms and give an overview of its performance.

Introduction

Artificial systems operating in natural environments are faced with different challenges: for example, a speech recognition system is affected by noise, or a mobile robot could be hampered by an obstacle on its path. The quality of information that a robotic system can gather about its environment is crucial for its success in solving such tasks.

In the examples mentioned above, the integration of data originating from different sensors can improve the quality of the overall classification performance of the system. For example, if a mobile robot is able to combine visual and laser scan data to distinguish between a pillar and a human blocking its way, it could adapt its behavior strategy accordingly. It could, for example, ask the human to move out of the way, while the same strategy would fail in case of the pillar.

Not only is the identification of persons as possible interaction partners interesting for the robot, but so is the recognition of other robots or specific objects. An example could be a robot that operates in an extraterrestrial setting such as a lunar environment. Particular limitations of extraterrestrial applications are the communication latency, the small communication time window per day, and the limited data bandwidth, all of which constrain the applicability of remote-controlled systems. This increases the demand for robots that are able to perform tasks autonomously. To solve a task autonomously, the robot has to decide which behavior is best suited to approach the task. This requires the robot to have information about its environment, such as the availability of tools, further equipment, or other robots; hence, object identification is a prerequisite here.

Robot systems that can be applied in such terrestrial or extraterrestrial scenarios usually possess multiple sensor modalities such as a camera, a laser scanner, and others (e.g., see [1]). If the data originating from these sensors can be combined in an appropriate way, the reaction of the robot can be more reliable due to the higher quality of information for the classification. Furthermore, the environment of the robot is not always static, for instance, new obstacles, persons, or other robots might appear. In this case, the robot should be able to deal with the new situation and adapt its behavior, that is, it should learn to identify these new objects.

In this paper, we propose and discuss an algorithm for the integration of data originating from different sensors to solve classification tasks like those already mentioned. In particular, we discuss a robotic system that has to classify objects in its environment. The developed system is based on the so-called Adaptive Resonance Theory (ART) (see, e.g., [2]), and hence it is capable of incremental and fast on-line learning. We tested the approach on data originating from a laser range finder and the visual data from a camera, on two different robotic platforms and with different preprocessing features.
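
To make the setting more concrete, the following sketch shows how per-sensor feature vectors could be normalized before they are presented to an ART-based fusion network. The feature names and value ranges are illustrative assumptions only; the actual features are described in Section 5.

    import numpy as np

    def normalize(features, lower, upper):
        # Scale raw sensor features into [0, 1]; ART-based networks expect
        # bounded inputs. The value ranges used here are assumptions.
        features = np.asarray(features, dtype=float)
        return np.clip((features - lower) / (upper - lower), 0.0, 1.0)

    # Hypothetical raw measurements from the two modalities.
    laser_raw = np.array([1.8, 0.42, 12.0])    # e.g. segment distance, width, point count
    camera_raw = np.array([0.31, 0.55, 0.12])  # e.g. mean hue, saturation, value

    laser = normalize(laser_raw, lower=np.array([0.0, 0.0, 0.0]),
                      upper=np.array([10.0, 2.0, 100.0]))
    camera = normalize(camera_raw, lower=0.0, upper=1.0)

    # Each modality keeps its own channel; the fusion network receives both.
    sample = {"laser": laser, "camera": camera}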

The paper is structured as follows: in Section 2, we discuss related work on this topic. The following Section 3 describes the theoretical background. Afterwards, we present our approach in Section 4. Section 5 presents the evaluation of our approach based on data originating from two different robotic systems. Finally, we summarize the most important results in Section 6 and give an outlook on possible future work.

Section snippets

Related work

Established information fusion architectures applied in robotics [3], [4] resort to predefined rules to find corresponding object representations in multiple sensor modalities. On the one hand, these approaches have the advantage that the results from different sensors (e.g., a laser scanner or a face detector [3]) or different sub-architectures (e.g., speech recognition system and visual tracking system [4]) can be used to integrate information for higher-level processing systems like

Theoretical background

ART networks are a family of competitive unsupervised neural networks sharing important properties such as the ability for fast incremental on-line learning. They have a common structure comprising a minimum of two layers: the comparison layer F1 and the recognition layer F2. Some members of this family such as Fuzzy ART possess an additional input layer F0. During learning, templates representing presented input vectors are formed. These templates are called categories. They are encoded by the
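
For reference, a minimal sketch of the standard Fuzzy ART dynamics referred to above (complement coding in F0, category choice in F2, vigilance test against F1, and fast learning) is given below. The parameter values and function names are our own illustrative choices, not those of the paper.

    import numpy as np

    def complement_code(a):
        # F0 preprocessing used by Fuzzy ART: a in [0, 1]^d becomes [a, 1 - a].
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    def fuzzy_art_step(I, weights, rho=0.75, alpha=0.001, beta=1.0):
        # One input presentation: choose the best-matching F2 category,
        # test the vigilance criterion, then update or create a template.
        if weights:
            # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), ^ = element-wise min.
            T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
            for j in np.argsort(T)[::-1]:
                match = np.minimum(I, weights[j]).sum() / I.sum()
                if match >= rho:  # resonance: the template is close enough
                    weights[j] = beta * np.minimum(I, weights[j]) + (1.0 - beta) * weights[j]
                    return j, weights
        weights.append(I.copy())  # no category matched: commit a new one
        return len(weights) - 1, weights

    # Two similar inputs resonate with the same category (fast learning, beta = 1).
    w = []
    _, w = fuzzy_art_step(complement_code([0.2, 0.9]), w)
    j, w = fuzzy_art_step(complement_code([0.25, 0.85]), w)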

Simplified Fusion ARTMAP

Our approach is based on the Fusion ARTMAP architecture. As described in the previous section, Fusion ARTMAP learns a mapping from input data to categories representing a target b containing real-valued elements. In principle, Fusion ARTMAP could also be used for classification, for instance, by representing each class as one dimension in the target vector and using a high value only for the correct class label. However, this approach has some disadvantages. If, for example, a new class is
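
The snippet below only illustrates the baseline encoding mentioned above, not the SiFuAM itself: if each class is one dimension of the target vector b, the target layout is fixed, so a class that appears later cannot simply be appended during on-line operation.

    import numpy as np

    def one_hot_target(label, known_classes):
        # Encode a class label as the real-valued target vector b used by a
        # Fusion-ARTMAP-style mapping (one target dimension per class).
        b = np.zeros(len(known_classes))
        b[known_classes.index(label)] = 1.0
        return b

    known_classes = ["pillar", "human"]
    print(one_hot_target("human", known_classes))  # -> [0. 1.]

    # Drawback: a class that only appears later does not fit the fixed layout.
    try:
        one_hot_target("robot", known_classes)
    except ValueError:
        # Every previously learned target would need an additional dimension,
        # which conflicts with incremental on-line learning.
        pass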

Evaluation

Our approach was evaluated on three different data sets originating from two different platforms. The first two data sets were recorded on the same indoor robot platform and the third one on an outdoor system. In this section, we briefly describe the two robot platforms as well as the concrete setups in which the data sets were collected. Furthermore, a detailed description is given of the features which are calculated for the classification on the data in the different setups. Finally, for
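
As an illustration of the evaluation procedure, the following sketch shows how a per-data-set train/test split and the classification accuracy could be computed; the concrete splits, features, and training routine are described in the full text and are not reproduced here.

    import numpy as np

    def split(samples, labels, train_fraction=0.7, seed=0):
        # Random train/test split of one recorded data set.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(samples))
        cut = int(train_fraction * len(samples))
        train, test = idx[:cut], idx[cut:]
        return ([samples[i] for i in train], [labels[i] for i in train],
                [samples[i] for i in test], [labels[i] for i in test])

    def accuracy(predict, samples, labels):
        # Fraction of correctly classified test samples for a predict() function.
        return float(np.mean([predict(x) == y for x, y in zip(samples, labels)]))

    # For each of the three data sets: train the network on the training split
    # and report the accuracy on the held-out part (training routine not shown).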

Conclusion and future work

As shown in the previous section, the SiFuAM is able to learn a good classification on test data sets from different sensors with different features originating from two completely different robot platforms. The system was tested on three data sets of small, medium, and large size. The SiFuAM performs well on all test data sets and outperforms the SFAM on one of them. For the second and third data sets, the performance of the networks was similar, but on the first data set the

Acknowledgments

This work was partially funded by the German Research Foundation (DFG) and the Excellence Cluster 277 "Cognitive Interaction Technology". It was also supported by the Federal Ministry of Economics and Technology (BMWi) on the basis of a decision by the German Bundestag, Grant nos. 50RA1113 and 50RA1114.

References (30)

  • X. Jin, S. Gupta, A. Ray, T. Damarla, Multimodal sensor fusion for personnel detection, in: 2011 Proceedings of the...
  • N. Nguyen, N. Nasrabadi, T. Tran, Robust multi-sensor classification via joint sparse representation, in: 2011...
  • R. Damarla, D. Ufford, Personnel detection using ground sensors, in: Proceedings of SPIE, vol. 6562, SPIE, 2007, pp....
  • H. Xing et al., Ground target detection, classification and sensor fusion in distributed fiber seismic sensor network, Adv. Sensor Syst. Appl. III (2007)
  • I.T. Podolak, K. Bartocha, A hierarchical classifier with growing neural gas clustering, in: Proceedings of the Ninth...

Elmar Berghöfer received his Diplom degree in computer science from the University of Bielefeld, Germany, in 2011. His diploma thesis concerned object recognition based on multi-modal sensor information. He joined the German Research Center for Artificial Intelligence (DFKI), Robotics Innovation Center, in Bremen, Germany, in 2011. His research interests include cognitive robotics, object classification, sensor fusion, and on-line learning.

Denis Schulze studied computer science at the University of Bielefeld and received his Diplom in October 2008. The topic of his diploma thesis was "Bildvorhersage für einen mobilen Roboter mit Hilfe von Vorwärtsmodellen" (Image Prediction for a Mobile Robot Using Forward Models). It was supervised by Prof. Dr.-Ing. Ralf Möller and Dr.-Ing. Wolfram Schenck. He is now a member of the Applied Informatics Group, working on the CITEC project "Action Selection Based on Multi-Modal Anchoring."

Christian Rauch obtained his Diplom degree in communication engineering in 2008 and his MEng degree in information and communication technology in 2011, both at the University of Applied Sciences for Telecommunications in Leipzig, Germany. He joined the Robotics Research Group at the University of Bremen in 2011. His research interests cover robotics in general, and machine learning and signal processing in particular.

Marko Tscherepanow received his Diplom degree in computer science from Ilmenau University of Technology, Germany, in 2003. Afterwards, he joined the Applied Informatics Group at Bielefeld University, Germany. In December 2007, he finished his PhD thesis entitled "Image Analysis Methods for Location Proteomics". His current research interests include incremental on-line learning, ART neural networks, evolutionary optimisation, feature selection, and the application of such methods in cognitive robotics.

Tim Köhler studied computer science with a focus on robotics and neural networks at Bielefeld University, Germany. After receiving his Diplom in computer science, he worked as a research assistant at the Cognitive Psychology Group and at the Computer Engineering Group (both at Bielefeld University). In his PhD studies in biorobotics he focused on the transfer of (neuro-)biological concepts to the robotic domain. Since 2010 he has been working as a postdoc in the area of space and underwater robotics at the German Research Center for Artificial Intelligence (DFKI) in Bremen, Germany. His main research interests are biorobotics and cognitive science.

Sven Wachsmuth holds a lecturer position at Bielefeld University and is currently heading the Central Lab Facilities of the Center of Excellence Cognitive Interaction Technology (CITEC). He received the PhD degree in computer science from Bielefeld University in 2001. In 2003, he spent a sabbatical year at the Computer Science Department of the University of Toronto. His research interests are in human–robot interaction, especially high-level computer vision problems, as well as system integration and evaluation aspects. He was a member of the RoboCup@Home organization committee in 2010-2011 and is a member of the technical committee in 2012.
