
Neurocomputing

Volume 107, 1 May 2013, Pages 11-22

ART-based fusion of multi-modal perception for robots

https://doi.org/10.1016/j.neucom.2012.08.035

Abstract

Robotic application scenarios in uncontrolled environments pose high demands on mobile robots. This is especially true if human–robot or robot–robot interaction is involved, since potential interaction partners need to be identified. To tackle such challenges, robots make use of different sensory systems. In many cases, these robots have to deal with erroneous data from different sensory systems, which are often processed separately. A possible strategy to improve identification results is to combine the processing results of complementary sensors. Their relation is often hard-coded and difficult to learn incrementally if new kinds of objects or events occur. In this paper, we present a new fusion strategy, the Simplified Fusion ARTMAP (SiFuAM), which is very flexible and can therefore be easily adapted to new domains or sensor configurations. As our approach is based on the Adaptive Resonance Theory (ART), it is inherently capable of incremental on-line learning. We show its applicability in different robotic scenarios and platforms and give an overview of its performance.

Introduction

Artificial systems operating in natural environments are faced with different challenges: for example, a speech recognition system is affected by noise, or a mobile robot could be hampered by an obstacle on its path. The quality of information that a robotic system can gather about its environment is crucial for its success in solving such tasks.

In the examples mentioned above, the integration of data originating from different sensors can improve the quality of the overall classification performance of the system. For example, if a mobile robot is able to combine visual and laser scan data to distinguish between a pillar and a human blocking its way, it could adapt its behavior strategy accordingly. It could, for example, ask the human to move out of the way, while the same strategy would fail in case of the pillar.

Not only is the identification of persons as possible interaction partners interesting for the robot, but so is the recognition of other robots or specific objects. An example could be a robot that operates in an extraterrestrial setting such as a lunar environment. Particular limitations of extraterrestrial applications are the communication latency, the small communication time window per day, and the limited data bandwidth, all of which constrain the applicability of remote-controlled systems. This increases the demand for robots that are able to perform tasks autonomously. To solve a task autonomously, the robot has to decide which behavior is best suited to approach the task. This requires the robot to have information about its environment, such as the availability of tools, further equipment, or other robots; hence, object identification is a prerequisite here.

Robot systems that can be applied in such terrestrial or extraterrestrial scenarios usually possess multiple sensor modalities such as a camera, a laser scanner, and others (e.g., see [1]). If the data originating from these sensors can be combined in an appropriate way, the reaction of the robot can be more reliable due to the higher quality of information for the classification. Furthermore, the environment of the robot is not always static, for instance, new obstacles, persons, or other robots might appear. In this case, the robot should be able to deal with the new situation and adapt its behavior, that is, it should learn to identify these new objects.

In this paper, we propose and discuss an algorithm for the integration of data originating from different sensors to solve classification tasks like those already mentioned. In particular, we discuss a robotic system that has to classify objects in its environment. The developed system is based on the so-called Adaptive Resonance Theory (ART) (see, e.g., [2]), and hence it is capable of incremental and fast on-line learning. We tested the approach on data originating from a laser range finder and the visual data from a camera, on two different robotic platforms and with different preprocessing features.
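
To make the setting more concrete, the following sketch shows how per-sensor feature vectors could be normalized before they are presented to an ART-based fusion network. The feature names and value ranges are illustrative assumptions only; the actual features are described in Section 5.

    import numpy as np

    def normalize(features, lower, upper):
        # Scale raw sensor features into [0, 1]; ART-based networks expect
        # bounded inputs. The value ranges used here are assumptions.
        features = np.asarray(features, dtype=float)
        return np.clip((features - lower) / (upper - lower), 0.0, 1.0)

    # Hypothetical raw measurements from the two modalities.
    laser_raw = np.array([1.8, 0.42, 12.0])    # e.g. segment distance, width, point count
    camera_raw = np.array([0.31, 0.55, 0.12])  # e.g. mean hue, saturation, value

    laser = normalize(laser_raw, lower=np.array([0.0, 0.0, 0.0]),
                      upper=np.array([10.0, 2.0, 100.0]))
    camera = normalize(camera_raw, lower=0.0, upper=1.0)

    # Each modality keeps its own channel; the fusion network receives both.
    sample = {"laser": laser, "camera": camera}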

The paper is structured as follows: in Section 2, we discuss related work on this topic. The following Section 3 describes the theoretical background. Afterwards, we present our approach in Section 4. Section 5 presents the evaluation of our approach based on data originating from two different robotic systems. Finally, we summarize the most important results in Section 6 and give an outlook on possible future work.

Section snippets

Related work

Established information fusion architectures applied in robotics [3], [4] resort to predefined rules to find corresponding object representations in multiple sensor modalities. On the one hand, these approaches have the advantage that the results from different sensors (e.g., a laser scanner or a face detector [3]) or different sub-architectures (e.g., speech recognition system and visual tracking system [4]) can be used to integrate information for higher-level processing systems like

Theoretical background

ART networks are a family of competitive unsupervised neural networks sharing important properties such as the ability for fast incremental on-line learning. They have a common structure comprising a minimum of two layers: the comparison layer F1 and the recognition layer F2. Some members of this family such as Fuzzy ART possess an additional input layer F0. During learning, templates representing presented input vectors are formed. These templates are called categories. They are encoded by the
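
For reference, a minimal sketch of the standard Fuzzy ART dynamics referred to above (complement coding in F0, category choice in F2, vigilance test against F1, and fast learning) is given below. The parameter values and function names are our own illustrative choices, not those of the paper.

    import numpy as np

    def complement_code(a):
        # F0 preprocessing used by Fuzzy ART: a in [0, 1]^d becomes [a, 1 - a].
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    def fuzzy_art_step(I, weights, rho=0.75, alpha=0.001, beta=1.0):
        # One input presentation: choose the best-matching F2 category,
        # test the vigilance criterion, then update or create a template.
        if weights:
            # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), ^ = element-wise min.
            T = [np.minimum(I, w).sum() / (alpha + w.sum()) for w in weights]
            for j in np.argsort(T)[::-1]:
                match = np.minimum(I, weights[j]).sum() / I.sum()
                if match >= rho:  # resonance: the template is close enough
                    weights[j] = beta * np.minimum(I, weights[j]) + (1.0 - beta) * weights[j]
                    return j, weights
        weights.append(I.copy())  # no category matched: commit a new one
        return len(weights) - 1, weights

    # Two similar inputs resonate with the same category (fast learning, beta = 1).
    w = []
    _, w = fuzzy_art_step(complement_code([0.2, 0.9]), w)
    j, w = fuzzy_art_step(complement_code([0.25, 0.85]), w)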

Simplified Fusion ARTMAP

Our approach is based on the Fusion ARTMAP architecture. As described in the previous section, Fusion ARTMAP learns a mapping from input data to categories representing a target b containing real-valued elements. In principle, Fusion ARTMAP could also be used for classification, for instance, by representing each class as one dimension in the target vector and using a high value only for the correct class label. However, this approach has some disadvantages. If, for example, a new class is
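
The snippet below only illustrates the baseline encoding mentioned above, not the SiFuAM itself: if each class is one dimension of the target vector b, the target layout is fixed, so a class that appears later cannot simply be appended during on-line operation.

    import numpy as np

    def one_hot_target(label, known_classes):
        # Encode a class label as the real-valued target vector b used by a
        # Fusion-ARTMAP-style mapping (one target dimension per class).
        b = np.zeros(len(known_classes))
        b[known_classes.index(label)] = 1.0
        return b

    known_classes = ["pillar", "human"]
    print(one_hot_target("human", known_classes))  # -> [0. 1.]

    # Drawback: a class that only appears later does not fit the fixed layout.
    try:
        one_hot_target("robot", known_classes)
    except ValueError:
        # Every previously learned target would need an additional dimension,
        # which conflicts with incremental on-line learning.
        pass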

Evaluation

Our approach was evaluated on three different data sets originating from two different platforms. The first two data sets were recorded on the same indoor robot platform and the third one on an outdoor system. In this section, we briefly describe the two robot platforms as well as the concrete setups in which the data sets were collected. Furthermore, a detailed description is given of the features which are calculated for the classification on the data in the different setups. Finally, for
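
As an illustration of the evaluation procedure, the following sketch shows how a per-data-set train/test split and the classification accuracy could be computed; the concrete splits, features, and training routine are described in the full text and are not reproduced here.

    import numpy as np

    def split(samples, labels, train_fraction=0.7, seed=0):
        # Random train/test split of one recorded data set.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(samples))
        cut = int(train_fraction * len(samples))
        train, test = idx[:cut], idx[cut:]
        return ([samples[i] for i in train], [labels[i] for i in train],
                [samples[i] for i in test], [labels[i] for i in test])

    def accuracy(predict, samples, labels):
        # Fraction of correctly classified test samples for a predict() function.
        return float(np.mean([predict(x) == y for x, y in zip(samples, labels)]))

    # For each of the three data sets: train the network on the training split
    # and report the accuracy on the held-out part (training routine not shown).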

Conclusion and future work

As shown in the previous section, the SiFuAM is able to learn a good classification on test data sets from different sensors with different features originating from two completely different robot platforms. The system was tested on three data sets of small, medium, and large size. The SiFuAM performs well on all test data sets and outperforms the SFAM on one of them. For the second and third data sets, the performance of the networks was similar, but on the first data set the

Acknowledgments

This work was partially funded by the German Research Foundation (DFG) and the Excellence Cluster 277 "Cognitive Interaction Technology". It was also supported by the Federal Ministry of Economics and Technology (BMWi) on the basis of a decision by the German Bundestag, Grant nos. 50RA1113 and 50RA1114.

References (30)

  • X. Jin, S. Gupta, A. Ray, T. Damarla, Multimodal sensor fusion for personnel detection, in: 2011 Proceedings of the...
  • N. Nguyen, N. Nasrabadi, T. Tran, Robust multi-sensor classification via joint sparse representation, in: 2011...
  • R. Damarla, D. Ufford, Personnel detection using ground sensors, in: Proceedings of SPIE, vol. 6562, SPIE, 2007, pp....
  • H. Xing et al., Ground target detection, classification and sensor fusion in distributed fiber seismic sensor network, Adv. Sensor Syst. Appl. III (2007)
  • I.T. Podolak, K. Bartocha, A hierarchical classifier with growing neural gas clustering, in: Proceedings of the Ninth...

Elmar Berghöfer received his Diplom degree in computer science from the University of Bielefeld, Germany, in 2011. His diploma thesis concerned object recognition based on multi-modal sensor information. He joined the German Research Center for Artificial Intelligence (DFKI), Robotics Innovation Center, in Bremen, Germany, in 2011. His research interests include cognitive robotics, object classification, sensor fusion, and on-line learning.

Denis Schulze studied computer science at the University of Bielefeld and received his Diplom in October 2008. The topic of his diploma thesis was "Bildvorhersage für einen mobilen Roboter mit Hilfe von Vorwärtsmodellen" (Image Prediction for a Mobile Robot Using Forward Models). It was supervised by Prof. Dr.-Ing. Ralf Möller and Dr.-Ing. Wolfram Schenck. He is now a member of the Applied Informatics Group, working on the CITEC project "Action Selection Based on Multi-Modal Anchoring."

Christian Rauch obtained his Diplom degree in communication engineering in 2008 and his MEng degree in information and communication technology in 2011, both at the University of Applied Sciences for Telecommunications in Leipzig, Germany. He joined the Robotics Research Group at the University of Bremen in 2011. His research interests cover robotics in general, and machine learning and signal processing in particular.

Marko Tscherepanow received his Diplom degree in computer science from Ilmenau University of Technology, Germany, in 2003. Afterwards, he joined the Applied Informatics Group at Bielefeld University, Germany. In December 2007, he finished his PhD thesis entitled "Image Analysis Methods for Location Proteomics". His current research interests include incremental on-line learning, ART neural networks, evolutionary optimisation, feature selection, and the application of such methods in cognitive robotics.

Tim Köhler studied computer science with a focus on robotics and neural networks at Bielefeld University, Germany. After receiving his Diplom in computer science, he worked as a research assistant at the Cognitive Psychology Group and at the Computer Engineering Group (both at Bielefeld University). In his PhD studies in biorobotics he focused on the transfer of (neuro-)biological concepts to the robotic domain. Since 2010 he has been working as a postdoc in the area of space and underwater robotics at the German Research Center for Artificial Intelligence (DFKI) in Bremen, Germany. His main research interests are biorobotics and cognitive science.

Sven Wachsmuth holds a lecturer position at Bielefeld University and is currently heading the Central Lab Facilities of the Center of Excellence Cognitive Interaction Technology (CITEC). He received the PhD degree in computer science from Bielefeld University in 2001. In 2003, he spent a sabbatical year at the Computer Science Department of the University of Toronto. His research interests are in human–robot interaction, especially high-level computer vision problems, as well as system integration and evaluation aspects. He was a member of the RoboCup@Home organization committee in 2010-2011 and is a member of the technical committee in 2012.
