Elsevier

Robotics and Autonomous Systems

Volume 110, December 2018, Pages 12-32

Context-aware 3D object anchoring for mobile robots

https://doi.org/10.1016/j.robot.2018.08.016

Highlights

  • Anchoring system keeps track of objects for robot planning, learning and execution.

  • Builds online graph-based 3D world model of objects in the environment.

  • Can integrate arbitrary object recognition methods.

  • Exploits context between objects to improve object recognition results.

Abstract

A world model representing the elements in a robot’s environment needs to maintain a correspondence between the objects being observed and their internal representations, which is known as the anchoring problem. Anchoring is a key aspect of intelligent robot operation, since it enables high-level functions such as task planning and execution. This work presents an anchoring system that continually integrates new observations from a 3D object recognition algorithm into a probabilistic world model. Our system takes advantage of the contextual relations inherent to human-made spaces in order to improve the classification results of the baseline object recognition system. To achieve that, the system builds a graph-based world model containing the objects in the scene (both in the current and in previously perceived observations), which is exploited by a Probabilistic Graphical Model (PGM) in order to leverage contextual information during recognition. The world model also enables the system to exploit information about objects beyond the current field of view of the robot sensors. Most importantly, this is done in an online fashion, overcoming both the disadvantages of single-shot recognition systems (e.g., limited sensor aperture) and of offline recognition systems that require prior registration of all frames of a scene (e.g., dynamic scenes, unsuitability for plan-based robot control). We also propose a novel way to include the outcome of local object recognition methods in the PGM, which decreases the usually high model-learning complexity and increases system performance. System performance has been assessed with a dataset collected by a mobile robot in restaurant-like settings, yielding positive results for both its data association and object recognition capabilities. The system has been successfully used in the RACE robotic architecture.

Introduction

Mobile robots need to create and maintain internal representations of their surroundings in order to plan and execute tasks involving elements in them. Traditional tasks like navigation or localization already have well-established solutions for building such representations in the form of metric [1], [2], topological [3], [4], or hybrid maps [5], [6]. More sophisticated world models, called semantic maps [7], [8], [9], arose for dealing with higher-level tasks; these codify information gathered during exploration of the environment, but also semantic knowledge (or meta-information) about the elements that can be found in the robot workspace, their properties, and their relations. Unlike in the case of metric and topological maps, the algorithms for building and exploiting these models are not as well defined, and there is still significant room for exploration.

One of the most critical steps when building these maps is creating and maintaining the correspondence between the object percepts detected in the workspace and their conceptual representation in the world model. This is known as the anchoring problem: “We call anchoring the process of creating and maintaining the correspondence between symbols and sensor data that refer to the same physical objects. The anchoring problem is the problem of how to perform anchoring in an artificial system” [10, p. 86f]. Even if we move our discourse away from semantic maps, anchoring would still be necessary for any system pursuing plan-based robot control, where symbols are just object identifiers or labels used by the planner. However, notice the potential of semantic maps, where symbols are further linked to concepts codifying functionalities and relations.
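To make the notion of an anchor concrete, the following minimal sketch (in Python) shows one possible anchor record linking a planner symbol to sensor-derived data; all names, fields, and types are our own assumptions for illustration, not the data structure actually used in this work.

    from dataclasses import dataclass

    @dataclass
    class Anchor:
        """Hypothetical anchor record linking a planner symbol to sensor data.

        Illustrative only: field and method names are assumptions, not the
        actual data structure of the system described in this paper.
        """
        symbol: str               # identifier used by the planner, e.g. "mug1"
        category_beliefs: dict    # category -> probability, e.g. {"mug": 0.7, "bowl": 0.3}
        pose: tuple               # last estimated (x, y, z) position in the map frame
        last_observed: float      # timestamp of the most recent matching percept

        def update(self, pose, beliefs, stamp):
            """Re-acquire the anchor when a new percept matches this object."""
            self.pose = pose
            self.category_beliefs = beliefs
            self.last_observed = stamp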

The anchoring process links object percepts to their conceptually defined categories, e.g., spoon, knife, or mug. This linking is accomplished by an object recognition method, which assigns a category to the percept. The kind of recognition algorithms traditionally exploited for this aim are referred to as local object recognition methods within this paper, since they work by individually classifying the perceived percepts according to their local features, e.g., size, shape, appearance, etc. [11]. However, it is well known that local recognition methods are prone to provide ambiguous results [12], [13], [14], which can lead to wrong linkings in the anchoring process. This results in an incoherent world model and in failed task executions.

In this work we contribute an anchoring system with a distinctive feature: it does not simply copy the classification results from a local object recognition system, but instead uses the relations between objects (their spatial context) to improve those classification results. This is motivated by the fact that objects rarely occur in independent configurations at identically distributed locations. Rather, there is a coherent structure inherent to most real-world scenes. For example, a longish object in front of a monitor has a high probability of being a keyboard, whereas an object with the same local appearance next to a bread knife is more likely a cutting board.

In cases where local appearance features are not sufficient, contextual features can disambiguate object appearance in object recognition tasks [14]. Jointly considering context-based object categorization and anchoring as parts of a context-aware anchoring system benefits both subproblems: Anchoring receives better and more stable object classification results, while context-based object categorization can use contextual relations with anchored objects that extend beyond the sensor aperture.

As an example of this, consider Fig. 1. Due to the position of the robot and the aperture of the RGB-D camera, only part of the table scene is visible in the current sensor frame, so the full context is not available. This poses a problem for most existing context-aware object recognition systems, which fall roughly into one of the following two categories. The first category is single-frame recognition systems, which recognize objects relying on single observations of the scene in the form of RGB, depth or RGB-D images [15], [16], [17], [18], [19], [20]. Regarding the exploitation of contextual information, single-frame systems are seriously limited by the sensor aperture and occlusions, given that they are able to observe only a portion of the objects and relations appearing in the whole scene. The second category comprises offline recognition systems, which register a number of observations prior to the recognition process in order to obtain a wider view of the scene [21], [22], [23], [24], [25], [26], [27], [28], [29]. This solves the problems caused by sensor aperture and occlusions; however, the need to finish recording the sensor data before running the object recognition process prevents online operation, which is a requirement for most plan-based robot control systems.

Our approach is to continually process single frames using a local object recognition method and integrate the object recognition results into a persistent probabilistic world model. We then use a Conditional Random Field (CRF) [30] to exploit contextual relations between objects in the current scene as well as relations with previously perceived objects from the world model to improve the recognition results.
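As a toy illustration of how such a CRF can overturn an ambiguous local classification, the following sketch performs MAP inference by exhaustive enumeration over two objects; the categories, unary scores, and pairwise potential are invented for this example and do not correspond to the model learned in this work.

    import itertools
    import numpy as np

    # Candidate categories; all scores below are invented for illustration.
    CATS = ["keyboard", "cutting_board", "monitor", "bread_knife"]

    # Unary log-potentials from a hypothetical local recognizer:
    # obj1 is locally ambiguous, obj2 clearly looks like a monitor.
    unary = {
        "obj1": np.log([0.45, 0.45, 0.05, 0.05]),
        "obj2": np.log([0.05, 0.05, 0.85, 0.05]),
    }

    # Pairwise log-potential for the relation "obj1 is in front of obj2":
    # keyboards commonly sit in front of monitors.
    def pairwise(c1, c2):
        return np.log(3.0) if (c1, c2) == ("keyboard", "monitor") else 0.0

    # Enumerate all joint labelings and keep the maximum a posteriori one.
    best = max(
        itertools.product(range(len(CATS)), repeat=2),
        key=lambda a: unary["obj1"][a[0]] + unary["obj2"][a[1]]
                      + pairwise(CATS[a[0]], CATS[a[1]]),
    )
    print("MAP labeling:", CATS[best[0]], CATS[best[1]])  # -> keyboard monitor

In the real system the CRF is learned from data and inference runs over all objects in the world model rather than an enumerated pair; the sketch only shows how a contextual factor can break a tie that local appearance leaves unresolved.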

This approach has the following advantages:

  • Our system can exploit contextual relations with objects that are currently out of view while still being capable of online operation (meaning that the output of the system is updated as soon as new sensor data comes in).

  • The world model allows us to consistently assign the same object ID to an object without requiring that the object be constantly tracked. By anchoring symbolic object IDs to the objects reported by the local object recognition method, the planner and plan executor can refer to an object by the same symbol even after the object has disappeared from view for a prolonged period (a minimal sketch of this association step follows the list).

  • The proposed system can integrate any state-of-the-art local object recognition system that processes single frames and supplement it with additional context information, achieving a significant boost in classification accuracy at a low computational overhead.
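As promised above, here is a deliberately simplified stand-in for the data association step that keeps object IDs stable, reusing the hypothetical Anchor record sketched in the introduction; the actual system uses probabilistic association, whereas this sketch uses greedy nearest-neighbor matching with an invented distance gate.

    import math

    NEW_ANCHOR_DIST = 0.25  # meters; hypothetical gating threshold

    def associate(percepts, anchors, stamp):
        """Greedily match each percept to the closest anchor within the gate;
        otherwise create a new anchor (and hence a new planner symbol)."""
        next_id = len(anchors)
        for p in percepts:
            best, best_d = None, NEW_ANCHOR_DIST
            for a in anchors:
                d = math.dist(p["pose"], a.pose)
                if d < best_d:
                    best, best_d = a, d
            if best is not None:
                # Re-acquisition: the object keeps its existing symbol.
                best.update(p["pose"], p["beliefs"], stamp)
            else:
                anchors.append(Anchor(f"obj{next_id}", p["beliefs"], p["pose"], stamp))
                next_id += 1
        return anchors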

The next section relates our work to the state of the art. Section 3 describes the role of our system within the RACE project. In Section 4, we describe the proposed context-aware anchoring system, and experimental results are reported in Section 5. Finally, Section 6 contains the conclusions and possible future work.

Section snippets

State of the art

This section starts with a discussion of related works concerning the two traditional ways to exploit contextual relations for object recognition: (offline) full-scene processing (Section 2.1) and single-frame processing (Section 2.2). Next, the relation between the presented work and the emerging field of Dense 3D Semantic Mapping is discussed (Section 2.3). Finally, a number of relevant works addressing anchoring and world modeling are reported (Section 2.4).

Anchoring in the RACE architecture

The anchoring system presented here has been successfully employed in the context of the RACE project [64]. In RACE, all high-level modules communicate via the so-called blackboard, which is implemented on top of an RDF database. The elements stored on the blackboard are called fluents, i.e., temporally valid ground facts of a Description Logic (DL) ontology. Fluents have both a start and finish time between which they are valid. Since the main objective of the RACE project was enabling a robot
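As a rough illustration of the fluent concept (field names are our assumptions, not those of the RACE blackboard implementation), a fluent can be pictured as a ground fact with a validity interval:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Fluent:
        """Hypothetical rendering of a fluent: a temporally valid ground fact."""
        subject: str                     # e.g. "mug1"
        predicate: str                   # e.g. "on"
        obj: str                         # e.g. "table2"
        start: float                     # time from which the fact holds
        finish: Optional[float] = None   # None while the fact is still valid

        def holds_at(self, t: float) -> bool:
            """True if the fact is valid at time t."""
            return self.start <= t and (self.finish is None or t < self.finish)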

Context-aware anchoring

The proposed context-aware anchoring system is a combination of: (i) a local object recognition method, (ii) an anchoring process, and (iii) a Conditional Random Field (see Fig. 3). The local object recognition method (Section 4.1)
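Combining the illustrative sketches above, one per-frame update of such a pipeline could look roughly as follows; local_recognition() and refine_with_crf() are placeholder stubs standing in for the components named in the text, and associate() is the hypothetical matcher sketched earlier.

    def local_recognition(frame):
        """Placeholder for the local recognizer: should return percepts,
        each with a pose and per-category beliefs. Here it simply assumes
        the frame is already such a list."""
        return frame

    def refine_with_crf(anchors):
        """Placeholder for CRF-based contextual re-scoring of category
        beliefs (cf. the enumeration sketch above); a no-op here."""
        pass

    def process_frame(frame, anchors, stamp):
        """One hypothetical per-frame update of the anchoring loop."""
        percepts = local_recognition(frame)            # (i) classify percepts locally
        anchors = associate(percepts, anchors, stamp)  # (ii) match percepts to anchors
        refine_with_crf(anchors)                       # (iii) context-aware re-scoring
        return anchors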

System evaluation

To evaluate our system, we collected a dataset of 15 different scenes. After a description of the dataset and the cross-validation scheme, we first present a separate evaluation of the object association part of the system (Section 5.3). This is followed by an evaluation of the complete system (Section 5.4).

Conclusions

This work has presented an anchoring system able to exploit the contextual relations of the objects in the scene to achieve more coherent and stable results, aiming to build suitable representations of the robot workspace. This is a critical aspect of such systems when running in robotic architectures, since a wrong linking between an object and its category would lead to failures, for example, during task planning or execution. To achieve that we rely on a probabilistic, context-based

Acknowledgments

This work is supported by the European projects RACE (FP7-ICT-2011-7, grant agreement number 287752) and MoveCare (H2020-ICT-2016-1, grant agreement number 732158), by the WISER project (reference DPI2017-84827-R) funded by the Spanish Government and financed by European Regional Development Funds (FEDER), and by a postdoc contract from the I-PPIT-UMA program financed by the University of Málaga, Spain.


References (80)

  • J.R. Ruiz-Sarmiento et al., A survey on learning approaches for undirected graphical models. Application to scene object recognition, Internat. J. Approx. Reason. (2017)

  • J. Elfring et al., Semantic world modeling using probabilistic multiple hypothesis anchoring, Robot. Auton. Syst. (2013)

  • J.R. Ruiz-Sarmiento et al., Building multiversal semantic maps for mobile robot operation, Knowl.-Based Syst. (2017)

  • M. Esterman et al., Avoiding non-independence in fMRI data analysis: Leave one subject out, NeuroImage (2010)

  • A. Elfes, Sonar-based real-world mapping and navigation, IEEE J. Robot. Automat. (1987)

  • S. Thrun, Learning occupancy grid maps with forward sensor models, Auton. Robot. (2003)

  • A. Ranganathan et al., Bayesian inference in the space of topological maps, IEEE T. Robot. (2006)

  • J. Zhang et al., Local features and kernels for classification of texture and object categories: A comprehensive study, Int. J. Comput. Vision (2007)

  • S.K. Divvala et al., An empirical study of context in object detection

  • X. Ren et al., RGB-(D) scene labeling: Features and algorithms

  • Y. Xiang et al., Semantic context modeling with maximal margin conditional random fields for automatic image annotation

  • A.B. Torralba et al., Context-based vision system for place and object recognition

  • D. Hoiem et al., Putting objects in perspective, Int. J. Comput. Vision (2008)

  • C. Galleguillos et al., Object categorization using co-occurrence, location and appearance

  • J. Shotton et al., TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context, Int. J. Comput. Vision (2009)

  • H.S. Koppula et al., Semantic labeling of 3D point clouds for indoor scenes

  • A. Anand et al., Contextually guided semantic labeling and search for three-dimensional point clouds, Int. J. Robot. Res. (2013)

  • X. Xiong et al., Using context to create semantic 3D models of indoor environments

  • J.P.C. Valentin et al., Mesh based semantic modelling for indoor and outdoor scenes

  • J.R. Ruiz-Sarmiento et al., Mobile robot object recognition through the synergy of probabilistic graphical models and semantic knowledge

  • M. Alberti et al., Relational approaches for joint object classification and scene similarity measurement in indoor environments

  • L. Kunze et al., Combining top-down spatial reasoning and bottom-up object class recognition for scene understanding

  • A. Thippur et al., A comparison of qualitative and metric spatial relation models for scene understanding

  • D. Koller et al., Probabilistic Graphical Models: Principles and Techniques (2009)

  • G.H. Lim et al., Knowledge-based incremental Bayesian learning for object recognition

  • K. Lai et al., A large-scale hierarchical multi-view RGB-D object dataset

  • R. Socher et al., Convolutional-recursive deep learning for 3D object classification

  • A. Eitel et al., Multimodal deep learning for robust RGB-D object recognition

  • S. Song et al., SUN RGB-D: A RGB-D scene understanding benchmark suite

  • N. Silberman et al., Indoor segmentation and support inference from RGBD images


    Martin Günther is a researcher at DFKI, the German Research Center for Artificial Intelligence. He earned his Diploma in Computer Science from Technical University Dresden, Germany, in 2008. From 2009 until 2015 he worked as a research associate at the Knowledge-Based Systems group at Osnabrück University, Germany. His research interests include 3D perception, semantic mapping, context-aware object tracking and anchoring, active perception and goal-directed action in autonomous robot control.

Jose-Raul Ruiz-Sarmiento received the B.Sc. in Computer Science Engineering from the University of Málaga in July 2009, obtained the M.Sc. in Mechatronics one year later, and completed his Ph.D. in November 2016. As part of his Ph.D., in 2014 he was a visitor in the Knowledge-Based Systems Research Group at Osnabrück University, Germany. Since he joined the MAPIR group in September 2008, he has been involved in different national and European projects, and has developed a number of open-source tools related to his research lines: object and room recognition, semantic mapping, and machine learning, all in the scope of robotics. His research activity has produced more than 20 publications.

Cipriano Galindo received the Ph.D. degree (2006) in Computer Science from the University of Málaga, Spain. Since 2009 he has been a full-time associate professor at the same university. In 2004–2005 he was at the Applied Autonomous Sensor Systems lab, Örebro University (Sweden), working on semantic maps and intelligent systems. His research focuses on service robotics, telepresence, and Quality of Life Technologies (QoLT); he is (co)author of 20 JCR papers, 38 international conference papers, and 2 books.

    Javier Gonzalez-Jimenez received the B.S. degree in Electrical Engineering from the University of Seville in 1987. Then, he joined the Department of “Ingenieria de Sistemas y Automatica” at the University of Málaga in 1988 and received the Ph.D. from this University in 1993. In 1990–1991 he was at the Field Robotics Center, Robotics Institute, Carnegie Mellon University (USA) working on mobile robots as part of his Ph.D. Since 1996 he has been leading Spanish and European projects on mobile robotics and perception. Currently, he is the head of the MAPIR group and full professor at the University of Málaga. His research interests include mobile robot autonomous navigation, computer vision and robotic olfaction. In these fields he has published three books and more than 200 papers.

Joachim Hertzberg is a full professor for computer science at Osnabrück University, Germany, heading the Knowledge-Based Systems lab. Since 2011, he has also been head of the Osnabrück branch of the Robotics Innovation Center of the German Research Center for Artificial Intelligence (DFKI). He graduated in Computer Science (Diploma, U. Bonn, 1982; Dr. rer. nat., U. Bonn, 1986; Habilitation, U. Hamburg, 1995). Former affiliations include GMD and Fraunhofer AIS in Sankt Augustin. His areas of research are AI and mobile robotics, with contributions to action planning, plan-based robot control, sensor data interpretation, semantic mapping, reasoning about action, constraint-based reasoning, and various applications of these. In his research fields, he has been the PI in a number of national and European projects. At Osnabrück University, he served as the Dean of the School of Mathematics and Computer Science. Awards for his work include the EurAI fellowship, received in 2014.
