Unsupervised semantic clustering and localization for mobile robotics tasks

https://doi.org/10.1016/j.robot.2020.103567

Abstract

Due to its vast applicability, the semantic interpretation of regions or entities increasingly attracts the attention of scholars within the robotics community. The paper at hand introduces a novel unsupervised technique to semantically identify the position of an autonomous agent in unknown environments. When the robot explores a certain path for the first time, community detection is achieved through graph-based segmentation. This allows the agent to semantically define its surroundings in future traverses, even if the environment’s lighting conditions have changed. The proposed semantic clustering technique exploits the Louvain community detection algorithm, which constitutes a novel and efficient method for identifying groups of measurements with consistent similarity. The produced communities are combined with metric information, as provided by the robot’s odometry, through a hierarchical agglomerative clustering method. The suggested algorithm is evaluated on indoor and outdoor datasets, creating topological maps capable of assisting semantic localization. We demonstrate that the system categorizes the places correctly when the robot revisits an environment, despite possible lighting variations.

Introduction

Contemporary research in robotics endows modern autonomous systems with the ability to semantically recognize and segment regions and entities. This affords them increased flexibility and the ability to interpret and interact with the environment at a higher level. Apart from places, robots are capable of recognizing objects and classifying them into semantic areas. Overall, semantic interpretation is considered an active and fundamental research field, which relies mostly on computer vision methods for place recognition [1].

Localization (operating on either metric or topological maps) is a critical ability of today’s robots, yet semantics, which can constitute an essential foundation for this capacity, is still underdeveloped. Moreover, successful communication between humans and robots can be established only through the ability of the latter to sense and classify their own surroundings by precisely recalling spatial memories [2]. Metric maps are mainly used for small-scale spaces [3] and are organized in a geometric manner, while the related conceptual information remains hidden. This hidden information can be revealed by reorganizing the map on the basis of a topological representation [4]. This is directly related to topology, the branch of mathematics studying characteristics that remain invariant under continuous deformations [5], [6]. A scene’s description based on topological maps retains information regarding the semantic region to which it belongs. These graphs are a fundamental element of a semantic map, enabling abstraction over metric maps [3].
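To make the topological-map notion concrete, the following sketch models places as graph nodes that carry both a semantic label and a metric pose, with edges denoting traversability. The place names, labels, and poses are purely illustrative (they are not taken from the paper’s datasets), and `networkx` is used here only as a convenient graph container.

```python
import networkx as nx

# A toy topological map: nodes are places annotated with a semantic label and a
# metric pose (e.g., from odometry); edges mean the robot can travel between them.
# Place names and coordinates are hypothetical, for illustration only.
topo_map = nx.Graph()
topo_map.add_node("corridor_1", semantic="corridor", pose=(0.0, 0.0))
topo_map.add_node("office_1", semantic="office", pose=(3.5, 0.0))
topo_map.add_node("kitchen_1", semantic="kitchen", pose=(3.5, 4.0))
topo_map.add_edge("corridor_1", "office_1")
topo_map.add_edge("office_1", "kitchen_1")

# Semantic query: which places belong to a given region type?
offices = [n for n, d in topo_map.nodes(data=True) if d["semantic"] == "office"]
print(offices)  # ['office_1']
```

Storing the pose as a node attribute is what lets the same graph serve both semantic queries (by label) and metric ones (by coordinates), mirroring the abstraction over metric maps discussed above.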

In order to construct a topological map, the similarity between the acquired camera measurements needs to be computed. Zhang et al. [7] categorized the mechanisms for quantifying such similarities, according to the degree of information abstracted from the respective images, as follows. Image-based approaches rely on pixel differences between consecutive images to determine changes in a scene. Techniques based on local features detect and try to associate various key-points so as to measure the similarity between different frames [8], [9]. Furthermore, in histogram-based approaches, images are compared by means of feature statistics [3]. Lastly, a typical approach for simplifying the information regarding a place is to address the problem with a Bag-of-Words (BoW) representation. The BoW method describes the input measurements as a quantized set of local features, thus reducing the search space to gain efficiency [10]. However, a scene contains randomly scattered features, so a meaningful way to describe it is via histograms of visual words (visual word vectors) [11].
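A minimal sketch of the BoW pipeline just described: a visual vocabulary is built by clustering local descriptors with k-means, each image is then summarized by a normalized histogram of visual words, and two images are compared via the cosine similarity of their visual word vectors. The random descriptors stand in for real local features (e.g., SIFT-like vectors), and the vocabulary size is an arbitrary choice; this is an illustration, not the paper’s implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical local descriptors (128-D, SIFT-like) extracted from two images.
desc_img1 = rng.normal(0.0, 1.0, size=(200, 128))
desc_img2 = rng.normal(0.5, 1.0, size=(200, 128))

# Build a small visual vocabulary by clustering all descriptors.
vocab_size = 16
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack([desc_img1, desc_img2]))

def visual_word_vector(descriptors):
    """Quantize descriptors to the vocabulary and return a normalized histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

v1 = visual_word_vector(desc_img1)
v2 = visual_word_vector(desc_img2)

# Cosine similarity between the two visual word vectors.
similarity = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(round(similarity, 3))
```

The histogram representation discards where features occur in the image, which is exactly what makes it tolerant to the "randomly scattered features" issue noted above.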

This paper addresses the problem of semantically mapping an environment. With the term semantic, we refer to the signs and the things to which they refer [1]. Thus, within the scope of this work, the term semantic mapping is used to indicate the identification and the recording of visual signs and symbols that contain meaningful information. Our goal is to produce an unsupervised system that segments the robot’s trajectory into different semantic regions. Then, each time the robot passes through the same area, it self-localizes within one of the computed semantic divisions by exclusively using visual information. This essentially means that a specific label for each trajectory segment is not necessarily required to achieve semantic localization. On the contrary, our method is able to identify distinct regions that preserve semantic consistency; a meaningful name can then be assigned to each one of them with minimal effort after the map has been completed. We make use of the BoW image representation as a means of uniformly describing the captured images and shaping semantic clusters based on their similarities. The semantic interpretation of the robot’s path is achieved by applying the Louvain Community Detection Algorithm (LCDA) [12]. Semantically important properties of a particular environment are identified with great tolerance to varying lighting conditions. Extending our previous work [13], we improve the semantic representation of places by introducing an additional hierarchical agglomerative clustering method that incorporates the information of a metric map. Using only LCDA, many snapshots of the same semantic region are redundantly grouped into different clusters, due to dissimilarities that occur when observing the same scene from arbitrary orientations.
This is a common problem for any appearance-based method that utilizes monocular and unidirectional cameras, since the view of the environment changes significantly when different content is observed [8], [14], [15]. By means of geometric information, such over-segmented areas can be consolidated, thus improving the quality of the map. At its final stage, the map contains both semantic and metric information (topological map).
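The two-stage clustering described above (visual communities first, metric consolidation second) can be sketched as follows. This uses `networkx`’s Louvain implementation and SciPy’s agglomerative linkage in place of the authors’ exact pipeline; the similarity graph, odometry positions, and merge threshold are all synthetic stand-ins.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Hypothetical similarity graph: nodes are snapshots along the path, edge weights
# are visual-word-vector similarities. Two visually distinct regions are planted.
G = nx.Graph()
for i in range(10):
    for j in range(i + 1, 10):
        same_region = (i < 5) == (j < 5)
        G.add_edge(i, j, weight=0.9 if same_region else 0.1)

# Stage 1: Louvain community detection on the appearance-similarity graph.
communities = louvain_communities(G, weight="weight", seed=0)

# Stage 2: consolidate over-segmented communities with metric (odometry) data:
# agglomerative clustering on each community's centroid position merges
# communities that are close in space.
positions = np.vstack([np.full((5, 2), 0.0), np.full((5, 2), 10.0)])
positions += rng.normal(0.0, 0.3, positions.shape)
centroids = np.array([positions[list(c)].mean(axis=0) for c in communities])
if len(centroids) > 1:
    merged = fcluster(linkage(centroids, method="average"),
                      t=2.0, criterion="distance")
else:
    merged = np.array([1])
print(len(communities), sorted(map(len, communities)))
```

The second stage only merges communities whose spatial centroids fall within the distance threshold, so visually fragmented views of one physical place collapse into a single region while genuinely distant regions stay separate.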

The rest of the paper is structured as follows. In Section 2, we discuss representative related work in the field of semantic mapping construction. Section 3 contains a detailed explanation of our approach. In Section 4, we present the results of our experiments, while in the last section, we draw conclusions and present suggestions for future work.

Section snippets

Related literature

Generally speaking, one may argue that a semantic map is a kind of topological map containing purely semantic data about the places a robot encounters, together with the objects they contain; such maps have been used for robot detection, mapping, navigation and classification [16], [17], [18], [19]. Such semantic information systems can be obtained using either supervised or unsupervised methods. In what follows, we list some of the most representative techniques for both approaches.

Approach

In this section, our approach for semantic clustering and localization is detailed. We begin by describing the selected mechanism for representing the input images in a rotation/scale-invariant manner, as well as the algorithm for clustering the feature vectors according to their similarity. The formulated semantic groups are then refined using temporal and metric information with the aim of producing the topological map. Finally, we propose two different approaches for recognizing the exact

Experimental results

In this section, the proposed method is evaluated both in terms of semantic clustering and semantic localization. We applied our method to the following datasets: COLD-Freiburg [36], COLD-Saarbrücken [36], KTH-IDOL2 [43], and New College [44], which comprise explicit semantic regions. Two of the above, viz. COLD-Freiburg part A sequence1 cloudy1 and KTH-IDOL2 cloudy1, were used as validation sets to assess the effect of the system’s parameters on the performance. A different set of 5

Conclusions

In this paper, a complete architecture for semantic segmentation and localization has been described. The available visual and odometry data were combined in a clustering pipeline to produce the topological map of a previously unexplored environment. The obtained results were encouraging, showing that by properly formulating the semantic map of an area, correct categorization is achieved even under different lighting conditions.

Nowadays, the scientific interest has been fixed on

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is co-financed by Greece and the European Union (European Social Fund — ESF) through the Operational Program «Human Resources Development, Education and Lifelong Learning» in the context of the project “Strengthening Human Resources Research Potential via Doctorate Research” (MIS-5000432), implemented by the State Scholarships Foundation (IKY).


References (53)

  • H. Zhang, B. Li, D. Yang, Keyframe detection for appearance-based visual SLAM, in: Proceedings of the IEEE/RSJ...
  • F. Fraundorfer, C. Engels, D. Nistér, Topological mapping, localization and navigation using image collections, in:...
  • H. Korrapati, Y. Mezouar, P. Martinet, Efficient topological mapping with image sequence partitioning, in: Proceedings...
  • J. Sivic et al., Video Google: A text retrieval approach to object matching in videos
  • E. Fazl-Ersi et al., Histogram of oriented uniform patterns for robust place recognition and categorization, Int. J. Robot. Res. (2012)
  • V.D. Blondel et al., Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp. (2008)
  • V. Balaska et al., Graph-based semantic segmentation
  • S. Lynen et al., Trajectory-based place-recognition for efficient large scale localization, Int. J. Comput. Vis. (2017)
  • L. Bampis et al., Fast loop-closure detection using visual-word-vectors from image sequences, Int. J. Robot. Res. (2018)
  • A.K. Krishnan, K. Krishna, A visual exploration algorithm using semantic cues that constructs image based hybrid maps,...
  • A. Ranganathan et al., Semantic Modeling of Places Using Objects (2007)
  • C. Nieto-Granda, J.G. Rogers, A.J. Trevor, H.I. Christensen, Semantic map partitioning in indoor environments using...
  • L. Shi, S. Kodagoda, G. Dissanayake, Application of semi-supervised learning with voronoi graph for place...
  • M. Demir, H. Isil Bozma, Video summarization via segments summary graphs, in: Proceedings of IEEE International...
  • P. Uršič, M. Kristan, D. Skočaj, A. Leonardis, Room classification using a hierarchical representation of space, in:...
  • T. Yeh, T. Darrell, Dynamic visual category learning, in: Proceedings of the IEEE Conference on Computer Vision and...

    Vasiliki Balaska received her diploma from the Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece in 2017. Currently, she is a Ph.D. student in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering in Democritus University of Thrace (DUTH), Greece, working on development of semantic mapping methods and their role in robotics. Her research is co-financed by Greece and the European Union (European Social Fund — ESF) through the Operational Program Human Resources Development, Education and Lifelong Learning in the context of the project Strengthening Human Resources Research Potential via Doctorate Research (MIS-5000432), implemented by the State Scholarships Foundation (IKϒ), for three years. More details about her are available at http://robotics.pme.duth.gr/robotics/active/.

    Loukas Bampis received his diploma in Electrical and Computer Engineering in 2013 and Ph.D. in Machine Vision and Embedded Systems in 2018 from the Democritus University of Thrace (DUTH), Greece. He is currently a Postdoctoral Fellow in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering, DUTH. His work has been supported through several research projects funded by the European Space Agency, the European Commission and the Greek government. His research interests include real-time Localization and Place Recognition techniques using hardware accelerators and parallel processing. More details about him are available at http://robotics.pme.duth.gr/bampis.

    Moses Boudourides is in the Faculty of Northwestern University School of Professional Studies Data Science Program https://sps.northwestern.edu/masters/datascience/faculty.php and Affiliated Faculty at the Science of Networks in Communities (SONIC) at Northwestern University (http://sonic.northwestern.edu/people/affiliated-faculty/moses-boudourides/). Currently, 2019–2020, he is a Visiting Professor of Mathematics at the New York University Abu Dhabi. Previously, until 2017, he was Professor of Mathematics at the University of Patras in Greece. His research interests and publications are on dynamical systems, social network analysis, social media data analysis, digital humanities and computational social science. Boudourides was recently awarded a Robert K. Merton Visiting Research Fellowship from the Institute for Analytical Sociology (IAS) at Linköping University in Sweden.

    Antonios Gasteratos is a Professor and Head of Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece. He is also the director of the Laboratory of Robotics and Automation (LRA), DUTH and teaches the courses of Robotics, Automatic Control Systems, Electronics, Mechatronics and Computer Vision. He holds a B.Eng. and a Ph.D. from the Department of Electrical and Computer Engineering, DUTH, Greece. During 1999–2000 he was a Visiting Researcher at the Laboratory of Integrated Advanced Robotics (LIRALab), DIST, University of Genoa, Italy. He has served as a reviewer for numerous Scientific Journals and International Conferences. He is a Subject Editor at Electronics Letters and an Associate Editor at the International Journal of Optomechatronics and he has organized/co-organized several international conferences. His research interests are mainly in mechatronics and in robot vision. He has published more than 220 papers in books, journals and conferences. He is a senior member of the IEEE. More details about him are available at http://robotics.pme.duth.gr/antonis.
