Unsupervised semantic clustering and localization for mobile robotics tasks

https://doi.org/10.1016/j.robot.2020.103567

Abstract

Due to its vast applicability, the semantic interpretation of regions or entities increasingly attracts the attention of scholars within the robotics community. The paper at hand introduces a novel unsupervised technique to semantically identify the position of an autonomous agent in unknown environments. When the robot explores a certain path for the first time, community detection is achieved through graph-based segmentation. This allows the agent to semantically define its surroundings in future traverses, even if the environment’s lighting conditions have changed. The proposed semantic clustering technique exploits the Louvain community detection algorithm, which constitutes a novel and efficient method for identifying groups of measurements with consistent similarity. The produced communities are combined with metric information, as provided by the robot’s odometry, through a hierarchical agglomerative clustering method. The suggested algorithm is evaluated on indoor and outdoor datasets, creating topological maps capable of assisting semantic localization. We demonstrate that the system categorizes the places correctly when the robot revisits an environment, despite possible lighting variations.

Introduction

Contemporary research in robotics endows modern autonomous systems with the ability to semantically recognize and segment regions and entities. This affords them increased flexibility and the ability to interpret and interact with the environment at a higher level. Apart from places, robots are capable of recognizing objects and classifying them into semantic areas. Overall, semantic interpretation is considered an active and fundamental research field, which relies mostly on computer vision methods for place recognition [1].

Localization (operating on either metric or topological maps) is a critical ability of today’s robots, yet semantics, which can constitute an essential foundation for this capacity, is still underdeveloped. Moreover, successful communication between humans and robots can be established only through the ability of the latter to sense and classify their own surroundings by precisely recalling spatial memories [2]. Metric maps are mainly used for small-scale spaces [3] and are organized in a geometric manner, while the related conceptual information remains hidden. This hidden information can be revealed by reorganizing the map on the basis of a topological representation [4]. This is directly related to topology, the branch of mathematics studying characteristics that remain invariant under continuous deformations [5], [6]. A scene’s description based on topological maps retains information regarding the semantic region to which it belongs. These graphs are a fundamental element of a semantic map, enabling abstraction over metric maps [3].
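To make the topological-map notion concrete, the following sketch models places as graph nodes that carry both a semantic label and a metric pose, with edges denoting traversability. The place names, labels, and poses are purely illustrative (they are not taken from the paper’s datasets), and `networkx` is used here only as a convenient graph container.

```python
import networkx as nx

# A toy topological map: nodes are places annotated with a semantic label and a
# metric pose (e.g., from odometry); edges mean the robot can travel between them.
# Place names and coordinates are hypothetical, for illustration only.
topo_map = nx.Graph()
topo_map.add_node("corridor_1", semantic="corridor", pose=(0.0, 0.0))
topo_map.add_node("office_1", semantic="office", pose=(3.5, 0.0))
topo_map.add_node("kitchen_1", semantic="kitchen", pose=(3.5, 4.0))
topo_map.add_edge("corridor_1", "office_1")
topo_map.add_edge("office_1", "kitchen_1")

# Semantic query: which places belong to a given region type?
offices = [n for n, d in topo_map.nodes(data=True) if d["semantic"] == "office"]
print(offices)  # ['office_1']
```

Storing the pose as a node attribute is what lets the same graph serve both semantic queries (by label) and metric ones (by coordinates), mirroring the abstraction over metric maps discussed above.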

In order to construct a topological map, the similarity between the acquired camera measurements needs to be computed. Zhang et al. [7] categorized the mechanisms for quantifying such similarities, according to the degree of information abstracted from the respective images, as follows. Image-based approaches rely on pixel differences between consecutive images to determine changes in a scene. Techniques based on local features detect and try to associate various key-points so as to measure the similarity between different frames [8], [9]. Furthermore, in histogram-based approaches, images are compared by means of feature statistics [3]. Lastly, a typical approach for simplifying the information regarding a place is to address the problem with a Bag-of-Words (BoW) representation. The BoW method describes the input measurements as a quantized set of local features, thus reducing the search space to gain efficiency [10]. However, a scene contains randomly scattered features, so a meaningful way to describe it is via histograms of visual words (visual word vectors) [11].
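A minimal sketch of the BoW pipeline just described: a visual vocabulary is built by clustering local descriptors with k-means, each image is then summarized by a normalized histogram of visual words, and two images are compared via the cosine similarity of their visual word vectors. The random descriptors stand in for real local features (e.g., SIFT-like vectors), and the vocabulary size is an arbitrary choice; this is an illustration, not the paper’s implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical local descriptors (128-D, SIFT-like) extracted from two images.
desc_img1 = rng.normal(0.0, 1.0, size=(200, 128))
desc_img2 = rng.normal(0.5, 1.0, size=(200, 128))

# Build a small visual vocabulary by clustering all descriptors.
vocab_size = 16
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack([desc_img1, desc_img2]))

def visual_word_vector(descriptors):
    """Quantize descriptors to the vocabulary and return a normalized histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

v1 = visual_word_vector(desc_img1)
v2 = visual_word_vector(desc_img2)

# Cosine similarity between the two visual word vectors.
similarity = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(round(similarity, 3))
```

The histogram representation discards where features occur in the image, which is exactly what makes it tolerant to the "randomly scattered features" issue noted above.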

This paper addresses the problem of semantically mapping an environment. With the term semantic, we refer to the signs and the things to which they refer [1]. Thus, within the scope of this work, the term semantic mapping is used to indicate the identification and the recording of visual signs and symbols that contain meaningful information. Our goal is to produce an unsupervised system that segments the robot’s trajectory into different semantic regions. Then, each time the robot passes through the same area, it self-localizes within one of the computed semantic divisions by exclusively using visual information. This essentially means that a specific label for each trajectory segment is not necessarily required to achieve semantic localization. On the contrary, our method is able to identify distinct regions that preserve semantic consistency; a meaningful name can then be assigned to each one of them with minimal effort after the map has been completed. We make use of the BoW image representation as a means of uniformly describing the captured images and shaping semantic clusters based on their similarities. The semantic interpretation of the robot’s path is achieved by applying the Louvain Community Detection Algorithm (LCDA) [12]. Semantically important properties of a particular environment are identified with great tolerance to varying lighting conditions. Extending our previous work [13], we improve the semantic representation of places by introducing an additional hierarchical agglomerative clustering method that incorporates the information of a metric map. Using only LCDA, many snapshots of the same semantic region are redundantly grouped into different clusters, due to dissimilarities that occur when observing the same scene from arbitrary orientations.
This is a common problem for any appearance-based method that utilizes monocular and unidirectional cameras, since the view of the environment changes significantly when different content is observed [8], [14], [15]. By means of geometric information, such over-segmented areas can be consolidated, thus improving the quality of the map. At its final stage, the map contains both semantic and metric information (topological map).
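The two-stage clustering described above (visual communities first, metric consolidation second) can be sketched as follows. This uses `networkx`’s Louvain implementation and SciPy’s agglomerative linkage in place of the authors’ exact pipeline; the similarity graph, odometry positions, and merge threshold are all synthetic stand-ins.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Hypothetical similarity graph: nodes are snapshots along the path, edge weights
# are visual-word-vector similarities. Two visually distinct regions are planted.
G = nx.Graph()
for i in range(10):
    for j in range(i + 1, 10):
        same_region = (i < 5) == (j < 5)
        G.add_edge(i, j, weight=0.9 if same_region else 0.1)

# Stage 1: Louvain community detection on the appearance-similarity graph.
communities = louvain_communities(G, weight="weight", seed=0)

# Stage 2: consolidate over-segmented communities with metric (odometry) data:
# agglomerative clustering on each community's centroid position merges
# communities that are close in space.
positions = np.vstack([np.full((5, 2), 0.0), np.full((5, 2), 10.0)])
positions += rng.normal(0.0, 0.3, positions.shape)
centroids = np.array([positions[list(c)].mean(axis=0) for c in communities])
if len(centroids) > 1:
    merged = fcluster(linkage(centroids, method="average"),
                      t=2.0, criterion="distance")
else:
    merged = np.array([1])
print(len(communities), sorted(map(len, communities)))
```

The second stage only merges communities whose spatial centroids fall within the distance threshold, so visually fragmented views of one physical place collapse into a single region while genuinely distant regions stay separate.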

The rest of the paper is structured as follows. In Section 2, we discuss representative related work in the field of semantic mapping construction. Section 3 contains a detailed explanation of our approach. In Section 4, we present the results of our experiments, while in the last section, we draw conclusions and present suggestions for future work.

Section snippets

Related literature

Generally speaking, one may argue that a semantic map is a kind of topological map containing purely semantic data about the places a robot encounters, together with the objects they contain; such maps have been used for robot detection, mapping, navigation and classification [16], [17], [18], [19]. Such semantic information systems can be obtained using either supervised or unsupervised methods. In what follows, we list some of the most representative techniques for both approaches.

Approach

In this section, our approach for semantic clustering and localization is detailed. We begin by describing the selected mechanism for representing the input images in a rotation/scale-invariant manner, as well as the algorithm for clustering the feature vectors according to their similarity. The formulated semantic groups are then refined using temporal and metric information with the aim of producing the topological map. Finally, we propose two different approaches for recognizing the exact

Experimental results

In this section, the proposed method is evaluated both in terms of semantic clustering and semantic localization. We applied our method to the following datasets: COLD-Freiburg [36], COLD-Saarbrücken [36], KTH-IDOL2 [43], and New College [44], which comprise explicit semantic regions. Two of the above, viz. COLD-Freiburg part A sequence1 cloudy1 and KTH-IDOL2 cloudy1, were used as validation sets to assess the effect of the system’s parameters on the performance. A different set of 5

Conclusions

In this paper, a complete architecture for semantic segmentation and localization has been described. The available visual and odometry data were combined in a clustering pipeline to produce the topological map of a previously unexplored environment. The obtained results were encouraging, showing that by properly formulating the semantic map of an area, correct categorization is achieved even under different lighting conditions.

Nowadays, the scientific interest has been fixed on

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is co-financed by Greece and the European Union (European Social Fund — ESF) through the Operational Program «Human Resources Development, Education and Lifelong Learning» in the context of the project “Strengthening Human Resources Research Potential via Doctorate Research” (MIS-5000432), implemented by the State Scholarships Foundation (IKY).


References (53)

  • H. Zhang, B. Li, D. Yang, Keyframe detection for appearance-based visual SLAM, in: Proceedings of the IEEE/RSJ...
  • F. Fraundorfer, C. Engels, D. Nistér, Topological mapping, localization and navigation using image collections, in:...
  • H. Korrapati, Y. Mezouar, P. Martinet, Efficient topological mapping with image sequence partitioning, in: Proceedings...
  • J. Sivic et al., Video Google: A text retrieval approach to object matching in videos
  • E. Fazl-Ersi et al., Histogram of oriented uniform patterns for robust place recognition and categorization, Int. J. Robot. Res. (2012)
  • V.D. Blondel et al., Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp. (2008)
  • V. Balaska et al., Graph-based semantic segmentation
  • S. Lynen et al., Trajectory-based place-recognition for efficient large scale localization, Int. J. Comput. Vis. (2017)
  • L. Bampis et al., Fast loop-closure detection using visual-word-vectors from image sequences, Int. J. Robot. Res. (2018)
  • A.K. Krishnan, K. Krishna, A visual exploration algorithm using semantic cues that constructs image based hybrid maps,...
  • A. Ranganathan et al., Semantic Modeling of Places Using Objects (2007)
  • C. Nieto-Granda, J.G. Rogers, A.J. Trevor, H.I. Christensen, Semantic map partitioning in indoor environments using...
  • L. Shi, S. Kodagoda, G. Dissanayake, Application of semi-supervised learning with voronoi graph for place...
  • M. Demir, H. Isil Bozma, Video summarization via segments summary graphs, in: Proceedings of IEEE International...
  • P. Uršič, M. Kristan, D. Skočaj, A. Leonardis, Room classification using a hierarchical representation of space, in:...
  • T. Yeh, T. Darrell, Dynamic visual category learning, in: Proceedings of the IEEE Conference on Computer Vision and...

    Vasiliki Balaska received her diploma from the Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece in 2017. Currently, she is a Ph.D. student in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering in Democritus University of Thrace (DUTH), Greece, working on development of semantic mapping methods and their role in robotics. Her research is co-financed by Greece and the European Union (European Social Fund — ESF) through the Operational Program Human Resources Development, Education and Lifelong Learning in the context of the project Strengthening Human Resources Research Potential via Doctorate Research (MIS-5000432), implemented by the State Scholarships Foundation (IKϒ), for three years. More details about her are available at http://robotics.pme.duth.gr/robotics/active/.

    Loukas Bampis received his diploma in Electrical and Computer Engineering in 2013 and Ph.D. in Machine Vision and Embedded Systems in 2018 from the Democritus University of Thrace (DUTH), Greece. He is currently a Postdoctoral Fellow in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering, DUTH. His work has been supported through several research projects funded by the European Space Agency, the European Commission and the Greek government. His research interests include real-time Localization and Place Recognition techniques using hardware accelerators and parallel processing. More details about him are available at http://robotics.pme.duth.gr/bampis.

    Moses Boudourides is in the Faculty of Northwestern University School of Professional Studies Data Science Program https://sps.northwestern.edu/masters/datascience/faculty.php and Affiliated Faculty at the Science of Networks in Communities (SONIC) at Northwestern University (http://sonic.northwestern.edu/people/affiliated-faculty/moses-boudourides/). Currently, 2019–2020, he is a Visiting Professor of Mathematics at the New York University Abu Dhabi. Previously, until 2017, he was Professor of Mathematics at the University of Patras in Greece. His research interests and publications are on dynamical systems, social network analysis, social media data analysis, digital humanities and computational social science. Boudourides was recently awarded a Robert K. Merton Visiting Research Fellowship from the Institute for Analytical Sociology (IAS) at Linköping University in Sweden.

    Antonios Gasteratos is a Professor and Head of Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece. He is also the director of the Laboratory of Robotics and Automation (LRA), DUTH and teaches the courses of Robotics, Automatic Control Systems, Electronics, Mechatronics and Computer Vision. He holds a B.Eng. and a Ph.D. from the Department of Electrical and Computer Engineering, DUTH, Greece. During 1999–2000 he was a Visiting Researcher at the Laboratory of Integrated Advanced Robotics (LIRALab), DIST, University of Genoa, Italy. He has served as a reviewer for numerous Scientific Journals and International Conferences. He is a Subject Editor at Electronics Letters and an Associate Editor at the International Journal of Optomechatronics and he has organized/co-organized several international conferences. His research interests are mainly in mechatronics and in robot vision. He has published more than 220 papers in books, journals and conferences. He is a senior member of the IEEE. More details about him are available at http://robotics.pme.duth.gr/antonis.
