Unsupervised semantic clustering and localization for mobile robotics tasks
Introduction
Contemporary research in robotics endows modern autonomous systems with the ability to semantically recognize and segment regions and entities. This affords them increased flexibility and allows them to interpret and interact with the environment at a higher level. Apart from places, robots are capable of recognizing objects and classifying them into semantic areas. Overall, semantic interpretation is considered an active and fundamental research field, which relies mostly on computer vision methods for place recognition [1].
Localization (operating on either metric or topological maps) is a critical ability of modern robots, yet semantics, which can constitute an essential foundation for this capacity, remains underdeveloped. Moreover, successful communication between humans and robots can be established only through the ability of the latter to sense and classify their own surroundings by precisely recalling spatial memories [2]. Metric maps are mainly used for small-scale spaces [3] and are organized in a geometric manner, while the related conceptual information remains hidden. This hidden information can be revealed by reorganizing the map on the basis of a topological map [4]. This is directly related to topology, the branch of mathematics studying characteristics that are invariant under continuous deformations [5], [6]. A scene description based on topological maps retains information regarding the semantic region it belongs to. These graphs are a fundamental element of a semantic map, enabling abstraction over metric maps [3].
In order to construct a topological map, the similarity between the acquired camera measurements needs to be computed. Zhang et al. [7] categorized the mechanisms for quantifying such similarities, according to the degree of abstraction applied to the respective images, as follows. Image-based approaches rely on pixel differences between consecutive images to determine changes in a scene. Techniques based on local features detect and try to associate various key-points, so as to measure the similarity between different frames [8], [9]. Furthermore, in histogram-based approaches, images are compared by means of feature statistics [3]. Lastly, a typical approach for simplifying the information regarding a place is to adopt a Bag-of-Words (BoW) representation. BoW describes the input measurements as a quantized set of local features, thus reducing the search space to gain efficiency [10]. However, a scene contains randomly scattered features, so a meaningful way to describe it is via histograms of visual words (visual word vectors) [11].
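To make the visual-word-vector idea concrete, the following minimal, self-contained sketch quantizes local descriptors against a fixed vocabulary and compares two images via the cosine similarity of their normalized histograms. The toy two-dimensional vocabulary, the descriptor values, and the function names are our own illustrative assumptions, not the authors' implementation.

```python
import math

def visual_word_vector(descriptors, vocabulary):
    """Quantize local feature descriptors against a visual vocabulary
    and return an L1-normalized histogram of visual words (BoW vector)."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        # assign the descriptor to its nearest visual word (Euclidean distance)
        nearest = min(range(len(vocabulary)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])))
        hist[nearest] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def cosine_similarity(u, v):
    """Similarity between two visual word vectors (in [0, 1] for histograms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy example: a 3-word vocabulary in a 2-D descriptor space.
vocab = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1)]   # mostly word 1
image_b = [(0.0, 0.1), (0.8, 0.0), (1.0, 0.2)]    # similar distribution
vwv_a = visual_word_vector(image_a, vocab)
vwv_b = visual_word_vector(image_b, vocab)
```

In practice the descriptors would be high-dimensional local features (e.g. SURF) and the vocabulary would be learned offline by clustering a large descriptor sample, but the quantize-then-histogram structure is the same.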
This paper addresses the problem of semantically mapping an environment. With the term semantic, we refer to the signs and the things to which they refer [1]. Thus, within the scope of this work, the term semantic mapping is used to indicate the identification and the recording of visual signs and symbols that contain meaningful information. Our goal is to produce an unsupervised system to segment the robot's trajectory into different semantic regions. Then, each time the robot passes through the same area, it gets self-localized within one of the computed semantic divisions by exclusively using visual information. This essentially means that a specific label for each trajectory segment is not necessarily required to achieve semantic localization. On the contrary, our method is able to identify distinct regions that preserve semantic consistency; however, a meaningful name can be assigned to each one of them with minimum effort after the map has been completed. We make use of the BoW image representation as a means of uniformly describing the captured images and shaping semantic clusters based on their similarities. The semantic interpretation of the robot's path is achieved by applying the Louvain Community Detection Algorithm (LCDA) [12]. Semantically important properties of a particular environment are identified with great tolerance to varying lighting conditions. Extending our previous work [13], we improved the construction of the places' semantic representation by introducing an additional hierarchical agglomerative clustering method to incorporate the information of a metric map. Using only LCDA, many snapshots of the same semantic region are redundantly grouped into different clusters due to dissimilarities that occur when observing the same scene from arbitrary orientations.
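The clustering step can be illustrated with a single level of the Louvain local-moving heuristic, which greedily reassigns nodes to the neighboring community yielding the largest modularity gain [12]. The sketch below is a simplified, pure-Python illustration of that idea, not the paper's actual pipeline; the graph layout and names are our own assumptions.

```python
def louvain_one_level(adj):
    """One level of Louvain local moving: greedily move each node to the
    neighboring community with the largest modularity gain, until stable.
    adj: dict node -> dict of {neighbor: edge weight} (undirected graph)."""
    m = sum(w for u in adj for w in adj[u].values()) / 2.0   # total edge weight
    degree = {u: sum(adj[u].values()) for u in adj}
    community = {u: u for u in adj}          # start from singleton communities
    tot = dict(degree)                       # sum of degrees per community
    improved = True
    while improved:
        improved = False
        for u in adj:
            old = community[u]
            tot[old] -= degree[u]            # temporarily remove u
            # edge weight from u to each neighboring community
            links = {}
            for v, w in adj[u].items():
                links[community[v]] = links.get(community[v], 0.0) + w
            best, best_gain = old, 0.0
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / (2.0 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            community[u] = best
            tot[best] += degree[u]
            if best != old:
                improved = True
    return community

# Two tightly connected triangles joined by a single edge: two communities.
graph = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
         3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
labels = louvain_one_level(graph)
```

In the setting of this work, the nodes would be captured images, the edge weights the similarities between their visual word vectors, and the resulting communities the candidate semantic regions.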
Such over-segmentation is a common problem for any appearance-based method that utilizes monocular and unidirectional cameras, since the view of the environment changes significantly when different content is observed [8], [14], [15]. By means of geometrical information, such over-segmented areas can be consolidated, thus improving the quality of the map. At its final stage, the map contains both semantic and metric information (topological map).
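The metric consolidation step can be sketched as a simple agglomerative merge: clusters whose mean odometry positions lie within a distance threshold are repeatedly joined, closest pair first. This is only an illustration under our own assumptions about the data layout and the merge criterion; the paper's actual hierarchical agglomerative method may differ.

```python
import math

def merge_oversegmented(clusters, threshold):
    """Greedy agglomerative merging of semantic clusters using metric information.
    clusters: list of clusters, each a list of (x, y) odometry positions.
    Repeatedly merge the pair of clusters with the closest centroids,
    as long as their distance is below the threshold."""
    def centroid(pts):
        return (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))

    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        best_pair, best_dist = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(cents[i], cents[j])
                if d < best_dist:
                    best_pair, best_dist = (i, j), d
        if best_pair is None:        # no pair closer than the threshold: done
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))
    return clusters

# Two over-segmented views of the same corridor plus one distant room.
segments = [[(0.0, 0.0), (0.5, 0.0)], [(0.8, 0.1), (1.0, 0.0)], [(10.0, 10.0)]]
merged = merge_oversegmented(segments, threshold=2.0)
```

Here the two nearby corridor segments are consolidated into one cluster, while the distant room remains a separate semantic region.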
The rest of the paper is structured as follows. In Section 2, we discuss representative related work in the field of semantic mapping construction. Section 3 contains a detailed explanation of our approach. In Section 4, we present the results of our experiments, while in the last section, we draw conclusions and present suggestions for future work.
Section snippets
Related literature
Generally speaking, one may argue that a semantic map is a kind of topological map with purely semantic data about the places that a robot encounters together with objects they contain; those maps have been used for robot detection, mapping, navigation and classification [16], [17], [18], [19]. Such semantic informational systems can be obtained using either supervised or unsupervised methods. In what follows, we list some of the most representative techniques for both approaches.
Approach
In this section, our approach for semantic clustering and localization is detailed. We begin by describing the selected mechanism for representing the input images in a rotation/scale-invariant manner, as well as the algorithm for clustering the feature vectors according to their similarity. The formulated semantic groups are then refined using temporal and metric information with the aim to produce the topological map. Finally, we propose two different approaches for recognizing the exact
Experimental results
In this section, the proposed method is evaluated both in terms of semantic clustering and semantic localization. We applied our method on the following datasets: COLD-Freiburg [36], COLD-Saarbrücken [36], KTH-IDOL2 [43], and New College [44], which are comprised of explicit semantic regions. Two of the above, viz. COLD-Freiburg part A sequence1 cloudy1 and KTH-IDOL2 cloudy1, were used as validation sets to assess the effect of the system’s parameters on the performance. A different set of 5
Conclusions
In this paper, a complete architecture for semantic segmentation and localization has been described. The available visual and odometry data were combined in a clustering pipeline to produce the topological map of a previously unexplored environment. The results were encouraging, indicating that by properly formulating the semantic map of an area, correct categorization is achieved even under different lighting conditions.
Nowadays, the scientific interest has been fixed on
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Program «Human Resources Development, Education and Lifelong Learning» in the context of the project “Strengthening Human Resources Research Potential via Doctorate Research” (MIS-5000432), implemented by the State Scholarships Foundation (IKY).
References (53)
- et al., Semantic mapping for mobile robotics tasks: A survey, Robot. Auton. Syst. (2015)
- et al., Robot navigation via spatial and temporal coherent semantic maps, Eng. Appl. Artif. Intell. (2016)
- et al., Towards 3D point cloud based object maps for household environments, Robot. Auton. Syst. (2008)
- et al., Semantic maps from multiple visual cues, Expert Syst. Appl. (2017)
- et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst. (2008)
- et al., Deep learning features exception for cross-season visual place recognition, Pattern Recognit. Lett. (2017)
- H. Karaoguz, H. Bozma, Reliable topological place detection in bubble space, in: Proceedings of the IEEE International...
- et al., Visual place recognition: A survey, IEEE Trans. Robot. (2015)
- Ö. Erkent, I. Bozma, Place representation in topological maps based on bubble space, in: Proceedings of the IEEE...
- A. Pronobis, P. Jensfelt, Large-scale semantic mapping and reasoning with heterogeneous modalities, in: Proceedings of...
- Video Google: A text retrieval approach to object matching in videos
- Histogram of oriented uniform patterns for robust place recognition and categorization, Int. J. Robot. Res.
- Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp.
- Graph-based semantic segmentation
- Trajectory-based place-recognition for efficient large scale localization, Int. J. Comput. Vis.
- Fast loop-closure detection using visual-word-vectors from image sequences, Int. J. Robot. Res.
- Semantic Modeling of Places Using Objects
Vasiliki Balaska received her diploma from the Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece in 2017. Currently, she is a Ph.D. student in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering in Democritus University of Thrace (DUTH), Greece, working on the development of semantic mapping methods and their role in robotics. Her research is co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Program Human Resources Development, Education and Lifelong Learning in the context of the project Strengthening Human Resources Research Potential via Doctorate Research (MIS-5000432), implemented by the State Scholarships Foundation (IKY), for three years. More details about her are available at http://robotics.pme.duth.gr/robotics/active/.
Loukas Bampis received his diploma in Electrical and Computer Engineering in 2013 and Ph.D. in Machine Vision and Embedded Systems in 2018 from the Democritus University of Thrace (DUTH), Greece. He is currently a Postdoctoral Fellow in the Laboratory of Robotics and Automation (LRA), Department of Production and Management Engineering, DUTH. His work has been supported through several research projects funded by the European Space Agency, the European Commission and the Greek government. His research interests include real-time Localization and Place Recognition techniques using hardware accelerators and parallel processing. More details about him are available at http://robotics.pme.duth.gr/bampis.
Moses Boudourides is in the Faculty of Northwestern University School of Professional Studies Data Science Program https://sps.northwestern.edu/masters/datascience/faculty.php and Affiliated Faculty at the Science of Networks in Communities (SONIC) at Northwestern University (http://sonic.northwestern.edu/people/affiliated-faculty/moses-boudourides/). Currently, 2019–2020, he is a Visiting Professor of Mathematics at the New York University Abu Dhabi. Previously, until 2017, he was Professor of Mathematics at the University of Patras in Greece. His research interests and publications are on dynamical systems, social network analysis, social media data analysis, digital humanities and computational social science. Boudourides was recently awarded a Robert K. Merton Visiting Research Fellowship from the Institute for Analytical Sociology (IAS) at Linköping University in Sweden.
Antonios Gasteratos is a Professor and Head of Department of Production and Management Engineering, Democritus University of Thrace (DUTH), Greece. He is also the director of the Laboratory of Robotics and Automation (LRA), DUTH and teaches the courses of Robotics, Automatic Control Systems, Electronics, Mechatronics and Computer Vision. He holds a B.Eng. and a Ph.D. from the Department of Electrical and Computer Engineering, DUTH, Greece. During 1999–2000 he was a Visiting Researcher at the Laboratory of Integrated Advanced Robotics (LIRALab), DIST, University of Genoa, Italy. He has served as a reviewer for numerous Scientific Journals and International Conferences. He is a Subject Editor at Electronics Letters and an Associate Editor at the International Journal of Optomechatronics and he has organized/co-organized several international conferences. His research interests are mainly in mechatronics and in robot vision. He has published more than 220 papers in books, journals and conferences. He is a senior member of the IEEE. More details about him are available at http://robotics.pme.duth.gr/antonis.