Elsevier

Computers & Graphics

Volume 84, November 2019, Pages 199-211
Special Section on SVR 2019
Geometrical and statistical incremental semantic modeling on mobile devices

https://doi.org/10.1016/j.cag.2019.09.003

Highlights

  • A technique that performs semantic modeling and tracking of primitives on sparse point clouds on desktop and mobile devices.

  • The method allows the modeling and tracking of planes, spheres, and cylinders.

  • A dataset with sparse point clouds of primitives.

  • A guideline to evaluate computer vision techniques on mobile devices.

Abstract

Improvements in mobile hardware have made it possible to run tracking applications on mobile devices. However, several challenges remain in the field of mobile tracking, such as the extraction of high-level semantic information from point clouds. This task is harder when using monocular visual SLAM systems, which output noisy, sparse data. In this paper, we propose a primitive modeling method for sparse point clouds that combines geometric and statistical analyses and can be executed on mobile devices. The main idea is to use the incremental mapping process of SLAM systems to analyze the geometric relationship between the point cloud and the estimated shapes over time, selecting only reliably modeled shapes. In addition, a statistical evaluation that assesses whether the modeling is random is incorporated to filter wrongly detected primitives in unstable estimations. Our evaluation indicates that the proposed method improved both the precision and the consistency of correct detections when compared with existing techniques. The mobile version runs 8.5 to 9.9 times slower than the desktop implementation. However, it uses at most 30.5% of the CPU, which allows it to run on a separate thread, in parallel with the visual SLAM technique. Additional evaluations show that CPU load, energy consumption, and RAM usage were not a concern when running our method on mobile devices.

Introduction

Several applications in computer vision and augmented reality require camera pose tracking. However, determining the device position relative to the real environment can demand considerable computational power and memory, depending on the approach and the required information. Along with the improvements in processing power and memory of mobile devices themselves, several tracking techniques capable of running on such devices have been released in the last couple of years, both in academia and in industry. The most prominent ones are ARCore and ARKit, by Google and Apple, respectively.

Although several techniques have been released, many research issues remain open in the field of tracking. One is to extract and track high-level semantic information from the environment. Different types of semantics can be collected, from geometric primitives of objects to the model of a piece of furniture. This process is referred to as semantic modeling. Shape parameters of geometric primitives and the relationships among them are valuable knowledge to estimate, especially in man-made environments such as houses and factories. This data can also be gathered in different ways: from the input image, from the scene map, or from a combination of both.

There are several benefits of detecting primitives from the scene map. For instance, objects are usually over-represented when defined using a point cloud, because far fewer points are needed to describe them. Therefore, an implicit representation can replace redundant points, which is particularly helpful when targeting devices with memory restrictions, such as robots or unmanned aerial vehicles (UAVs). Additionally, a tracking system can use these primitives to denoise the reconstructed map or constrain its optimization [1], which can reduce tracking errors. Furthermore, the primitives can be used to provide haptic feedback in augmented reality applications [2].
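
To make the storage argument concrete, the following minimal Python sketch compares how many numbers are needed to store a set of points with how many are needed to store a sphere that explains them. The sample points, the sphere parameters, the tolerance, and the function name are illustrative assumptions, not values or code from the paper.

```python
import math


def distance_to_sphere(point, center, radius):
    """Unsigned distance from a 3D point to the surface of a sphere."""
    return abs(math.dist(point, center) - radius)


# Hypothetical sparse samples on a sphere of radius 2 centred at (1, 0, 0).
center, radius = (1.0, 0.0, 0.0), 2.0
points = [
    (3.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (1.0, 2.0, 0.0), (1.0, -2.0, 0.0),
    (1.0, 0.0, 2.0), (1.0, 0.0, -2.0),
]

tolerance = 1e-6  # assumed fitting tolerance
explained = [p for p in points
             if distance_to_sphere(p, center, radius) <= tolerance]

values_as_points = 3 * len(explained)  # x, y, z for every explained point
values_as_sphere = 4                   # centre (3 values) plus radius
print(values_as_points, values_as_sphere)
```

The cost of the point-based representation grows with the number of points, while the implicit one stays at four values for a sphere regardless of how densely the surface was sampled.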

Mobile devices are closing the gap with desktop computers in terms of processing power and memory space [3]. This improvement has allowed the development of more complex algorithms for mobile devices, including tracking techniques, which are one of the foundations of augmented reality. It can also be linked to the improvement and popularization of augmented reality solutions for such devices.

Some of these recent advancements in tracking techniques involve removing the need for external markers, reducing execution time to achieve real-time performance, and providing more stable tracking that is not harmed by jitter or drift. For instance, improvements in device sensors allowed the development of visual-inertial trackers. However, there has been little progress in extracting different kinds of semantics from the 3D map of the scene.

Considering all the benefits mentioned above, the main goal of this study is to model and track primitives in order to obtain semantics from sparse point clouds. To do that, we present Geometric and Statistical Incremental Semantic Tracking, or simply GS-IST, a method for incrementally modeling and tracking planes, spheres, and cylinders on sparse point clouds. In summary, this method makes effective use of the generating process of point clouds in SLAM and relies on geometric and statistical analyses to filter unreliable shapes. Additionally, the method was ported to and evaluated on mobile devices, showing that it is feasible to extract and track basic primitives on such platforms. This paper is an extended version of the work published in Roberto et al. 2018 [4], and its main contributions are summarized as follows:

  • A technique that uses geometric and statistical evaluation to incrementally perform semantic modeling and tracking of primitives on sparse point clouds (Section 3);

  • Evaluations of the proposed method in comparison with existing techniques, showing that it improves semantic modeling precision. This evaluation also includes a dataset with sparse point clouds of primitives and a metric precision evaluation that was not available in the preliminary work (Section 4);

  • The port and evaluation of the proposed approach on mobile devices and a guideline to evaluate computer vision techniques on such platforms, which is also an enhancement over our previous paper (Section 5).

Section snippets

Related works

Automatic reconstruction of 3D object shapes is useful for several applications, such as blueprint generation for architecture. Although useful for 3D measurement and visualization, there are some aspects to be improved. For instance, the scene is usually represented by a point cloud or a mesh computed from it. The latter simply consists of connected neighboring points with little information about the semantic structure.

Several methods have been proposed in the literature to determine

Geometric and statistical incremental semantic tracking

Previous evaluations indicate that Efficient RANSAC achieves good results when detecting primitives in a sparse point cloud [31]. However, it still requires improvements in consistency and precision for use in various applications. Therefore, it is possible to use the primitive estimation from Efficient RANSAC and the generating process of point clouds from visual SLAM systems to perform incremental semantic modeling. This approach can improve both the precision and stability of the primitive
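
As a rough illustration of the kind of primitive estimation discussed in this section, the sketch below fits a single plane to a small point set with plain RANSAC. It is neither Efficient RANSAC, which handles several primitive types and adds further optimizations, nor GS-IST. The function names, the distance threshold, the iteration count, and the toy points are all assumptions made for this example.

```python
import random


def plane_from_points(a, b, c):
    """Plane through three points as (unit normal n, offset d) with n.p + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(x * x for x in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [x / norm for x in n]
    d = -sum(n[i] * a[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return the plane with the most inliers among randomly sampled candidates."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, try again
        n, d = model
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data: 25 points on the plane z = 0 plus 4 off-plane outliers.
plane_points = [(x * 0.5, y * 0.5, 0.0) for x in range(5) for y in range(5)]
outliers = [(0.3, 0.7, 1.0), (1.1, 0.2, -0.8), (1.9, 1.4, 0.6), (0.6, 1.8, 1.5)]
model, inliers = ransac_plane(plane_points + outliers)
```

A single run like this gives one estimate from one snapshot of the point cloud; the incremental approach described in this section instead relies on how estimates behave as the SLAM map grows.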

GS-IST evaluation

GS-IST was implemented in C++ using OpenCV and Efficient RANSAC as libraries. This evaluation compared GS-IST with Efficient RANSAC regarding precision, recall, and F0.5-score. To perform a fair comparison, we disabled in Efficient RANSAC the primitives that our method does not target (toruses and cones). Since there was no dataset with the generating process of point clouds,
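
For reference, the F0.5-score mentioned in this section is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. The sketch below uses the textbook definition; the function name and the precision and recall values are made up for illustration and are not results from the paper.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detector with high precision and moderate recall.
precision, recall = 0.9, 0.6
score_f05 = f_beta(precision, recall, beta=0.5)
score_f1 = f_beta(precision, recall, beta=1.0)
print(score_f05 > score_f1)  # the precision-weighted score is the higher one here
```

With these example values the F0.5-score exceeds the F1-score, because the measure rewards the higher precision more than it penalizes the lower recall.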

Mobile implementation

To evaluate how GS-IST would perform on mobile devices, the technique was ported to the Android platform. It was also necessary to compile the libraries used in the project for Android: OpenCV and Boost. Efficient RANSAC was treated as a library as well, but its compilation process was slightly different because it was added to the project itself to facilitate modifications to its source code.

Conclusion

This work presents a new technique, called GS-IST, that detects and tracks geometric primitives. This method uses the generating process of sparse point clouds of visual SLAM systems and applies geometrical and statistical analyses to incrementally estimate and track planes, spheres, and cylinders. The evaluation indicated that GS-IST improved precision in all test cases, outperforming existing methods on this criterion. The developed approach focuses on precision and for that, it compromises

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to thank Clemens Arth and Dieter Schmalstieg for the valuable discussions. This work was partially funded by JSPS KAKENHI (grant number JP17H01768).

References (38)

  • G. Roth et al.

    Extracting geometric primitives

    CVGIP: Image Underst

    (1993)
  • R. Roberto et al.

    Tracking for mobile devices: a systematic mapping study

    Comput Gr

    (2016)
  • D. Ramadasan et al.

    DCSLAM: a dynamically constrained real-time SLAM

    Proceedings of the 2015 IEEE international conference on image processing (ICIP)

    (2015)
  • A. Hettiarachchi et al.

    Annexing reality: Enabling opportunistic use of everyday objects as tangible proxies in augmented reality

    Proceedings of the 2016 CHI conference on human factors in computing systems. CHI ’16

    (2016)
  • M. Halpern et al.

    Mobile CPU’s rise to power: quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction

    Proceedings of the 2016 IEEE international symposium on high performance computer architecture (HPCA)

    (2016)
  • R. Roberto et al.

    Incremental structural modeling based on geometric and statistical analyses

    Proceedings of the 2018 IEEE Winter conference on applications of computer vision (WACV)

    (2018)
  • D. Holz et al.

    Real-time plane segmentation using RGB-D cameras

    RoboCup 2011: Robot Soccer World Cup XV

    (2012)
  • B. Drost et al.

    Local Hough transform for 3D primitive detection

    Proceedings of the 2015 international conference on 3D vision

    (2015)
  • D. Lopez-Escogido et al.

    Automatic extraction of geometric models from 3D point cloud datasets

    Proceedings of the 2014 11th international conference on electrical engineering, computing science and automatic control (CCE)

    (2014)
  • T. Nguyen et al.

    Structural modeling from depth images

    IEEE Trans Visual Comput Gr

    (2015)
  • B. Oehler et al.

    Efficient multi-resolution plane segmentation of 3D point clouds

    Proceedings of the 4th International Conference, ICIRA 2011 intelligent robotics and applications, Aachen, Germany, December 6–8, 2011, Proceedings, Part II

    (2011)
  • Y.M. Kim et al.

    Interactive acquisition of residential floor plans

    Proceedings of the 2012 IEEE international conference on robotics and automation

    (2012)
  • Y.J. Liu et al.

    Cylinder detection in large-scale point cloud of pipeline plant

    IEEE Trans Visual Comput Gr

    (2013)
  • R. Qiu et al.

    Pipe-run extraction and reconstruction from point clouds

    Proceedings of the computer vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part III

    (2014)
  • Y. Li et al.

    GlobFit: consistently fitting primitives by discovering global relations

    ACM Trans Graph

    (2011)
  • G. Pang et al.

    Automatic 3D industrial point cloud modeling and recognition

    Proceedings of the 2015 14th IAPR international conference on machine vision applications (MVA)

    (2015)
  • G. Pang et al.

    Training-based object recognition in cluttered 3D point clouds

    Proceedings of the 2013 international conference on 3D vision - 3DV 2013

    (2013)
  • J. Huang et al.

    Detecting objects in scene point cloud: a combinational approach

    Proceedings of the 2013 international conference on 3D vision - 3DV 2013

    (2013)
  • A. Stanescu et al.

    Semantic segmentation of geometric primitives in dense 3D point clouds

    Proceedings of the 2018 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct)

    (2018)