Special Section on SVR 2019
Geometrical and statistical incremental semantic modeling on mobile devices
Introduction
Several applications in computer vision and augmented reality require camera pose tracking. However, determining the device position relative to the real environment can demand substantial computational power and memory, depending on the approach and the required information. Along with the improvements in processing power and memory on mobile devices themselves, several tracking techniques capable of running on such devices have been released in the last couple of years, in both academia and industry. The most prominent are ARCore and ARKit, by Google and Apple, respectively.
Although several techniques have been released, many research issues remain open in the field of tracking. One is extracting and tracking high-level semantic information from the environment. Different types of semantics can be collected, from geometric primitives of objects to the model of a piece of furniture. This process is referred to as semantic modeling. The shape parameters of geometric primitives and the relationships among them are valuable knowledge to estimate, especially in man-made environments such as houses and factories. This data can also be gathered in different ways: from the input image, the scene map or a combination of both.
There are several benefits of detecting primitives from the scene map. For instance, objects are usually over-represented when defined using a point cloud, because far fewer points are needed to describe them. Therefore, an implicit representation can replace redundant points, which is particularly helpful when targeting devices with memory restrictions, such as robots or unmanned aerial vehicles (UAVs). Additionally, a tracking system can use these primitives to denoise the reconstructed map or constrain its optimization [1], which can reduce tracking errors. Furthermore, the primitives can be used to provide haptic feedback in augmented reality applications [2].
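To make the idea of an implicit representation concrete, the following is a minimal sketch, not the method of this paper. It assumes a plane is stored as a unit normal and an offset (four numbers) instead of the many map points lying on it, and it shows the basic geometric test of whether a point belongs to that plane. The function names and the tolerance value are hypothetical choices for illustration.

```python
import math

def plane_from_points(p1, p2, p3):
    """Build an implicit plane (n, d) with unit normal n and n.x + d = 0.

    Returns None when the three points are (nearly) collinear,
    because they do not define a unique plane.
    """
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    # Cross product u x v gives a vector orthogonal to the plane.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(nc * pc for nc, pc in zip(n, p1))
    return n, d

def point_plane_distance(point, plane):
    """Unsigned distance from a 3D point to the implicit plane."""
    n, d = plane
    return abs(sum(nc * pc for nc, pc in zip(n, point)) + d)

def count_inliers(points, plane, tol=0.01):
    """Count points within tol of the plane (tol is an assumed value)."""
    return sum(1 for p in points if point_plane_distance(p, plane) <= tol)
```

Under this representation, every point that passes the distance test is redundant with respect to the plane parameters, which is where the memory saving comes from. Three points are also the minimal sample from which a RANSAC-style detector hypothesizes a plane.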
Mobile devices are closing the gap to desktop computers in terms of processing power and memory space [3]. This improvement has allowed the development of more complex algorithms for mobile devices, including tracking techniques, which are one of the foundations of augmented reality. This progress can be linked to the improvement and popularization of augmented reality solutions for such devices.
Some of these recent advancements in tracking techniques involve removing the need for any external marker, reducing the execution time to achieve real-time performance and achieving more stable tracking that is not harmed by jitter or drift. For instance, the improvement of the devices' sensors allowed the development of visual-inertial trackers. However, there has not been much progress regarding the extraction of different kinds of semantics from the 3D map of the scene.
Considering all the benefits mentioned above, the main goal of this study is to model and track primitives in order to obtain semantics from sparse point clouds. To that end, we present Geometric and Statistical Incremental Semantic Tracking, or simply GS-IST, which is a method for incrementally modeling and tracking planes, spheres, and cylinders on sparse point clouds. In summary, this method makes effective use of the point cloud generation process in SLAM and relies on geometric and statistical analyses to filter unreliable shapes. Additionally, this method was ported to and evaluated on mobile devices, showing that it is feasible to extract and track basic primitives on such platforms. This paper is an extended version of the work published in Roberto et al. 2018 [4] and its main contributions are summarized as follows:
- A technique that uses geometric and statistical evaluation to incrementally perform semantic modeling and tracking of primitives on sparse point clouds (Section 3);
- Evaluations of the proposed method in comparison with existing techniques, showing that it improves semantic modeling precision. This evaluation also includes a dataset with sparse point clouds of primitives and a metric precision evaluation that was not available in the preliminary work (Section 4);
- The port and evaluation of the proposed approach on mobile devices and a guideline to evaluate computer vision techniques on such platforms, which is also an enhancement over our previous paper (Section 5).
Section snippets
Related works
Automatic reconstruction of 3D object shapes is useful for several applications, such as blueprint generation for architecture. Although such reconstructions are useful for 3D measurement and visualization, some aspects can still be improved. For instance, the scene is usually represented by a point cloud or a mesh computed from it. The latter simply consists of connected neighbor points with little information about the semantic structure.
Several methods have been proposed in the literature to determine
Geometric and statistical incremental semantic tracking
Previous evaluations indicate that Efficient RANSAC achieves good results when detecting primitives in a sparse point cloud [31]. However, it still requires improvements in consistency and precision for use in various applications. Therefore, it is possible to use the primitive estimation from Efficient RANSAC and the point cloud generation process from visual SLAM systems to perform incremental semantic modeling. This approach can improve both the precision and stability of the primitive
GS-IST evaluation
GS-IST was implemented in C++ using OpenCV and Efficient RANSAC as libraries. This evaluation compared GS-IST with Efficient RANSAC regarding precision, recall and F0.5-score. To perform a fair comparison, we disabled in Efficient RANSAC the primitives that our method does not target (toruses and cones). Since there was no dataset with the generating process of point clouds,
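As a reminder of how the three reported metrics relate, the sketch below computes precision, recall and the general F-beta score from detection counts, using the standard definitions. With beta = 0.5 the score weights precision more heavily than recall. This is an illustration only, not the evaluation code of this study. The function names and the counts in the usage example are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall

def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom

# Hypothetical counts: 8 correct primitives, 2 spurious, 8 missed.
p, r = precision_recall(8, 2, 8)   # p = 0.8, r = 0.5
score_f05 = f_beta(p, r, beta=0.5)
score_f1 = f_beta(p, r, beta=1.0)
```

For these hypothetical counts the F0.5-score is higher than the F1-score, because the detector is more precise than it is complete and F0.5 rewards that trade-off.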
Mobile implementation
To evaluate how GS-IST would perform on mobile devices, the technique was ported to the Android platform. It was also necessary to compile the libraries used in the project for Android: OpenCV and Boost. Efficient RANSAC was treated as a library as well, but its compilation process differed slightly because it was added to the project itself to make its source code easier to modify.
Conclusion
This work presents GS-IST, a new technique that detects and tracks geometric primitives. This method uses the generating process of sparse point clouds of visual SLAM systems and applies geometrical and statistical analyses to incrementally estimate and track planes, spheres and cylinders. The evaluation indicated that GS-IST improved precision in all test cases, outperforming existing methods in this criterion. The developed approach focuses on precision and for that, it compromises
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The authors would like to thank Clemens Arth and Dieter Schmalstieg for the valuable discussions. This work was partially funded by JSPS KAKENHI (grant number JP17H01768).
References (38)
- et al. Extracting geometric primitives. CVGIP: Image Underst (1993)
- et al. Tracking for mobile devices: a systematic mapping study. Comput Gr (2016)
- et al. Dcslam: A dynamically constrained real-time slam. Proceedings of the 2015 IEEE international conference on image processing (ICIP) (2015)
- et al. Annexing reality: Enabling opportunistic use of everyday objects as tangible proxies in augmented reality. Proceedings of the 2016 CHI conference on human factors in computing systems. CHI ’16 (2016)
- et al. Mobile CPU’s rise to power: quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction. Proceedings of the 2016 IEEE international symposium on high performance computer architecture (HPCA) (2016)
- et al. Incremental structural modeling based on geometric and statistical analyses. Proceedings of the 2018 IEEE Winter conference on applications of computer vision (WACV) (2018)
- et al. Real-time plane segmentation using RGB-d cameras. RoboCup 2011: robot Soccer World Cup XV (2012)
- et al. Local hough transform for 3d primitive detection. Proceedings of the 2015 international conference on 3D vision (2015)
- et al. Automatic extraction of geometric models from 3d point cloud datasets. Proceedings of the 2014 11th international conference on electrical engineering, computing science and automatic control (CCE) (2014)
- et al. Structural modeling from depth images. IEEE Trans Visual Comput Gr (2015)
- Efficient multi-resolution plane segmentation of 3d point clouds. Proceedings of the 4th International Conference, ICIRA 2011 intelligent robotics and applications, Aachen, Germany, December 6–8, 2011, Proceedings, Part II
- Interactive acquisition of residential floor plans. Proceedings of the 2012 IEEE international conference on robotics and automation
- Cylinder detection in large-scale point cloud of pipeline plant. IEEE Trans Visual Comput Gr
- Pipe-run extraction and reconstruction from point clouds. Proceedings of the computer vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part III
- Globfit: consistently fitting primitives by discovering global relations. ACM Trans Graph
- Automatic 3d industrial point cloud modeling and recognition. Proceedings of the 2015 14th IAPR international conference on machine vision applications (MVA)
- Training-based object recognition in cluttered 3d point clouds. Proceedings of the 2013 international conference on 3D vision - 3DV 2013
- Detecting objects in scene point cloud: a combinational approach. Proceedings of the 2013 international conference on 3D vision - 3DV 2013
- Semantic segmentation of geometric primitives in dense 3d point clouds. Proceedings of the 2018 IEEE international symposium on mixed and augmented reality adjunct (ISMAR-Adjunct)