ABSTRACT
In this paper, we show how the quality of augmentation in mobile Mixed Reality (MR) applications can be improved using a cloud-based image segmentation approach with synthetic training data. Many modern Augmented Reality frameworks are based on visual inertial odometry on mobile devices and therefore have limited access to tracking hardware (e.g., depth sensors). Consequently, tracking still suffers from drift, which makes it difficult to use in scenarios that require higher precision. To improve tracking quality, we propose a cloud tracking approach that uses machine-learning-based image segmentation to recognize known objects in a real scene, which allows us to estimate a precise camera pose. Augmented Reality applications that use our web service can apply the resulting camera pose to correct drift from time to time, while still relying on local tracking between key frames. Moreover, such frameworks usually take the device's real-world position at application start as the reference coordinate system. Because our approach instead provides a well-defined, context-based coordinate system that does not depend on the user's starting position, it significantly simplifies the authoring of MR applications. We present all steps, from web-based initialization through the generation of synthetic training data to usage in production. In addition, we describe the underlying algorithms in detail. Finally, we show a mobile Mixed Reality application that is based on this novel approach and discuss its advantages.
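The interplay of local tracking and occasional cloud-based correction can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that both the local tracker and the cloud service report camera-to-world poses as 4x4 rigid transforms, and all names (`DriftCorrector`, `update_from_key_frame`, `apply`) are hypothetical. At a key frame, the transform that maps the local, start-based frame into the object-anchored frame is recomputed, and it is then applied to every local pose until the next key frame.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def identity() -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def translation(x: float, y: float, z: float) -> Matrix:
    """Pure translation, used here only to build example poses."""
    m = identity()
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    pos = [t[i][3] for i in range(3)]
    new_pos = [-sum(rot_t[i][k] * pos[k] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [new_pos[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


class DriftCorrector:
    """Hypothetical helper: maps local (start-based) poses into the
    object-anchored coordinate system estimated by the cloud service."""

    def __init__(self) -> None:
        self.correction: Matrix = identity()

    def update_from_key_frame(self, local_pose: Matrix, cloud_pose: Matrix) -> None:
        # Assumption: both poses describe the same camera at the same instant.
        # correction * local_pose == cloud_pose holds for the key frame.
        self.correction = matmul(cloud_pose, rigid_inverse(local_pose))

    def apply(self, local_pose: Matrix) -> Matrix:
        # Between key frames, local tracking is reused with the last correction.
        return matmul(self.correction, local_pose)


if __name__ == "__main__":
    corrector = DriftCorrector()
    # Key frame: local tracker and cloud service disagree about the camera pose.
    corrector.update_from_key_frame(
        local_pose=translation(1.0, 0.0, 0.0),
        cloud_pose=translation(1.25, 0.0, 0.5),
    )
    # A later frame tracked locally, expressed in the corrected frame.
    corrected = corrector.apply(translation(2.0, 0.0, 0.0))
    print([row[3] for row in corrected[:3]])
```

A single rigid correction per key frame keeps the client-side cost at one matrix multiplication per frame. A production system would likely also blend successive corrections to avoid visible jumps, which this sketch omits.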
Index Terms
- Improving mobile MR applications using a cloud-based image segmentation approach with synthetic training data