ABSTRACT
We present an augmented reality system for large-scale 3D reconstruction and recognition in outdoor scenes. Unlike prior work, which reconstructs scenes using active depth cameras, we use a purely passive stereo setup, which allows outdoor use and an extended sensing range. Our system not only produces a 3D map of the environment in real time; it also lets the user draw (or 'paint') with a laser pointer directly onto the reconstruction to segment the model into objects. Given these examples, the system then learns to segment other parts of the 3D map during online acquisition. Unlike typical object recognition systems, which learn from predefined databases, ours places the user 'in the loop' to segment the particular objects of interest. The laser pointer also helps to 'clean up' the stereo reconstruction and the final 3D map interactively. Within minutes, a user of our system can capture a full 3D map, segment it into objects of interest, and refine parts of the model during capture. We provide full technical details of our system to aid replication, as well as a quantitative evaluation of its components. We demonstrate the potential of our system to help visually impaired people navigate through spaces. Beyond this use, the system can support large-scale augmented reality games, can be shared online to augment street-view data, and can enable more detailed navigation for cars and people.
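The choice of passive stereo rests on standard rectified-stereo triangulation. In this model, depth follows from pixel disparity as Z = f·B/d, with no projected light that sunlight could wash out. The sketch below is a minimal illustration under stated assumptions: a calibrated, rectified camera pair, with a hypothetical focal length and baseline rather than the authors' rig. It also includes the usual first-order error term, which grows with the square of depth. That term is why the usable range depends on the baseline and on matching accuracy.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_uncertainty(depth_m: float, focal_px: float, baseline_m: float,
                      disparity_err_px: float = 1.0) -> float:
    """First-order depth error: dZ ~= Z^2 / (f * B) * dd (grows quadratically with Z)."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    # Hypothetical calibration values, for illustration only.
    f_px, baseline = 700.0, 0.12
    for d in (84.0, 8.4):
        z = disparity_to_depth(d, f_px, baseline)
        err = depth_uncertainty(z, f_px, baseline)
        print(f"disparity={d:5.1f}px  depth={z:5.2f}m  error~{err:.2f}m")
```

With these example values, a one-pixel matching error costs about a centimetre at 1 m but more than a metre at 10 m. This is the trade-off a wider baseline or sub-pixel matching is meant to address.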
The Semantic Paintbrush: Interactive 3D Mapping and Recognition in Large Outdoor Spaces