research-article

The Semantic Paintbrush: Interactive 3D Mapping and Recognition in Large Outdoor Spaces

Authors: Morten Lidegaard, Matthias Nießner, Stuart Golodetz, Stephen L. Hicks, Patrick Pérez, Philip H.S. Torr

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems

Pages 3317 - 3326

https://doi.org/10.1145/2702123.2702222

Published: 18 April 2015

Abstract

We present an augmented reality system for large-scale 3D reconstruction and recognition in outdoor scenes. Unlike prior work, which reconstructs scenes using active depth cameras, we use a purely passive stereo setup, allowing for outdoor use and an extended sensing range. Our system not only produces a map of the 3D environment in real time, it also allows the user to draw (or 'paint') with a laser pointer directly onto the reconstruction to segment the model into objects. Given these examples, our system then learns to segment other parts of the 3D map during online acquisition. Unlike typical object recognition systems, ours therefore places the user 'in the loop' to segment particular objects of interest, rather than learning from predefined databases. The laser pointer additionally helps to 'clean up' the stereo reconstruction and final 3D map interactively. Using our system, a user can, within minutes, capture a full 3D map, segment it into objects of interest, and refine parts of the model during capture. We provide full technical details of our system to aid replication, as well as a quantitative evaluation of system components. We demonstrate the possibility of using our system to help the visually impaired navigate through spaces. Beyond this use, our system can be used for playing large-scale augmented reality games, its maps can be shared online to augment street-view data, and it can support more detailed car and person navigation.

Supplementary Material

suppl.mov (pn0525-file3.m4v)

Supplemental video

MP4 File (p3317-miksik.mp4)

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published In

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems

April 2015

4290 pages

ISBN: 9781450331456

DOI: 10.1145/2702123

General Chairs: Bo Begole (Huawei, USA), Jinwoo Kim (Yonsei University, Korea)

Program Chairs: Kori Inkpen (Microsoft Research, USA), Woontack Woo (KAIST, Korea)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CHI '15: CHI Conference on Human Factors in Computing Systems

April 18 - 23, 2015

Seoul, Republic of Korea

Acceptance Rates

CHI '15 Paper Acceptance Rate: 486 of 2,120 submissions, 23%

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
