research-article

Tanks and temples: benchmarking large-scale scene reconstruction

Published: 20 July 2017

Abstract

We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.

Supplementary Material

MP4 File (papers-0083.mp4)

    Information

    Published In

    ACM Transactions on Graphics, Volume 36, Issue 4
    August 2017
    2155 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/3072959

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. image-based reconstruction
    2. large-scale scene reconstruction
    3. multi-view stereo
    4. structure from motion

    Qualifiers

    • Research-article
