
3DNN: 3D Nearest Neighbor

Data-Driven Geometric Scene Understanding Using 3D Models

  • Published in: International Journal of Computer Vision

Abstract

In this paper, we describe a data-driven approach to leverage repositories of 3D models for scene understanding. Our ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from these models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint from which an image was taken, and computing a similarity measure between each 3D model and an input image. We demonstrate this data-driven approach in the context of geometry estimation and show the ability to find the identities, poses, and styles of objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need for training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and to recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation, as well as two novel applications: affordance estimation and photorealistic object insertion.
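At a high level, the pipeline in the abstract amounts to a nearest-neighbor search over renderings of 3D models from the input image's estimated viewpoint. The sketch below is a toy illustration only: the pre-rendered binary masks and the mask-IoU score are hypothetical stand-ins for the paper's camera auto-calibration, rendering, and geometric similarity measure.

```python
import numpy as np

def mask_iou(a, b):
    # Intersection-over-union between two binary masks; a stand-in for
    # the similarity between a rendered 3D model and an input image.
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def nearest_3d_model(image_mask, rendered_masks):
    """Return (index, score) of the rendering most similar to the image.

    `rendered_masks` stands in for 3D scene models already rendered from
    the auto-calibrated viewpoint of the input image. Because each model
    is re-rendered per image, one 3D model can match images taken from
    any viewpoint -- the key advantage over 2D nearest neighbors.
    """
    scores = [mask_iou(image_mask, m) for m in rendered_masks]
    best = int(np.argmax(scores))
    return best, scores[best]

# Toy usage: three candidate "renderings" scored against one image mask.
img = np.zeros((10, 10), dtype=bool)
img[2:8, 2:8] = True
candidates = [np.zeros((10, 10), dtype=bool), img.copy(), ~img]
idx, score = nearest_3d_model(img, candidates)
```

In the actual system the comparison is over rendered scene geometry (e.g., surfaces and depth), not binary masks, but the search structure is the same.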




Notes

  1. Training was performed using 10-fold cross validation on a subset of the SUN Database (Xiao et al. 2010), for which there exist LabelMe annotations (Torralba et al. 2010).

  2. GIST: \(4\times 4\) blocks, 8 orientations (code from Oliva and Torralba (2006)). HoG: \(20 \times 20\) blocks, 8 orientations (code from Vondrick et al. (2013)).

  3. Note: In Gupta et al. (2011), the proposed approach outperformed the appearance baseline. This is due to a selection bias in creating the dataset for that paper: only images with accurate auto-calibration estimates from Hedau et al. (2009) were used. For these experiments, however, an unbiased sample of images was selected.

  4. Note: The object insertion renderings included in this section were created by Kevin Karsch at the University of Illinois at Urbana-Champaign during a collaboration with the author.

    Fig. 30 Examples of synthetically inserted objects. Note the accurate occlusion and depth ordering of the dresser, and the realistic reflections. (a) Input image, (b) synthetically inserted objects.
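The blocked orientation features described in note 2 can be sketched concretely. The function below is a simplified stand-in for the cited GIST and HoG implementations (no Gabor filtering, cell normalization, or Gaussian weighting); the block grid and orientation-bin counts mirror the parameters quoted in the note.

```python
import numpy as np

def block_orientation_histogram(img, blocks=20, n_orient=8):
    # HOG-style descriptor: per-block histograms of gradient orientation,
    # weighted by gradient magnitude. `img` is a 2D grayscale array.
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation quantized into n_orient bins over [0, pi).
    ori = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ori / np.pi * n_orient).astype(int), n_orient - 1)
    h, w = img.shape
    desc = np.zeros((blocks, blocks, n_orient))
    ys = np.linspace(0, h, blocks + 1).astype(int)
    xs = np.linspace(0, w, blocks + 1).astype(int)
    for i in range(blocks):
        for j in range(blocks):
            b = bins[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            m = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for k in range(n_orient):
                desc[i, j, k] = m[b == k].sum()
    return desc.ravel()

# Toy usage: a 4x4 grid of blocks with 8 orientations yields a
# 4 * 4 * 8 = 128-dimensional descriptor.
rng = np.random.default_rng(0)
demo = rng.random((40, 40))
feat = block_orientation_histogram(demo, blocks=4, n_orient=8)
```

With `blocks=20`, the descriptor length is 20 x 20 x 8 = 3200, matching the HoG configuration quoted in note 2.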

References

  • Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(5), 898–916.

  • Baatz, G., Saurer, O., Köser, K., & Pollefeys, M. (2012). Large scale visual geo-localization of images in mountainous terrain. Proceedings of the European Conference on Computer Vision (ECCV).

  • Baboud, L., Cadik, M., Eisemann, E., & Seidel, H.-P. (2011). Automatic photo-to-terrain alignment for the annotation of mountain pictures. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Bao, S. Y., Sun, M., & Savarese, S. (2010). Toward coherent object detection and scene layout understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Brooks, R. (1981). Symbolic reasoning among 3D models and 2D images. Artificial Intelligence, Vol. 17.

  • Choi, W., Pantofaru, C., & Savarese, S. (2013). Understanding indoor scenes using 3D geometric phrases. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Colburn, A., Agarwala, A., Hertzmann, A., Curless, B., & Cohen, M. F. (2013). Image-based remodeling. IEEE Transactions on Visualization and Computer Graphics, 19(1), 56–66.


  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Debevec, P. (1998). Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. Proceedings of the 25th annual conference on Computer graphics and interactive techniques, SIGGRAPH ’98.

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.


  • Fisher, M., & Hanrahan, P. (2010). Context-based search for 3D models. ACM Transactions on Graphics, 29, 182:1–182:10.


  • Fisher, M., Savva, M., & Hanrahan, P. (2011). Characterizing structural relationships in scenes using graph Kernels. ACM Transactions on Graphics, 30, 34:1–34:12.


  • Fouhey, D., Gupta, A., & Hebert, M. (2013). Data-driven 3D primitives for single image understanding. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

  • Fouhey, D.F., Delaitre, V., Gupta, A., Efros, A. A., Laptev, I., & Sivic, J. (2012). People watching: Human actions as a cue for single-view geometry. Proceedings of the European Conference on Computer Vision (ECCV).

  • Girshick, R.B., Felzenszwalb, P.F., & McAllester, D. (2012). Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/rbg/latent-release5/.

  • Google Inc. (2000). Google SketchUp. http://sketchup.google.com/.

  • Grabner, H., Gall, J., & Gool, L.J.V. (2011). What makes a chair a chair? Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1529–1536.

  • Grimson, W., Lozano-Pérez, T., & Huttenlocher, D. (1990). Object recognition by computer. Cambridge: MIT Press.


  • Grosse, R., Johnson, M. K., Adelson, E. H., & Freeman, W. T. (2009). Ground truth dataset and baseline evaluations for intrinsic image algorithms. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2335–2342.

  • Guo, R., Dai, Q., & Hoiem, D. (2011). Single-image shadow detection and removal using paired regions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Gupta, A., Efros, A. A., & Hebert, M. (2010). Blocks world revisited: Image understanding using qualitative geometry and mechanics. Proceedings of the European Conference on Computer Vision (ECCV).

  • Gupta, A., Satkin, S., Efros, A. A., & Hebert, M. (2011). From 3D scene geometry to human workspace. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Hays, J., & Efros, A. A. (2007). Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007). doi:10.1145/1275808.1276382.

  • Hays, J., & Efros, A. A. (2008). im2gps: estimating geographic information from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2010). Thinking inside the box: Using appearance models and context based on room geometry. Proceedings of the European Conference on Computer Vision (ECCV).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2012). Recovering free space of indoor scenes from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Herbrich, R., Graepel, T., & Obermayer, K. (1999). Support vector learning for ordinal regression. Proceedings of the International Conference on Artificial Neural Networks (ICANN), 1, 97–102.


  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision (IJCV), 75(1), 151–172.


  • Hoiem, D., Efros, A. A., & Hebert, M. (2008). Closing the loop on scene interpretation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • IKEA (2013). IKEA Catalog.

  • Karsch, K., Hedau, V., Forsyth, D., & Hoiem, D. (2011). Rendering synthetic objects into legacy photographs. Proceedings of the 2011 SIGGRAPH Asia Conference. ACM.

  • Karsch, K., Sunkavalli, K., Hadap, S., Carr, N., Jin, H., Fonte, R., et al. (2014). Automatic scene inference for 3D object compositing. ACM Transactions on Graphics (to appear).

  • Lai, K., & Fox, D. (2009). 3D laser scan classification using web data and domain adaptation. Proceedings of Robotics: Science and Systems, Seattle, USA.

  • Lalonde, J.-F., Hoiem, D., Efros, A. A., Rother, C., Winn, J., & Criminisi, A. (2007). Photo clip art. ACM Transactions on Graphics (SIGGRAPH 2007), 26(3), 3.


  • Lee, D., Gupta, A., Hebert, M., & Kanade, T. (2010). Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. Proceedings of the Conference on Neural Information Processing Systems (NIPS).

  • Lee, D., Hebert, M., & Kanade, T. (2009). Geometric reasoning for single image structure recovery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Levin, A., Rav-Acha, A., & Lischinski, D. (2008). Spectral matting. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 30(10), 1699–1712.


  • Li, L.-J., Su, H., Xing, E. P., & Fei-Fei, L. (2010). Object bank: A high-level image representation for scene classification & semantic feature sparsification. Neural Information Processing Systems (NIPS), 2(3), 5.


  • Lim, J. J., Pirsiavash, H., & Torralba, A. (2013). Parsing IKEA objects: Fine pose estimation. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

  • Liu, C., Yuen, J., Torralba, A., Sivic, J., & Freeman, W. T. (2008). SIFT flow: Dense correspondence across different scenes. Proceedings of the European Conference on Computer Vision (ECCV).

  • Liu, M.-Y., Tuzel, O., Veeraraghavan, A., & Chellappa, R. (2010). Fast directional chamfer matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Longuet-Higgins, H. C. (1981). A computer algorithm for reconstructing a scene from two projections. Nature, 293, 133–135.

  • Lowe, D. (1987). Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31(3), 355–395.


  • Microsoft Corporation (2010). Kinect for Xbox 360.

  • Munoz, D., Bagnell, J. A., & Hebert, M. (2010). Stacked hierarchical labeling. Proceedings of the European Conference on Computer Vision (ECCV).

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision (IJCV), 42, 145–175.


  • Oliva, A., & Torralba, A. (2006). Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research, 155, 23–36.


  • Pero, L. D., Bowdish, J., Fried, D., Kermgard, B., Hartley, E. & Barnard, K. (2012). Bayesian geometric modeling of indoor scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Pero, L. D., Bowdish, J., Hartley, E., Kermgard, B., & Barnard, K. (2013). Understanding Bayesian rooms using composite 3D object models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Pero, L. D., Guan, J., Brau, E., Schlecht, J., & Barnard, K. (2011). Sampling bedrooms. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Polantis (2010). Polantis 3D Catalog. http://www.polantis.com/ikea.

  • Ramalingam, S., Bouaziz, S., Sturm, P., & Brand, M. (2010). SKYLINE2GPS: Localization in Urban Canyons using Omni-Skylines. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

  • Satkin, S. (2013). Data-Driven Geometric Scene Understanding. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, August.

  • Satkin, S., & Hebert, M. (2013). 3DNN: Viewpoint invariant 3D geometry matching for scene understanding. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

  • Satkin, S., Lin, J., & Hebert, M. (2012). CMU 3D-Annotated Scene Database. http://cmu.satkin.com/bmvc2012/.

  • Satkin, S., Lin, J., & Hebert, M. (2012). Data-driven scene understanding from 3D models. Proceedings of the 23rd British Machine Vision Conference (BMVC).

  • Saxena, A., Sun, M., & Ng, A. Y. (2009). Make3D: Learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 31(5), 824–840.


  • Schwing, A. G., & Urtasun, R. (2012). Efficient Exact Inference for 3D Indoor Scene Understanding. Proceedings of the European Conference on Computer Vision (ECCV).

  • Shao, T., Xu, W., Zhou, K., Wang, J., Li, D., & Guo, B. (2012). An interactive approach to semantic modeling of indoor scenes with an RGBD camera. ACM Transactions on Graphics, 31(6), 136:1–136:11.


  • Silberman, N., Hoiem, D., Kohli, P., & Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. Proceedings of the European Conference on Computer Vision (ECCV).

  • Singh, G., & Košecká, J. (2013). Nonparametric scene parsing with adaptive feature relevance and semantic context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Tighe, J., & Lazebnik, S. (2010). Superparsing: Scalable nonparametric image parsing with superpixels. Proceedings of the European Conference on Computer Vision (ECCV).

  • Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 30(11), 1958–1970.

  • Torralba, A., Russell, B. C., & Yuen, J. (2010). LabelMe: Online image annotation and applications. Proceedings of the IEEE, 98(8), 1467–1484.


  • Trimble Inc. (2012). Trimble 3D Warehouse. http://sketchup.google.com/3dwarehouse.

  • Vondrick, C., Khosla, A., & Malisiewicz, T., & Torralba, A. (2013). Inverting and visualizing features for object detection. MIT Technical Report.

  • Wang, H., Gould, S., & Koller, D. (2010). Discriminative learning with latent variables for cluttered indoor scene understanding. Proceedings of the European Conference on Computer Vision (ECCV).

  • Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba, A. (2010). Sun database: Large-scale scene recognition from abbey to zoo. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Yu, S., Zhang, H., & Malik, J. (2008). Inferring spatial layout from a single image via depth-ordered grouping. Proceedings of the IEEE Workshop on Perceptual Organization in Computer Vision (CVPR-POCV).

  • Zhao, Y., & Zhu, S.-C. (2013). Scene parsing by integrating function, geometry and appearance models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).


Acknowledgments

This work was supported in part by ONR MURI N000141010934.

Author information

Correspondence to Scott Satkin.

Additional information

Communicated by Cordelia Schmid.


About this article


Cite this article

Satkin, S., Rashid, M., Lin, J. et al. 3DNN: 3D Nearest Neighbor. Int J Comput Vis 111, 69–97 (2015). https://doi.org/10.1007/s11263-014-0734-4
