Abstract
We present confocal stereo, a new method for computing 3D shape by controlling the focus and aperture of a lens. The method is specifically designed for reconstructing scenes with high geometric complexity or fine-scale texture. To achieve this, we introduce the confocal constancy property, which states that as the lens aperture varies, the pixel intensity of a visible in-focus scene point varies in a scene-independent way that can be predicted by prior radiometric lens calibration. The only requirement is that incoming radiance within the cone subtended by the largest aperture is nearly constant. We first develop a detailed lens model that factors out the distortions in high-resolution SLR cameras (12MP or more) with large-aperture lenses (e.g., f/1.2). This allows us to assemble an A×F aperture-focus image (AFI) for each pixel, which collects the undistorted measurements over all A apertures and F focus settings. In the AFI representation, confocal constancy reduces to color comparisons within regions of the AFI, and leads to focus metrics that can be evaluated separately for each pixel. We propose two such metrics and present initial reconstruction results for complex scenes, as well as for a scene with known ground-truth shape.
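The per-pixel AFI idea can be illustrated with a minimal Python sketch. It assumes the measurements are already undistorted and radiometrically normalized, so that an in-focus point has the same value at every aperture. The function names, the nested-list layout `stack[a][f][row][col]`, and the variance-across-apertures score are all hypothetical choices made for illustration; the score is not claimed to be either of the two metrics proposed in the paper.

```python
from statistics import pvariance


def assemble_afi(stack, row, col):
    """Collect the A x F aperture-focus image (AFI) for one pixel.

    `stack[a][f]` is assumed to be a 2D image (list of rows) taken at
    aperture index a and focus index f, already calibrated so that an
    in-focus point looks the same at every aperture.
    """
    return [[image[row][col] for image in images_at_aperture]
            for images_at_aperture in stack]


def best_focus_by_variance(afi):
    """Return the focus index whose AFI column varies least across apertures.

    Illustrative stand-in for a confocal-constancy test: a column of
    near-constant values suggests the pixel is in focus at that setting.
    """
    num_focus = len(afi[0])
    variances = [pvariance([aperture_row[f] for aperture_row in afi])
                 for f in range(num_focus)]
    return min(range(num_focus), key=variances.__getitem__)


def synthetic_stack(num_apertures=5, num_focus=9, true_focus=6):
    """Build a toy 1x1-pixel stack (made-up data, not real measurements).

    The value is constant across apertures only at `true_focus`; elsewhere
    it drifts with aperture in proportion to the distance from true focus.
    """
    center = (num_apertures - 1) / 2
    return [[[[0.5 + 0.02 * abs(f - true_focus) * (a - center)]]
             for f in range(num_focus)]
            for a in range(num_apertures)]


afi = assemble_afi(synthetic_stack(), 0, 0)
print(best_focus_by_variance(afi))
```

Because the score depends only on one pixel's AFI, it can be evaluated independently for every pixel, which is the property the abstract highlights for scenes with fine geometric detail.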
Additional information
Part of this work was done while the authors were visiting Microsoft Research Asia, in the roles of research intern and Visiting Scholar respectively.
Cite this article
Hasinoff, S.W., Kutulakos, K.N. Confocal Stereo. Int J Comput Vis 81, 82–104 (2009). https://doi.org/10.1007/s11263-008-0164-2
DOI: https://doi.org/10.1007/s11263-008-0164-2