
Real-Time Depth Image Acquisition and Restoration for Image Based Rendering and Processing Systems


Abstract

Depth information is an important ingredient in image-based rendering (IBR) systems. Traditional depth acquisition is based mainly on computer vision techniques or dedicated depth-sensing devices. With advances in electronics, low-cost and high-speed depth acquisition devices, such as the recently launched Microsoft Kinect, are becoming increasingly popular. A comprehensive review of these emerging acquisition techniques, the problems they raise, and their solutions is thus highly desirable. This paper aims to 1) review and summarize the various approaches to depth acquisition and highlight their advantages and disadvantages, 2) review problems arising from calibration and imperfections of these devices and state-of-the-art solutions, and 3) propose a surface-normal-based joint-bilateral filtering method for fast spatial-only restoration of missing depth data and a confidence-based IBR algorithm for reducing artifacts under depth uncertainties. For the latter, we propose a confidence measure based on color-depth, spatial, and restoration information. A joint color-depth Bayesian matting approach is proposed for refining the depth discontinuities and the alpha matte for rendering. Improved rendering results are obtained compared with rendering from conventionally restored depth maps. Possible future work and research directions are also briefly outlined.
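To make the depth-restoration idea concrete, the sketch below illustrates a plain color-guided joint-bilateral fill of missing Kinect depth pixels. This is a minimal illustration under stated assumptions, not the paper's method: the surface-normal weighting term mentioned in the abstract is omitted, and the function name and parameter values (window radius and the two Gaussian widths) are hypothetical.

    # Minimal sketch: color-guided joint-bilateral filling of missing depth.
    # Assumptions (not from the paper): the surface-normal weight is omitted,
    # holes are encoded as zero depth, and color/depth are pre-registered.
    import numpy as np

    def jbf_fill(depth, color, radius=5, sigma_s=3.0, sigma_r=10.0):
        """Fill zero-valued (missing) depth pixels with a color-guided
        joint-bilateral weighted average of valid neighbors."""
        h, w = depth.shape
        out = depth.astype(np.float64).copy()
        color = color.astype(np.float64)
        # Spatial Gaussian over the (2*radius+1)^2 window, computed once.
        g = np.arange(-radius, radius + 1)
        spatial = np.exp(-(g[:, None]**2 + g[None, :]**2) / (2 * sigma_s**2))
        for y, x in zip(*np.nonzero(depth == 0)):     # iterate over holes
            y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
            x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
            d = depth[y0:y1, x0:x1].astype(np.float64)
            valid = d > 0                              # trust only measured depth
            if not valid.any():
                continue                               # nothing to interpolate from
            # Range weight: color similarity to the center pixel of the hole.
            dc = color[y0:y1, x0:x1] - color[y, x]
            rng = np.exp(-np.sum(dc**2, axis=-1) / (2 * sigma_r**2))
            wgt = spatial[y0 - y + radius:y1 - y + radius,
                          x0 - x + radius:x1 - x + radius] * rng * valid
            if wgt.sum() > 0:
                out[y, x] = (wgt * d).sum() / wgt.sum()
        return out

Calling jbf_fill(depth, rgb) on a registered color/depth pair replaces the zero-valued holes with color-consistent weighted averages of nearby valid measurements. Adding a normal-similarity weight on top, as the paper proposes, further discourages filling across depth discontinuities, where the color image alone can be ambiguous.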



Author information

Correspondence to Shing-Chow Chan (corresponding author).

Additional information

This work was supported in part by the Hong Kong Research Grants Council (RGC) and the Innovation and Technology Fund (ITF).



Cite this article

Wang, C., Zhu, Z. Y., Chan, S. C., et al. Real-Time Depth Image Acquisition and Restoration for Image Based Rendering and Processing Systems. Journal of Signal Processing Systems 79, 1–18 (2015). https://doi.org/10.1007/s11265-013-0819-2
