
Real-Time Depth Image Acquisition and Restoration for Image Based Rendering and Processing Systems


Abstract

Depth information is an important ingredient in image-based rendering (IBR) systems. Traditional depth acquisition is based mainly on computer vision techniques or dedicated depth-sensing devices. With advances in electronics, low-cost and high-speed depth acquisition devices, such as the recently launched Microsoft Kinect, are becoming increasingly popular. A comprehensive review of these emerging acquisition techniques, the problems they raise, and their solutions is thus highly desirable. This paper aims to 1) review and summarize the various approaches to depth acquisition and highlight their advantages and disadvantages, 2) review problems arising from calibration and imperfections of these devices and state-of-the-art solutions, and 3) propose a surface-normal-based joint-bilateral filtering method for fast spatial-only restoration of missing depth data and a confidence-based IBR algorithm for reducing artifacts under depth uncertainties. For the latter, we propose a confidence measure based on color-depth, spatial, and restoration information. A joint color-depth Bayesian matting approach is proposed for refining the depth discontinuities and the alpha matte for rendering. Improved rendering results are obtained compared with rendering from conventionally restored depth maps. Possible future work and research directions are also briefly outlined.
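To make the depth-restoration idea concrete, the sketch below illustrates a plain color-guided joint-bilateral fill of missing Kinect depth pixels. This is a minimal illustration under stated assumptions, not the paper's method: the surface-normal weighting term mentioned in the abstract is omitted, and the function name and parameter values (window radius and the two Gaussian widths) are hypothetical.

    # Minimal sketch: color-guided joint-bilateral filling of missing depth.
    # Assumptions (not from the paper): the surface-normal weight is omitted,
    # holes are encoded as zero depth, and color/depth are pre-registered.
    import numpy as np

    def jbf_fill(depth, color, radius=5, sigma_s=3.0, sigma_r=10.0):
        """Fill zero-valued (missing) depth pixels with a color-guided
        joint-bilateral weighted average of valid neighbors."""
        h, w = depth.shape
        out = depth.astype(np.float64).copy()
        color = color.astype(np.float64)
        # Spatial Gaussian over the (2*radius+1)^2 window, computed once.
        g = np.arange(-radius, radius + 1)
        spatial = np.exp(-(g[:, None]**2 + g[None, :]**2) / (2 * sigma_s**2))
        for y, x in zip(*np.nonzero(depth == 0)):     # iterate over holes
            y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
            x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
            d = depth[y0:y1, x0:x1].astype(np.float64)
            valid = d > 0                              # trust only measured depth
            if not valid.any():
                continue                               # nothing to interpolate from
            # Range weight: color similarity to the center pixel of the hole.
            dc = color[y0:y1, x0:x1] - color[y, x]
            rng = np.exp(-np.sum(dc**2, axis=-1) / (2 * sigma_r**2))
            wgt = spatial[y0 - y + radius:y1 - y + radius,
                          x0 - x + radius:x1 - x + radius] * rng * valid
            if wgt.sum() > 0:
                out[y, x] = (wgt * d).sum() / wgt.sum()
        return out

Calling jbf_fill(depth, rgb) on a registered color/depth pair replaces the zero-valued holes with color-consistent weighted averages of nearby valid measurements. Adding a normal-similarity weight on top, as the paper proposes, further discourages filling across depth discontinuities, where the color image alone can be ambiguous.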



Author information

Correspondence to Shing-Chow Chan (corresponding author).

Additional information

This work was supported in part by the Hong Kong Research Grants Council (RGC) and the Innovation and Technology Fund (ITF).



Cite this article

Wang, C., Zhu, Z. Y., Chan, S. C., et al. Real-Time Depth Image Acquisition and Restoration for Image Based Rendering and Processing Systems. Journal of Signal Processing Systems 79, 1–18 (2015). https://doi.org/10.1007/s11265-013-0819-2
