
An automatic 2D to 3D video conversion approach based on RGB-D images

Published in: Multimedia Tools and Applications

Abstract

3D movies and videos have become increasingly popular, but they are usually produced by professionals. This paper presents a new technique for automatic 2D-to-3D video conversion based on RGB-D sensors that can easily be used by ordinary users. One approach to generating a 3D image is to combine the original 2D color image with its corresponding depth map and perform depth image-based rendering (DIBR). An RGB-D sensor is an inexpensive way to capture a color image together with its depth map, and the quality of both the depth map and the DIBR algorithm is crucial to this process. Our approach is twofold. First, depth maps captured directly by RGB-D sensors are generally of poor quality: many regions, especially those near object edges, lack depth information. We propose a new RGB-D-sensor-based depth map inpainting method that divides the regions with missing depth into interior holes and border holes and inpaints each type with a different scheme. Second, we propose an improved hole-filling approach for DIBR that synthesizes the 3D images from the color images and the inpainted depth maps. Extensive experiments on several evaluation datasets demonstrate the effectiveness of our method.
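To make the two steps concrete, the Python sketches below illustrate the general pipeline the abstract describes. Neither is the authors' implementation: the hole taxonomy used here (border holes taken to be hole regions adjacent to strong depth discontinuities), the `edge_thresh` parameter, and the `baseline_px` parameter are all illustrative assumptions, not values from the paper.

The first sketch splits the missing-depth regions of a raw sensor depth map into interior holes and border holes:

```python
import numpy as np
from scipy import ndimage

def classify_depth_holes(depth, edge_thresh=200.0):
    """Split missing-depth pixels into (interior, border) hole masks.

    depth: (H, W) array; zeros mark pixels with no depth reading.
    edge_thresh: illustrative gradient threshold for "object edge".
    """
    holes = depth == 0
    # Strong depth gradients approximate object edges.
    gy, gx = np.gradient(depth.astype(np.float32))
    near_edge = ndimage.binary_dilation(np.hypot(gx, gy) > edge_thresh,
                                        iterations=2)
    labels, n = ndimage.label(holes)           # connected hole regions
    border = np.zeros_like(holes)
    for lab in range(1, n + 1):
        component = labels == lab
        if (component & near_edge).any():      # hole touches a depth edge
            border |= component
    interior = holes & ~border
    return interior, border
```

Each mask can then be handed to a different inpainting scheme, as the paper proposes; a common baseline for interior holes is OpenCV's `cv2.inpaint` with the Telea method, while border holes need edge-aware treatment so that foreground depths do not bleed into the background.

The second sketch is a bare-bones DIBR pass: each pixel is shifted horizontally by a disparity derived from its (inpainted) depth to synthesize a virtual right view, a z-buffer resolves occlusions, and the resulting disocclusion holes are filled row-wise from the background side:

```python
import numpy as np

def dibr_right_view(color, depth, baseline_px=20.0):
    """Warp `color` into a virtual right view using `depth` (larger = farther)."""
    h, w = depth.shape
    d = depth.astype(np.float32)
    far = float(d.max()) if d.max() > 0 else 1.0
    d = np.where(d > 0, d, far)                        # treat missing depth as far
    disparity = np.round(baseline_px * (1.0 - d / far)).astype(int)

    view = np.zeros_like(color)
    z = np.full((h, w), np.inf, dtype=np.float32)      # z-buffer
    filled = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = xs - disparity                                # near pixels shift more
    ok = (xt >= 0) & (xt < w)
    for y, x, x2 in zip(ys[ok], xs[ok], xt[ok]):
        if d[y, x] < z[y, x2]:                         # keep the nearest surface
            z[y, x2], filled[y, x2] = d[y, x], True
            view[y, x2] = color[y, x]

    # Disocclusions open up on the background side of depth edges;
    # copy each hole pixel from its farther (background) filled neighbor.
    for y in range(h):
        for x in range(w):
            if not filled[y, x]:
                left = x - 1 if x > 0 and filled[y, x - 1] else None
                right = next((k for k in range(x + 1, w) if filled[y, k]), None)
                if left is None and right is None:
                    continue
                if left is None or (right is not None and z[y, right] > z[y, left]):
                    view[y, x] = view[y, right]
                else:
                    view[y, x] = view[y, left]
    return view
```

Pairing the synthesized right view with the original image as the left view yields a stereo pair; the paper's contribution is a hole-filling strategy better than the simple background extrapolation sketched here.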



Notes

  1. https://www.youtube.com/playlist?list=PLARCNrViIwY2MgnLy94gqzk-JbSN3OD7D


Acknowledgment

This study was supported by the Science and Technology Development Fund of Macao SAR (FDCT079/2016/A2) and by research grants MYRG2017-00218-FST and MYRG2018-00111-FST.

Author information


Corresponding author

Correspondence to Baiyu Pan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pan, B., Zhang, L., Yin, H. et al. An automatic 2D to 3D video conversion approach based on RGB-D images. Multimed Tools Appl 80, 19179–19201 (2021). https://doi.org/10.1007/s11042-021-10662-0

